gagneurlab / drop

Pipeline to find aberrant events in RNA-Seq data, useful for diagnosis of rare disorders
MIT License

Error in rule AberrantSplicing_pipeline_Counting_00_define_datasets_from_anno_R #492

Closed frankaugs closed 12 months ago

frankaugs commented 1 year ago

Dear DROP team, Thank you for developing DROP. I recently ran into an error while running snakemake aberrantSplicing on my own and external RNA-seq data:

rule AberrantSplicing_pipeline_Counting_00_define_datasets_from_anno_R:
    input: /home/ngs/rnaseq/03align_out_5/drop_1/sampleAnnotation_s.tsv, Scripts/AberrantSplicing/pipeline/Counting/00_define_datasets_from_anno.R
    output: /home/ngs/rnaseq/03align_out_5/drop_1/project3/Output/processed_data/aberrant_splicing/annotations/fraser.tsv, /home/ngs/rnaseq/03align_out_5/drop_1/project3/htmlOutput/AberrantSplicing/annotations/fraser.html
    log: /home/ngs/rnaseq/03align_out_5/drop_1/.drop/tmp/AS/fraser/00_defineDataset.Rds
    jobid: 11
    reason: Missing output files: /home/ngs/rnaseq/03align_out_5/drop_1/project3/Output/processed_data/aberrant_splicing/annotations/fraser.tsv
    wildcards: dataset=fraser
    resources: tmpdir=/tmp

Config

projectTitle: "DROP: RNAseq"
root: /home/ngs/rnaseq/03align_out_5/drop_1/project3/Output # root directory of all output objects and tables
htmlOutputPath: /home/ngs/rnaseq/03align_out_5/drop_1/project3/htmlOutput # path for HTML rendered reports
indexWithFolderName: true # whether the root base name should be part of the index name

hpoFile: null # if null, downloads it from webserver
sampleAnnotation: /home/ngs/rnaseq/03align_out_5/drop_1/sampleAnnotation_s.tsv # path to sample annotation (see documentation on how to create it)

geneAnnotation:
    v29: /home/ngs/rnaseq/03align_out_5/drop_1/gencode.v29.gtf
genomeAssembly: hg19
genome: /home/ngs/rnaseq/03align_out_5/drop_1/hg19_ucsc.fa # path to reference genome sequence in fasta format.
    # You can define multiple reference genomes in yaml format, ncbi: path/to/ncbi, ucsc: path/to/ucsc
    # the keywords that define the path should be in the GENOME column of the sample annotation table

random_seed: false # just for demo data, remove for analysis

exportCounts:
    # specify which gene annotations to include and which
    # groups to exclude when exporting counts
    geneAnnotations:
        - v29
    excludeGroups:
        - null

aberrantExpression:
    run: fasle
    groups:

aberrantSplicing:
    run: true
    groups:

mae:
    run: false
    groups:

rnaVariantCalling:
    run: false
    groups:

tools:
    gatkCmd: gatk
    bcftoolsCmd: bcftools
    samtoolsCmd: samtools

Command: snakemake aberrantSplicing --cores 10

(94GB memory)

I wonder if you could provide a solution to this problem? Thank you!

Best regards,
Frank

2023-09-12.snakemake.log
sampleAnnotation_s.csv

vyepez88 commented 1 year ago

Hi Frank, Thanks for using DROP and reporting this.

frankaugs commented 1 year ago

Hi Vicente,

Thank you so much for your response.

1. The DROP version is 1.3.3.
2. Sure, I have corrected the typo. By the way, if the typo appears there, DROP will not run that module, right?
3. Thank you for your suggestion, I have checked the sample annotation file.

  1. I tried to run the demo at the beginning, but it didn't work. When I entered drop demo in the terminal, only six items were produced (config.yaml; readme.md; Scripts; Snakefile; .drop; .wBuild). I guess there is something wrong with the environment configuration.
  2. Yes, all the files look good.
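
On the typo question: a YAML parser reads an unquoted `fasle` as a plain string, not as the boolean `false`, and a non-empty string is truthy in Python. How DROP itself treats such a value is not confirmed in this thread. The minimal sketch below only flags `run` values that are not real booleans. The helper name `non_boolean_run_flags` is hypothetical, and the dict stands in for what a YAML loader would return for the config above.

```python
def non_boolean_run_flags(config):
    """Return {module: value} for each module whose `run` value is not a bool.

    A missing section or a missing `run` key is reported with the value None.
    """
    # Module names taken from the config posted in this thread.
    modules = ("aberrantExpression", "aberrantSplicing", "mae", "rnaVariantCalling")
    bad = {}
    for name in modules:
        section = config.get(name) or {}
        value = section.get("run")
        if not isinstance(value, bool):
            bad[name] = value
    return bad


# Stand-in for the parsed config; `fasle` arrives as a string, not as False.
config = {
    "aberrantExpression": {"run": "fasle", "groups": None},
    "aberrantSplicing": {"run": True, "groups": None},
    "mae": {"run": False, "groups": None},
    "rnaVariantCalling": {"run": False, "groups": None},
}

print(non_boolean_run_flags(config))  # {'aberrantExpression': 'fasle'}
```

Running a check like this before starting snakemake would surface the typo without relying on how the pipeline interprets a truthy string.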

Here are the command responses:

$ mamba create -n drop3 -c conda-forge -c bioconda drop --override-channels
... ... ... ...
Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: \
SafetyError: The package for r-base located at /home/ngs/anaconda3/pkgs/r-base-4.3.1-h29c4799_3
appears to be corrupted. The path 'lib/R/doc/html/packages.html'
has an incorrect size.
  reported size: 3423 bytes
  actual size: 54045 bytes

done
Executing transaction: \ | done

To activate this environment, use

 $ mamba activate drop3

To deactivate an active environment, use

 $ mamba deactivate

$ drop demo
create /home/ngs/rnaseq/03align_out_6/drop2/Scripts
create /home/ngs/rnaseq/03align_out_6/drop2/.drop
create /home/ngs/rnaseq/03align_out_6/drop2/.drop/tmp
/home/ngs/rnaseq/03align_out_6/drop2/Scripts/AberrantExpression/pipeline is not a directory, copy over from drop base
/home/ngs/rnaseq/03align_out_6/drop2/Scripts/AberrantSplicing/pipeline is not a directory, copy over from drop base
/home/ngs/rnaseq/03align_out_6/drop2/Scripts/MonoallelicExpression/pipeline is not a directory, copy over from drop base
/home/ngs/rnaseq/03align_out_6/drop2/Scripts/rnaVariantCalling/pipeline is not a directory, copy over from drop base
init...done
download data
File ‘/tmp/main.zip’ already there; not retrieving.

Archive: /tmp/main.zip
  End-of-central-directory signature not found. Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive. In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /tmp/main.zip or
       /tmp/main.zip.zip, and cannot find /tmp/main.zip.ZIP, period.
Traceback (most recent call last):
  File "/home/ngs/anaconda3/envs/rna/bin/drop", line 10, in <module>
    sys.exit(main())
  File "/home/ngs/anaconda3/envs/rna/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ngs/anaconda3/envs/rna/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ngs/anaconda3/envs/rna/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ngs/anaconda3/envs/rna/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ngs/anaconda3/envs/rna/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ngs/anaconda3/envs/rna/lib/python3.10/site-packages/drop/cli.py", line 174, in demo
    response.check_returncode()
  File "/home/ngs/anaconda3/envs/rna/lib/python3.10/subprocess.py", line 457, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['bash', '/home/ngs/anaconda3/envs/rna/lib/python3.10/site-packages/drop/download_data.sh']' returned non-zero exit status 9.
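
The log above shows that an existing /tmp/main.zip was reused ("already there; not retrieving") and that unzip then rejected it, which suggests a stale or incomplete earlier download. A minimal sketch of a pre-check follows, assuming that removing the broken file lets the next `drop demo` run download it again (not confirmed in this thread). The helper name `remove_if_not_zip` is hypothetical; `zipfile.is_zipfile` is part of the Python standard library.

```python
import os
import zipfile


def remove_if_not_zip(path):
    """Delete `path` if it exists but is not a readable zip archive.

    Returns True if the file was removed, False otherwise.
    """
    if os.path.exists(path) and not zipfile.is_zipfile(path):
        os.remove(path)
        return True
    return False
```

Calling `remove_if_not_zip("/tmp/main.zip")` before rerunning the demo would leave a valid archive untouched and clear out a truncated one.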

I then ran snakemake --cores 1 -n. As expected, it did not work properly, so I tried running the modules directly. Surprisingly, both the Expression and Splicing modules ran normally on their own. Could you tell me whether this problem will affect the Expression or Splicing module in any way?

On the original question:

After some testing, I found the cause of the problem: I had deleted part of the Splicing section of the config file by mistake while trying to run the Expression module. Now it works!

The content deleted by mistake:

# FRASER1 configuration

FRASER_version: "FRASER" 
deltaPsiCutoff : 0.3 
quantileForFiltering: 0.95 
### For FRASER2, use the following parameters instead of the 3 lines above:
# FRASER_version: "FRASER2"
# deltaPsiCutoff : 0.1
# quantileForFiltering: 0.75
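
Since the root cause was a partly deleted `aberrantSplicing` section, a small pre-flight check can catch this before snakemake starts. The sketch below is only an illustration: the key list covers just the three FRASER settings named above (the full set DROP expects may be longer), and the helper name `missing_splicing_keys` is hypothetical. The dict stands in for the parsed config.

```python
# Only the three settings mentioned in this thread; an assumption, not DROP's full list.
REQUIRED_SPLICING_KEYS = ("FRASER_version", "deltaPsiCutoff", "quantileForFiltering")


def missing_splicing_keys(config, required=REQUIRED_SPLICING_KEYS):
    """Return the required keys that are absent from config['aberrantSplicing']."""
    section = config.get("aberrantSplicing") or {}
    return [key for key in required if key not in section]


# Stand-in for the broken config: the section holds only `run` and `groups`.
broken = {"aberrantSplicing": {"run": True, "groups": None}}
print(missing_splicing_keys(broken))
# ['FRASER_version', 'deltaPsiCutoff', 'quantileForFiltering']
```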

Many thanks!

Frank

vyepez88 commented 12 months ago

Great that it worked!