Hoohm / dropSeqPipe

A SingleCell RNASeq pre-processing snakemake workflow
Creative Commons Attribution Share Alike 4.0 International

Pipeline Error #1

Closed davidepisu closed 6 years ago

davidepisu commented 7 years ago

Got this error while generating the Expression Matrix:

[Sun Sep 10 01:02:04 2017] Finished job 1.
[Sun Sep 10 01:02:04 2017] 4 of 5 steps (80%) done
[Sun Sep 10 01:02:04 2017]
[Sun Sep 10 01:02:04 2017] localrule all:
    input: logs/MLW12_hist_out_cell.txt
    log: logs/Dropseq_post_align.log
    jobid: 0
[Sun Sep 10 01:02:04 2017]
[Sun Sep 10 01:02:04 2017] Finished job 0.
[Sun Sep 10 01:02:04 2017] 5 of 5 steps (100%) done
Mode is generate-plots
Generating multiqc report
[INFO ] multiqc : This is MultiQC v1.2
[INFO ] multiqc : Template : default
[INFO ] multiqc : Searching '/SSD/MLW12/logs'
[INFO ] multiqc : Searching '/SSD/MLW12/summary'
Searching 62 files.. [####################################] 100%
[INFO ] star : Found 2 reports
[INFO ] fastqc : Found 2 reports
[INFO ] multiqc : Compressing plot data
[INFO ] multiqc : Report : MLW12/multiqc_report.html
[INFO ] multiqc : Data : MLW12/multiqc_data
[INFO ] multiqc : MultiQC complete
Extracting expression
[Sun Sep 10 01:02:43 2017] Provided cores: 20
[Sun Sep 10 01:02:43 2017] Rules claiming more threads will be scaled down.
[Sun Sep 10 01:02:43 2017] Job counts:
    count jobs
    1 all
    1 extract_expression
    1 extract_umi_per_gene
    1 gunzip
    4
[Sun Sep 10 01:02:43 2017]
[Sun Sep 10 01:02:43 2017] rule extract_umi_per_gene:
    input: MLW12_final.bam
    output: logs/MLW12_umi_per_gene.tsv
    jobid: 1
    wildcards: sample=MLW12
[Sun Sep 10 01:02:43 2017]
[Sun Sep 10 01:02:43 2017] /programs/Drop-seq_tools-1.12/GatherMolecularBarcodeDistributionByGene I=MLW12_final.bam O=logs/MLW12_umi_per_gene.tsv CELL_BC_FILE=summary/MLW12_barcodes.csv
[Sun Sep 10 01:02:43 2017] rule extract_expression:
    input: MLW12_final.bam
    output: summary/MLW12_expression_matrix.txt.gz
    jobid: 3
    wildcards: sample=MLW12
[Sun Sep 10 01:02:43 2017]
[Sun Sep 10 01:02:43 2017] /programs/Drop-seq_tools-1.12/DigitalExpression I=MLW12_final.bam O=summary/MLW12_expression_matrix.txt.gz SUMMARY=summary/MLW12_dge.summary.txt CELL_BC_FILE=summary/MLW12_barcodes.csv MIN_BC_READ_THRESHOLD=1
[Sun Sep 10 01:02:44 EDT 2017] org.broadinstitute.dropseqrna.barnyard.DigitalExpression SUMMARY=summary/MLW12_dge.summary.txt OUTPUT=summary/MLW12_expression_matrix.txt.gz INPUT=MLW12_final.bam MIN_BC_READ_THRESHOLD=1 CELL_BC_FILE=summary/MLW12_barcodes.csv OUTPUT_READS_INSTEAD=false CELL_BARCODE_TAG=XC MOLECULAR_BARCODE_TAG=XM GENE_EXON_TAG=GE STRAND_TAG=GS EDIT_DISTANCE=1 READ_MQ=10 USE_STRAND_INFO=true RARE_UMI_FILTER_THRESHOLD=0.0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Sun Sep 10 01:02:44 EDT 2017] org.broadinstitute.dropseqrna.barnyard.GatherMolecularBarcodeDistributionByGene OUTPUT=logs/MLW12_umi_per_gene.tsv INPUT=MLW12_final.bam CELL_BC_FILE=summary/MLW12_barcodes.csv CELL_BARCODE_TAG=XC MOLECULAR_BARCODE_TAG=XM GENE_EXON_TAG=GE STRAND_TAG=GS EDIT_DISTANCE=1 READ_MQ=10 MIN_BC_READ_THRESHOLD=0 USE_STRAND_INFO=true RARE_UMI_FILTER_THRESHOLD=0.0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Sun Sep 10 01:02:44 EDT 2017] Executing as sb929@cbsumm07.tc.cornell.edu on Linux 3.10.0-229.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Picard version: 1.12(d3aeea7_1452606774) IntelDeflater
[Sun Sep 10 01:02:44 EDT 2017] Executing as sb929@cbsumm07.tc.cornell.edu on Linux 3.10.0-229.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Picard version: 1.12(d3aeea7_1452606774) IntelDeflater
[Sun Sep 10 01:02:44 EDT 2017] org.broadinstitute.dropseqrna.barnyard.DigitalExpression done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2022178816
Exception in thread "main" [Sun Sep 10 01:02:44 EDT 2017] org.broadinstitute.dropseqrna.barnyard.GatherMolecularBarcodeDistributionByGene done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2022178816
Exception in thread "main" htsjdk.samtools.SAMException: Error opening file: MLW12_barcodes.csv
htsjdk.samtools.SAMException: Error opening file: MLW12_barcodes.csv

at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:501)  at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:501)

at picard.util.BasicInputParser.filesToInputStreams(BasicInputParser.java:172)  at picard.util.BasicInputParser.filesToInputStreams(BasicInputParser.java:172)

at picard.util.BasicInputParser.<init>(BasicInputParser.java:78)    at picard.util.BasicInputParser.<init>(BasicInputParser.java:78)

at picard.util.BasicInputParser.<init>(BasicInputParser.java:91)    at picard.util.BasicInputParser.<init>(BasicInputParser.java:91)

at org.broadinstitute.dropseqrna.barnyard.ParseBarcodeFile.readCellBarcodeFile(ParseBarcodeFile.java:13)    at org.broadinstitute.dropseqrna.barnyard.ParseBarcodeFile.readCellBarcodeFile(ParseBarcodeFile.java:13)

at org.broadinstitute.dropseqrna.barnyard.BarcodeListRetrieval.getCellBarcodes(BarcodeListRetrieval.java:47)    at org.broadinstitute.dropseqrna.barnyard.BarcodeListRetrieval.getCellBarcodes(BarcodeListRetrieval.java:47)

at org.broadinstitute.dropseqrna.barnyard.GatherMolecularBarcodeDistributionByGene.doWork(GatherMolecularBarcodeDistributionByGene.java:55) at org.broadinstitute.dropseqrna.barnyard.DigitalExpression.doWork(DigitalExpression.java:74)

at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)  at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)

at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)

at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:29)  at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:29)

Caused by: java.io.FileNotFoundException: summary/MLW12_barcodes.csv (No such file or directory)Caused by: java.io.FileNotFoundException: summary/MLW12_barcodes.csv (No such file or directory)

at java.io.FileInputStream.open0(Native Method) at java.io.FileInputStream.open0(Native Method)

at java.io.FileInputStream.open(FileInputStream.java:195)   at java.io.FileInputStream.open(FileInputStream.java:195)

at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:497)
at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:497)
... 9 more
... 9 more

[Sun Sep 10 01:02:44 2017] Error in job extract_expression while creating output file summary/MLW12_expression_matrix.txt.gz.
[Sun Sep 10 01:02:44 2017] Error in job extract_umi_per_gene while creating output file logs/MLW12_umi_per_gene.tsv.
[Sun Sep 10 01:02:44 2017] RuleException:
CalledProcessError in line 21 of /programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/extract_expression_single.snake:
Command '/programs/Drop-seq_tools-1.12/DigitalExpression I=MLW12_final.bam O=summary/MLW12_expression_matrix.txt.gz SUMMARY=summary/MLW12_dge.summary.txt CELL_BC_FILE=summary/MLW12_barcodes.csv MIN_BC_READ_THRESHOLD=1' returned non-zero exit status 1.
  File "/programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/extract_expression_single.snake", line 21, in rule_extract_expression
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 55, in run
[Sun Sep 10 01:02:44 2017] RuleException:
CalledProcessError in line 34 of /programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/extract_expression_single.snake:
Command '/programs/Drop-seq_tools-1.12/GatherMolecularBarcodeDistributionByGene I=MLW12_final.bam O=logs/MLW12_umi_per_gene.tsv CELL_BC_FILE=summary/MLW12_barcodes.csv' returned non-zero exit status 1.
  File "/programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/extract_expression_single.snake", line 34, in rule_extract_umi_per_gene
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 55, in run
[Sun Sep 10 01:02:44 2017] Removing output files of failed job extract_umi_per_gene since they might be corrupted: logs/MLW12_umi_per_gene.tsv
[Sun Sep 10 01:02:44 2017] Will exit after finishing currently running jobs.
[Sun Sep 10 01:02:44 2017] Exiting because a job execution failed. Look above for error message
Traceback (most recent call last):
  File "/programs/dropSeqPipe/bin/dropSeqPipe", line 11, in <module>
    load_entry_point('dropSeqPipe==0.23a0', 'console_scripts', 'dropSeqPipe')()
  File "/programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/main.py", line 223, in main
    shell(extract_expression_single)
  File "/usr/local/lib/python3.6/site-packages/snakemake/shell.py", line 88, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake -s /programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/extract_expression_single.snake --cores 20 -pT -d /SSD/MLW12 --configfile /SSD/local.yaml ' returned non-zero exit status 1.

Hoohm commented 7 years ago

Normally, the generate-plots mode should create a file in the summary folder, so something is wrong there. That's really odd: you don't get any errors while running the generate-plots mode?
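The Java stack trace above reports that summary/MLW12_barcodes.csv does not exist when the extraction step starts. A minimal Python sketch of a pre-flight check for such inputs could look like the following. `missing_files` is a hypothetical helper for illustration, not part of dropSeqPipe, and the file names are the ones that appear in the log above.

```python
import os


def missing_files(workdir, relative_paths):
    """Return the entries of relative_paths that are not regular files under workdir."""
    return [
        rel for rel in relative_paths
        if not os.path.isfile(os.path.join(workdir, rel))
    ]


# Example call with the paths taken from the log above (assumed layout):
# missing_files("/SSD/MLW12", ["MLW12_final.bam", "summary/MLW12_barcodes.csv"])
```

Running such a check before the expression step would report the missing barcodes file directly instead of surfacing it as a Java exception.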

Hoohm commented 7 years ago

Can you provide the config.yaml file? I'm thinking maybe your datatype value is wrong. Is it SingleCell instead of singleCell?

davidepisu commented 7 years ago

I can't attach the file here. I copied the settings from here: https://github.com/Hoohm/dropSeqPipe/wiki/Create-config-files

Anyway this is my config file:

Samples:
    MLW15:
        fraction: 0.001
        expected_cells: 2000
GENOMEREF: /SSD/ref/genome.fa
REFFLAT: /SSD/ref/annotation.refFlat
RRNAINTERVALS: /SSD/ref/genome.rRNA.intervals
METAREF: /SSD/ref/STAR_INDEX_NO_GTF/
GTF: /SSD/ref/annotation.gtf
SPECIES:

Hoohm commented 7 years ago

Ok, so datatype has to be either bulk or singleCell. And it is case sensitive. I will put some checks in.
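As a rough illustration of the kind of check described above, here is a minimal Python sketch. It is not the check that was actually added to dropSeqPipe; `ALLOWED_DATA_TYPES` and `check_data_type` are hypothetical names, and only the two allowed values (bulk, singleCell) and the case sensitivity come from this thread.

```python
# The two values named in this thread; matching is case sensitive.
ALLOWED_DATA_TYPES = ("bulk", "singleCell")


def check_data_type(value):
    """Return value unchanged if it is allowed, otherwise raise ValueError."""
    if value in ALLOWED_DATA_TYPES:
        return value
    message = f"data type must be one of {ALLOWED_DATA_TYPES}, got {value!r}"
    # Offer the correctly cased spelling when only the case differs.
    for allowed in ALLOWED_DATA_TYPES:
        if allowed.lower() == str(value).lower():
            message += f" (did you mean {allowed!r}? the value is case sensitive)"
            break
    raise ValueError(message)
```

With this sketch, `check_data_type("SingleCell")` raises a `ValueError` that suggests `'singleCell'`, so the problem is reported before any pipeline step runs.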

davidepisu commented 7 years ago

Ok, I can try running the pipeline on another sample, setting singleCell instead of SingleCell in the config file.

Hoohm commented 7 years ago

Added a check for Data_type value. Please let me know if that fixed the issue.

davidepisu commented 6 years ago

Still getting the error at the fastqc...

/programs/FastQC-0.11.5/ MLW4_R1.fastq.gz MLW4_R2.fastq.gz -t 2 -o logs --extract
/bin/bash: /programs/FastQC-0.11.5/: Is a directory
[Sat Oct 14 23:39:40 2017] Error in job fastqc while creating output file logs/MLW4_R1_fastqc.html.
[Sat Oct 14 23:39:40 2017] RuleException:
CalledProcessError in line 23 of /programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/fastqc.snake:
Command '/programs/FastQC-0.11.5/ MLW4_R1.fastq.gz MLW4_R2.fastq.gz -t 2 -o logs --extract' returned non-zero exit status 126.
  File "/programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/fastqc.snake", line 23, in rule_fastqc
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 55, in run
[Sat Oct 14 23:39:40 2017] Will exit after finishing currently running jobs.
[Sat Oct 14 23:39:40 2017] Exiting because a job execution failed. Look above for error message
Traceback (most recent call last):
  File "/programs/dropSeqPipe/bin/dropSeqPipe", line 11, in <module>
    load_entry_point('dropSeqPipe==0.23a0', 'console_scripts', 'dropSeqPipe')()
  File "/programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/main.py", line 113, in main
    shell(fastqc)
  File "/usr/local/lib/python3.6/site-packages/snakemake/shell.py", line 88, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake -s /programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Snakefiles/singleCell/fastqc.snake --cores 60 -pT -d /SSD/MLW4 --configfile /SSD/local.yaml ' returned non-zero exit status 1.

The pipeline has been updated to 0.24.

Hoohm commented 6 years ago

Oh, I see now. Your fastqc path is wrong. You probably used something like /path/to/fastqcFOLDER. You should have /path/to/fastqc, where fastqc is the executable itself.
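The log above shows bash answering "Is a directory" with exit status 126, which is what the shell returns when a command is found but cannot be executed. A minimal Python sketch of a check that would catch this before the job is launched follows. `check_executable` is a hypothetical helper, not a dropSeqPipe function.

```python
import os


def check_executable(path):
    """Return path if it is an executable regular file, otherwise raise ValueError."""
    if os.path.isdir(path):
        raise ValueError(f"{path!r} is a directory; point to the executable inside it")
    if not os.path.isfile(path):
        raise ValueError(f"{path!r} does not exist")
    if not os.access(path, os.X_OK):
        raise ValueError(f"{path!r} is not executable")
    return path
```

Applied to the path in the log, `check_executable("/programs/FastQC-0.11.5/")` would fail with the directory message on a machine where that folder exists, while the path of the fastqc executable inside it would pass.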

davidepisu commented 6 years ago

Oh ok, now I get the following error:

Mode is generate-plots
Plotting knee plots
Error in file(con, "r") : cannot open the connection
Calls: yaml.load_file -> yaml.load -> paste -> readLines -> file
In addition: Warning message:
In file(con, "r") :
  cannot open file '/SSD/MLW4config.yaml': No such file or directory
Execution halted
Traceback (most recent call last):
  File "/programs/dropSeqPipe/bin/dropSeqPipe", line 11, in <module>
    load_entry_point('dropSeqPipe==0.23a0', 'console_scripts', 'dropSeqPipe')()
  File "/programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/main.py", line 180, in main
    shell(knee_plot)
  File "/usr/local/lib/python3.6/site-packages/snakemake/shell.py", line 88, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'Rscript /programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Rscripts/singleCell/knee_plot.R /SSD/MLW4' returned non-zero exit status 1.

Hoohm commented 6 years ago

Hello, I know there is some error handling to do, but this one is actually pretty straightforward: "cannot open file '/SSD/MLW4config.yaml': No such file or directory". This means you forgot the slash at the end of your -f argument. You should use -f /SSD/MLW4/ instead of -f /SSD/MLW4.
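The path in the error message, /SSD/MLW4config.yaml, suggests that the folder argument and the file name are glued together without a separator. The following Python sketch contrasts that behaviour with a join that tolerates a missing trailing slash. Both function names are hypothetical and this is not the pipeline's actual code; the expected results assume a POSIX system.

```python
import os


def config_path_concat(folder):
    # Plain string concatenation: only correct if folder already ends with "/".
    return folder + "config.yaml"


def config_path_join(folder):
    # os.path.join adds the separator only when it is missing.
    return os.path.join(folder, "config.yaml")


print(config_path_concat("/SSD/MLW4"))   # /SSD/MLW4config.yaml
print(config_path_join("/SSD/MLW4"))     # /SSD/MLW4/config.yaml
print(config_path_join("/SSD/MLW4/"))    # /SSD/MLW4/config.yaml
```

Joining paths this way would make the -f argument work with or without the trailing slash.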

davidepisu commented 6 years ago

Gotcha, I think the problems are arising from a bad configuration file anyway. Now I get the following:

Mode is generate-plots
Plotting knee plots
Warning message:
In readLines(input, encoding = "UTF-8") :
  incomplete final line found on '/SSD/MLW10/config.yaml'
Warning message:
Removed 1425492 rows containing missing values (geom_point).
Plotting base stats
Loading required package: magrittr
Warning message:
In readLines(input, encoding = "UTF-8") :
  incomplete final line found on '/SSD/MLW10/config.yaml'
Error in mmm < each : comparison of these types is not implemented
Calls: plotRNAMetrics ... Reduce -> f -> rbind_gtable -> compare_unit -> unit -> comp
Execution halted
Traceback (most recent call last):
  File "/programs/dropSeqPipe/bin/dropSeqPipe", line 11, in <module>
    load_entry_point('dropSeqPipe==0.23a0', 'console_scripts', 'dropSeqPipe')()
  File "/programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/main.py", line 182, in main
    shell(base_summary)
  File "/usr/local/lib/python3.6/site-packages/snakemake/shell.py", line 88, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'Rscript /programs/dropSeqPipe/lib/python3.6/site-packages/dropSeqPipe/Rscripts/singleCell/rna_metrics.R /SSD/MLW10/' returned non-zero exit status 1.

My config.yaml is as follows:

Samples:
    MLW10:
        fraction: 0.001
        expected_cells: 2000
GENOMEREF: /SSD/ref/genome.fa
REFFLAT: /SSD/ref/annotation.refFlat
RRNAINTERVALS: /SSD/ref/genome.rRNA.intervals
METAREF: /SSD/ref/STAR_INDEX_NO_GTF/
GTF: /SSD/ref/annotation.gtf
SPECIES:

So I don't get which lines I'm missing.....
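For context, R's readLines emits the "incomplete final line" warning when the last line of a file has no terminating newline, so it does not by itself indicate missing config entries. In the log it is only a warning, separate from the error that halts rna_metrics.R. A minimal Python sketch for checking and fixing the newline follows. Both helper names are hypothetical, and the path in the usage note is the one from the log.

```python
def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as handle:
        data = handle.read()
    return data == b"" or data.endswith(b"\n")


def ensure_trailing_newline(path):
    """Append a newline if the last line is unterminated; return True if the file changed."""
    if ends_with_newline(path):
        return False
    with open(path, "ab") as handle:
        handle.write(b"\n")
    return True
```

Calling `ensure_trailing_newline("/SSD/MLW10/config.yaml")` would be expected to silence that warning, but not the "comparison of these types is not implemented" error.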

Hoohm commented 6 years ago

@davidepisu, the issue should be resolved thanks to @duyck. Did it fix it for you?

Hoohm commented 6 years ago

Hello @davidepisu, could you test it out on the new version and tell me if it's fixed?

Hoohm commented 6 years ago

No response so I'll close the issue.