aryeelab / guideseq

Analysis pipeline for the GUIDE-seq assay.
GNU Affero General Public License v3.0

Test error Offtarget sites are not the same #50

Closed by Zethson 5 years ago

Zethson commented 5 years ago

nosetests --exe -v

testFullPipeline (test.test_guideseq.FullPipelineTestCase) ...
[04/03 10:19:33AM][INFO][guideseq] Loading manifest...
[04/03 10:19:33AM][INFO][guideseq] Successfully loaded manifest.
[04/03 10:19:33AM][INFO][guideseq] Demultiplexing undemultiplexed files...
[04/03 10:19:33AM][INFO][demultiplex] Wrote FASTQs for the 2 sample barcodes out of 3 with at least 1000 reads.
[04/03 10:19:33AM][INFO][guideseq] Successfully demultiplexed reads.
[04/03 10:19:33AM][INFO][guideseq] umitagging reads...
[04/03 10:19:33AM][INFO][guideseq] Successfully umitagged reads.
[04/03 10:19:33AM][INFO][guideseq] Consolidating reads...
[04/03 10:19:34AM][INFO][consolidate] Read 1615 input reads
[04/03 10:19:34AM][INFO][consolidate] Wrote 943 consolidated reads
[04/03 10:19:34AM][INFO][consolidate] Successfully consolidated 128623 bases out of 142393 (90.33%)
[04/03 10:19:34AM][INFO][consolidate] Read 1615 input reads
[04/03 10:19:34AM][INFO][consolidate] Wrote 943 consolidated reads
[04/03 10:19:34AM][INFO][consolidate] Successfully consolidated 120388 bases out of 142393 (84.55%)
[04/03 10:19:36AM][INFO][consolidate] Read 6000 input reads
[04/03 10:19:36AM][INFO][consolidate] Wrote 2389 consolidated reads
[04/03 10:19:36AM][INFO][consolidate] Successfully consolidated 328174 bases out of 360739 (90.97%)
[04/03 10:19:37AM][INFO][consolidate] Read 6000 input reads
[04/03 10:19:37AM][INFO][consolidate] Wrote 2389 consolidated reads
[04/03 10:19:37AM][INFO][consolidate] Successfully consolidated 306721 bases out of 360739 (85.03%)
[04/03 10:19:37AM][INFO][guideseq] Successfully consolidated reads.
[04/03 10:19:37AM][INFO][guideseq] Aligning reads...
[04/03 10:19:37AM][INFO][alignReads] Genome index files not detected. Running BWA to generate indices.
[04/03 10:19:37AM][INFO][alignReads] Running bwa command: bwa index test_genome.fa
[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.00 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.00 sec
[main] Version: 0.7.9a-r786
[main] CMD: bwa index test_genome.fa
[main] Real time: 0.023 sec; CPU: 0.009 sec
[04/03 10:19:37AM][INFO][alignReads] BWA genome index generated
[04/03 10:19:37AM][INFO][alignReads] Running paired end mapping for control
[04/03 10:19:37AM][INFO][alignReads] bwa mem test_genome.fa test_output/consolidated/control.r1.consolidated.fastq test_output/consolidated/control.r2.consolidated.fastq
[M::main_mem] read 1886 sequences (284786 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 28, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (28, 47, 60)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 124)
[M::mem_pestat] mean and std.dev: (42.17, 17.20)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 156)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 1886 reads in 0.351 CPU sec, 0.351 real sec
[main] Version: 0.7.9a-r786
[main] CMD: bwa mem test_genome.fa test_output/consolidated/control.r1.consolidated.fastq test_output/consolidated/control.r2.consolidated.fastq
[main] Real time: 0.363 sec; CPU: 0.357 sec
[04/03 10:19:38AM][INFO][alignReads] Paired end mapping for control completed.
[04/03 10:19:38AM][INFO][guideseq] Finished aligning reads to genome.
[04/03 10:19:38AM][INFO][alignReads] BWA genome index found.
[04/03 10:19:38AM][INFO][alignReads] Running paired end mapping for EMX1
[04/03 10:19:38AM][INFO][alignReads] bwa mem test_genome.fa test_output/consolidated/EMX1.r1.consolidated.fastq test_output/consolidated/EMX1.r2.consolidated.fastq
[M::main_mem] read 4778 sequences (721478 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 709, 1, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (39, 80, 160)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 402)
[M::mem_pestat] mean and std.dev: (110.84, 92.17)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 523)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 4778 reads in 0.603 CPU sec, 0.603 real sec
[main] Version: 0.7.9a-r786
[main] CMD: bwa mem test_genome.fa test_output/consolidated/EMX1.r1.consolidated.fastq test_output/consolidated/EMX1.r2.consolidated.fastq
[main] Real time: 0.622 sec; CPU: 0.613 sec
[04/03 10:19:38AM][INFO][alignReads] Paired end mapping for EMX1 completed.
[04/03 10:19:38AM][INFO][guideseq] Finished aligning reads to genome.
[04/03 10:19:38AM][INFO][guideseq] Identifying offtarget sites...
[04/03 10:19:38AM][INFO][identifyOfftargetSites] Processing SAM file test_output/aligned/control.sam
[04/03 10:19:38AM][INFO][identifyOfftargetSites] Processing SAM file test_output/aligned/EMX1.sam
[04/03 10:19:38AM][INFO][guideseq] Finished identifying offtarget sites.
FAIL

======================================================================
FAIL: testFullPipeline (test.test_guideseq.FullPipelineTestCase)

Traceback (most recent call last):
  File "/guideseq/test/test_guideseq.py", line 104, in testFullPipeline
    self.assertTrue(utils.checkFolderEquality(os.path.join(TEST_OUTPUT_PATH, 'identified'), CORRECT_IDENTIFIED_OUTPUT))
AssertionError: False is not true
-------------------- >> begin captured stdout << ---------------------
control_identifiedOfftargets.txt does not match between folders.

--------------------- >> end captured stdout << ----------------------


Ran 1 test in 5.751s

FAILED (failures=1)

I'm using bedtools 2.25 and bwa 0.7.9a.

Zethson commented 5 years ago

Apparently control_identifiedOfftargets.txt stays empty.
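
For anyone hitting the same failure, here is a minimal sketch of how one might confirm which output files differ and whether any of them are empty. It is not the repository's `utils.checkFolderEquality`; the function name `report_folder_differences` is hypothetical, and it relies only on the standard-library `filecmp` and `os` modules.

```python
import filecmp
import os


def report_folder_differences(produced_dir, expected_dir):
    """Return (mismatched, empty) lists of file names.

    Hypothetical helper for illustration only; it compares files that
    exist in both folders and flags produced files that have zero size.
    """
    # Only consider files that are present in both folders.
    common = sorted(
        name for name in os.listdir(expected_dir)
        if os.path.isfile(os.path.join(produced_dir, name))
    )
    # shallow=False compares file contents rather than os.stat() signatures.
    _, mismatched, errors = filecmp.cmpfiles(
        produced_dir, expected_dir, common, shallow=False
    )
    # An empty produced file points at the step that wrote it, not the comparison.
    empty = [
        name for name in common
        if os.path.getsize(os.path.join(produced_dir, name)) == 0
    ]
    return sorted(mismatched + errors), empty
```

Called with the produced `identified` output folder and the folder of expected results (both paths depend on the local checkout), it would list `control_identifiedOfftargets.txt` in both return values if that file is indeed empty.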

Zethson commented 5 years ago

dev branch works.

Genuine feedback: please adhere to best practices and always keep working, up-to-date code on master!