faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

phyluce_snp_bwa_multiple_align fails to merge BAMs - File "/anaconda2/envs/phyluce/bin/phyluce_snp_bwa_multiple_align", line 193, in <module> main() File "/anaconda2/envs/phyluce/bin/phyluce_snp_bwa_multiple_align", line 182, in main bam = picard.merge_two_bams(log, sample, sample_dir, bam, bam_se) File "/anaconda2/envs/phyluce/lib/python2.7/site-packages/phyluce/picard.py", line 124, in merge_two_bams os.remove(bam) OSError: [Errno 2] No such file or directory: #146

Closed andreluizherpeto closed 3 years ago

andreluizherpeto commented 5 years ago

Dear @brantfaircloth

 I'm running the phyluce pipeline to phase UCE data downloaded from NCBI SRA. Because the read headers in the FASTQ files downloaded from SRA did not follow the usual Illumina format, and information on the barcodes used for sequencing the samples was not available, I was unable to run illumiprocessor. For that reason, I opted to clean the reads using Trimmomatic. Next, I used a regex to edit the header of each clean read, mimicking the format regularly output by illumiprocessor.

 Example: read number 5 in a clean fastq file corresponding to sample SRR1923839. Note that I employed the SRR ID in place of <instrument> and <flowcell ID>, and the number 1 in place of any other piece of information that was originally missing, such as <lane>, <tile>, etc.

@SRR1923839:1:SRR1923839:1:1:1:1 2:N:0:5
CTTGTATTTTACAGCTAGAGACTCCAGGTTTTTCCTCCTTCCATTTTAAGTAATGGGGATGTTATCAGACAAAGTTCAACAAGATAAGCAGCAAATGTGAGGTATCATGTTACAACTCCAACAAAAGAGTGGGTATTCCTTCCTCATGAACATGTGCTTCCTCATTCTTTGGAGTGTGCAGAAGAAACTGTTCAAAGTAAGCTAAGGCTCACCAGTAAGAAATGCCATCATTTGTAATTTAGCAGAATCAATTATCAGGAGAGAGTAAAATAACCCACTGAGTTATCTACAACT
+5
CCCCCGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGFGGGGGGGEGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGFGFFCFGGFGGGGFGGGGGGGGGGGGFGGGGGGGGGGGGGD8DFGFFGFGGGGGGGGGGGGGGGGGGCDFGGCGGGFFFFFFFFFFFFFFFAFFFFFFFFFFEFFFFF=@D2:76ACFFFEFFFFFE
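The header rewrite described above could be sketched roughly as follows. This is only an illustration of the substitution, not the regex actually used: the function name is hypothetical, and placing the read number in the final (index) field is inferred from the single example record shown here.

```python
def illumina_style_header(run_id, read_number, mate):
    """Build an Illumina-like FASTQ header from an SRA run ID (hypothetical helper).

    Layout: @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
    The run ID stands in for <instrument> and <flowcell>; "1" fills the other
    positional fields; the read number goes in the last field (an assumption
    based on the example record in this thread).
    """
    return "@{0}:1:{0}:1:1:1:1 {1}:N:0:{2}".format(run_id, mate, read_number)


if __name__ == "__main__":
    # Read 5, second mate, of run SRR1923839
    print(illumina_style_header("SRR1923839", 5, 2))
```

Note that with this scheme every read shares the same name before the space; whether downstream tools tolerate that is not established in this thread.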

 I adopted the following directory structure:

 taxon-sets
     all
       clean-fastq
       log
       mafft-nexus-edge-trimmed-exploded
       phasing.conf

   At step "Mapping reads against contigs" (phyluce_snp_bwa_multiple_align), I got the following error:

(phyluce) ANDREs-MacBook-Pro-4:all ALGCarvalho$ phyluce_snp_bwa_multiple_align --config phasing.conf --output multialign-bams --cores 2 --log-path log --mem
2018-12-07 22:20:46,706 - phyluce_snp_bwa_multiple_align - INFO - ============ Starting phyluce_snp_bwa_multiple_align ============
2018-12-07 22:20:46,706 - phyluce_snp_bwa_multiple_align - INFO - Version: git fatal: Not a git repository: '/anaconda2/envs/phyluce/lib/python2.7/site-packages/.git'
2018-12-07 22:20:46,706 - phyluce_snp_bwa_multiple_align - INFO - Argument --config: /Users/ALGCarvalho/Dropbox/test-3/taxon-sets/all/phasing.conf
2018-12-07 22:20:46,706 - phyluce_snp_bwa_multiple_align - INFO - Argument --cores: 2
2018-12-07 22:20:46,707 - phyluce_snp_bwa_multiple_align - INFO - Argument --log_path: /Users/ALGCarvalho/Dropbox/test-3/taxon-sets/all/log
2018-12-07 22:20:46,707 - phyluce_snp_bwa_multiple_align - INFO - Argument --mem: True
2018-12-07 22:20:46,707 - phyluce_snp_bwa_multiple_align - INFO - Argument --no_remove_duplicates: False
2018-12-07 22:20:46,707 - phyluce_snp_bwa_multiple_align - INFO - Argument --output: /Users/ALGCarvalho/Dropbox/test-3/taxon-sets/all/multialign-bams
2018-12-07 22:20:46,707 - phyluce_snp_bwa_multiple_align - INFO - Argument --subfolder:
2018-12-07 22:20:46,707 - phyluce_snp_bwa_multiple_align - INFO - Argument --verbosity: INFO
2018-12-07 22:20:46,707 - phyluce_snp_bwa_multiple_align - INFO - ============ Starting phyluce_snp_bwa_multiple_align ============
2018-12-07 22:20:46,712 - phyluce_snp_bwa_multiple_align - INFO - Getting input filenames and creating output directories
2018-12-07 22:20:46,713 - phyluce_snp_bwa_multiple_align - INFO - You are running BWA-MEM
2018-12-07 22:20:46,714 - phyluce_snp_bwa_multiple_align - INFO - ------------- Processing Uta_stansburiana_SRR1923839 ------------
2018-12-07 22:20:46,714 - phyluce_snp_bwa_multiple_align - INFO - Finding fastq/fasta files
2018-12-07 22:20:46,717 - phyluce_snp_bwa_multiple_align - INFO - File type is fastq
2018-12-07 22:20:46,721 - phyluce_snp_bwa_multiple_align - INFO - Building BAM for Uta_stansburiana_SRR1923839
2018-12-07 22:26:59,283 - phyluce_snp_bwa_multiple_align - INFO - Cleaning BAM for Uta_stansburiana_SRR1923839
2018-12-07 22:28:14,115 - phyluce_snp_bwa_multiple_align - INFO - Adding RG header to BAM for Uta_stansburiana_SRR1923839
2018-12-07 22:30:42,916 - phyluce_snp_bwa_multiple_align - INFO - Marking read duplicates from BAM for Uta_stansburiana_SRR1923839
2018-12-07 22:30:45,248 - phyluce_snp_bwa_multiple_align - INFO - Building BAM for Uta_stansburiana_SRR1923839
2018-12-07 22:31:39,637 - phyluce_snp_bwa_multiple_align - INFO - Cleaning BAM for Uta_stansburiana_SRR1923839
2018-12-07 22:31:52,739 - phyluce_snp_bwa_multiple_align - INFO - Adding RG header to BAM for Uta_stansburiana_SRR1923839
2018-12-07 22:32:14,904 - phyluce_snp_bwa_multiple_align - INFO - Marking read duplicates from BAM for Uta_stansburiana_SRR1923839
2018-12-07 22:32:32,062 - phyluce_snp_bwa_multiple_align - INFO - Merging BAMs for Uta_stansburiana_SRR1923839
Traceback (most recent call last):
  File "/anaconda2/envs/phyluce/bin/phyluce_snp_bwa_multiple_align", line 193, in <module>
    main()
  File "/anaconda2/envs/phyluce/bin/phyluce_snp_bwa_multiple_align", line 182, in main
    bam = picard.merge_two_bams(log, sample, sample_dir, bam, bam_se)
  File "/anaconda2/envs/phyluce/lib/python2.7/site-packages/phyluce/picard.py", line 124, in merge_two_bams
    os.remove(bam)
OSError: [Errno 2] No such file or directory: '/Users/ALGCarvalho/Dropbox/test-3/taxon-sets/all/multialign-bams/Uta_stansburiana_SRR1923839/Uta_stansburiana_SRR1923839-CL-RG-MD.bam'

   In my "multialign-bams" output folder, only 13 files were generated (instead of the expected 16):

Uta_stansburiana_SRR1923839-se-CL-RG-MD.bam
Uta_stansburiana_SRR1923839.pe.samtools-view-out.log
Uta_stansburiana_SRR1923839.se.picard-clean-out.log
Uta_stansburiana_SRR1923839.pe.bwa-sampe-out.log
Uta_stansburiana_SRR1923839.picard-merge-out.log
Uta_stansburiana_SRR1923839.se.picard-metricsfile.txt
Uta_stansburiana_SRR1923839.pe.picard-MD-out.log
Uta_stansburiana_SRR1923839.se.bwa-sampe-out.log
Uta_stansburiana_SRR1923839.se.samtools-view-out.log
Uta_stansburiana_SRR1923839.pe.picard-RG-out.log
Uta_stansburiana_SRR1923839.se.picard-MD-out.log
Uta_stansburiana_SRR1923839.pe.picard-clean-out.log
Uta_stansburiana_SRR1923839.se.picard-RG-out.log
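A quick way to see which step stopped producing output is to compare the paired-end (`.pe.`) and single-end (`.se.`) files in this listing. The sketch below is only a diagnostic aid; the function name is hypothetical, and it assumes every per-step output carries a `.pe.` or `.se.` tag in its filename, as the files above do.

```python
def unpaired_outputs(filenames):
    """Return (read_type, suffix) pairs present for one read type but not the other."""
    seen = {"pe": set(), "se": set()}
    for name in filenames:
        for tag in seen:
            marker = ".{}.".format(tag)
            if marker in name:
                # Keep only the part after ".pe." / ".se." so the two sets are comparable
                seen[tag].add(name.split(marker, 1)[1])
    missing = [("pe", s) for s in sorted(seen["se"] - seen["pe"])]
    missing += [("se", s) for s in sorted(seen["pe"] - seen["se"])]
    return missing


if __name__ == "__main__":
    sample = "Uta_stansburiana_SRR1923839"
    listing = [
        sample + ".pe.samtools-view-out.log",
        sample + ".pe.bwa-sampe-out.log",
        sample + ".pe.picard-clean-out.log",
        sample + ".pe.picard-RG-out.log",
        sample + ".pe.picard-MD-out.log",
        sample + ".se.samtools-view-out.log",
        sample + ".se.bwa-sampe-out.log",
        sample + ".se.picard-clean-out.log",
        sample + ".se.picard-RG-out.log",
        sample + ".se.picard-MD-out.log",
        sample + ".se.picard-metricsfile.txt",
    ]
    print(unpaired_outputs(listing))
```

For the files listed in this post, the only tagged output without a counterpart is the paired-end `picard-metricsfile.txt`, which points at the paired-end duplicate-marking log as a place to look first.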

   Do you have any idea what could be causing this error? I am using both Ubuntu and OS X, and the same error appears on both systems.

   I thank you in advance for any help provided. 

   Best,
   A.

brantfaircloth commented 5 years ago

The clue may be in some of these log files. Whether or not you used illumiprocessor should not be a huge issue, but I'm not totally sure what bug you've run into that may or may not be related to picard.
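One way to follow this suggestion is to scan the per-sample log files for error lines. This is a minimal sketch, assuming the logs are plain-text files ending in `.log` inside the sample's output directory (as in the listing earlier in this thread). The function name and keyword list are illustrative and not part of phyluce.

```python
import glob
import os


def find_log_errors(sample_dir, keywords=("Exception", "ERROR", "Error")):
    """Return (filename, line number, line) for every log line containing a keyword."""
    hits = []
    for path in sorted(glob.glob(os.path.join(sample_dir, "*.log"))):
        # errors="replace" avoids crashing on unexpected bytes in tool output
        with open(path, errors="replace") as handle:
            for lineno, line in enumerate(handle, 1):
                if any(keyword in line for keyword in keywords):
                    hits.append((os.path.basename(path), lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Directory name taken from the command shown earlier in this thread
    for hit in find_log_errors("multialign-bams/Uta_stansburiana_SRR1923839"):
        print(hit)
```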

gustavo-miranda commented 5 years ago

Hi, I am running phyluce_snp_bwa_multiple_align in phyluce v1.6.7, but I keep getting an error for which I can't find a solution. The folder structure of the cleaned reads follows precisely that of illumiprocessor.

The log file says:

  File "PATH/phyluce/1.6.7/miniconda/bin/phyluce_snp_bwa_multiple_align", line 193, in <module>
    main()
  File "PATH/phyluce/1.6.7/miniconda/bin/phyluce_snp_bwa_multiple_align", line 161, in main
    bam = picard.add_rg_header_info(log, sample, sample_dir, fc, bam, "pe")
  File "PATH/bioinformatics/phyluce/1.6.7/miniconda/lib/python2.7/site-packages/phyluce/picard.py", line 102, in add_rg_header_info
    os.remove(bam)
OSError: [Errno 2] No such file or directory: 'PATH/AnalysisJan2019/SNP-multialign-bams/souzai220/souzai220-CL.bam'

It says that there is no souzai220-CL.bam in the specified folder. So I checked the log files in the multialign-bams folder and found the following error message in the .pe.picard-clean-out.log file:

Exception in thread "main" java.lang.IllegalArgumentException: Cannot add sequence that already exists in SAMSequenceDictionary: souzai220

Because of this error, the program did not output the -CL.bam file needed for the subsequent steps.

What does it mean that the sequence already exists in SAMSequenceDictionary? Does anyone have a suggestion for how to fix this?

Thanks,
Gustavo

Deco313 commented 4 years ago

Bumping this issue as I am experiencing the same thing.

brantfaircloth commented 4 years ago

It looks like whatever is making the sequence dictionary (the "index" or "reference" sequence) may contain two contigs with the same name...
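To check for this, one could list any contig names that occur more than once in the reference file. The sketch below assumes the reference is a FASTA file and that aligners and Picard take the sequence name to be the header text up to the first whitespace. The function name is hypothetical, and the path is supplied by the user. In the error reported above, the repeated name is `souzai220`.

```python
import sys
from collections import Counter


def duplicate_fasta_names(fasta_path):
    """Return {name: count} for sequence names that occur more than once."""
    names = Counter()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Assumed naming rule: header text up to the first whitespace
                fields = line[1:].split()
                names[fields[0] if fields else ""] += 1
    return {name: count for name, count in names.items() if count > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_names.py <reference.fasta>
    for name, count in sorted(duplicate_fasta_names(sys.argv[1]).items()):
        print("{}\t{}".format(name, count))
```

If this prints anything, headers that differ only after a space (for example a sample name followed by a locus label) would collapse to the same name and could trigger the SAMSequenceDictionary error.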