CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

.bai not found #481

Closed bgphd closed 3 years ago

bgphd commented 3 years ago

Commands in Script

echo "Sorting and indexing the BAM file for counting the mirna occurences"
samtools sort /Volumes/Pegasus2\ R8\ 1/Preferred/normal_fq/analysis/$i/$i"-maturemiRNA-aligned-bowtie1-beststratam1.bam" --no-PG -o /Volumes/Pegasus2\ R8\ 1/Preferred/normal_fq/analysis/$i/$i"-maturemiRNA-aligned-bowtie1-beststratam1.sorted.bam"
samtools index /Volumes/Pegasus2\ R8\ 1/Preferred/normal_fq/analysis/$i/$i"-maturemiRNA-aligned-bowtie1-beststratam1.sorted.bam"
echo "Counting step"
umi_tools count --method=unique --per-contig -I /Volumes/Pegasus2\ R8\ 1/Preferred/normal_fq/analysis/$i/$i"-maturemiRNA-aligned-bowtie1-beststratam1.bam" -L /Volumes/Pegasus2\ R8\ 1/Preferred/normal_fq/analysis/$i/$i"_counts-uniquemethod-maturemiRNA.log" -S /Volumes/Pegasus2\ R8\ 1/Preferred/normal_fq/analysis/$i/$i"_counts-finaloutput-uniquemethod-maturemiRNA.txt"
echo "Deduplicating the aligned BAM"
umi_tools dedup --method=unique -I /Volumes/Pegasus2\ R8\ 1/Preferred/normal_fq/analysis/$i/$i"-maturemiRNA-aligned-bowtie1-beststratam1.bam" -S /Volumes/Pegasus2\ R8\ 1/Preferred/normal_fq/analysis/$i/$i"_deduplicated-matureMirna-uniquemethod-beststratam1.bam" -L /Volumes/Pegasus2\ R8\ 1/Preferred/normal_fq/analysis/$i/$i"-deduplicate-matureMirna-uniquemethod-beststratam1.log"

Terminal Output

Sorting and indexing the BAM file for counting the mirna occurences
[bam_sortcore] merging from 1 files and 1 in-memory blocks...
rename the index file
Counting step
[E::idx_find_andload] Could not retrieve index file for '/Volumes/Pegasus2 R8 1/Preferred/normal_fq/analysis/PRE_norm_001/PRE_norm_001-maturemiRNA-aligned-bowtie1-beststratam1.bam'
Traceback (most recent call last):
  File "/usr/local/Caskroom/miniconda/base/bin/umi_tools", line 11, in <module>
    sys.exit(main())
  File "/usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages/umi_tools/count.py", line 131, in main
    inreads = infile.fetch()
  File "pysam/libcalignmentfile.pyx", line 1093, in pysam.libcalignmentfile.AlignmentFile.fetch
ValueError: fetch called on bamfile without index
Deduplicating the aligned BAM
[E::idx_find_andload] Could not retrieve index file for '/Volumes/Pegasus2 R8 1/Preferred/normal_fq/analysis/PRE_norm_001/PRE_norm_001-maturemiRNA-aligned-bowtie1-beststratam1.bam'
Traceback (most recent call last):
  File "/usr/local/Caskroom/miniconda/base/bin/umi_tools", line 11, in <module>
    sys.exit(main())
  File "/usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages/umi_tools/dedup.py", line 285, in main
    inreads = infile.fetch()
  File "pysam/libcalignmentfile.pyx", line 1093, in pysam.libcalignmentfile.AlignmentFile.fetch
ValueError: fetch called on bamfile without index

Checking for .bam and .bai

(base) bgold@Berts-MacBook-Pro PRE_norm_001 % ls -l PRE_norm_001-maturemiRNA-aligned-bowtie1-beststratam1.sorted.*
-rw-r--r-- 1 bgold staff     47664 Jul 8 12:55 PRE_norm_001-maturemiRNA-aligned-bowtie1-beststratam1.sorted.bai
-rw-r--r-- 1 bgold staff 167181697 Jul 8 12:55 PRE_norm_001-maturemiRNA-aligned-bowtie1-beststratam1.sorted.bam

TomSmithCGAT commented 3 years ago

It looks like you're specifying the wrong input for umi_tools.

As it stands, you sort a file called [...]-beststratam1.bam and create a sorted file called [...]-beststratam1.sorted.bam, which you then index. After this, you run umi_tools on the unsorted, unindexed [...]-beststratam1.bam file.

Switch the umi_tools input to the sorted file and it should all be OK.
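
As a minimal sketch of that fix, the sort, index and count steps might look like the following, with the sorted file (not the original one) passed to umi_tools. The helper names sorted_name and run_count and the output prefix argument are illustrative assumptions, not part of the script from the paper; the flags are the ones already used in the script above. Quoting each path also means spaces in directory names need no backslash escaping.

```shell
#!/usr/bin/env bash

# Hypothetical helper: derive the sorted BAM name from the unsorted one,
# e.g. "x.bam" -> "x.sorted.bam".
sorted_name() {
    printf '%s\n' "${1%.bam}.sorted.bam"
}

# Hypothetical helper: sort, index, then count on the SORTED file.
#   $1 = unsorted BAM, $2 = prefix for the count output and log (assumed naming)
run_count() {
    local bam="$1" prefix="$2" sorted
    sorted="$(sorted_name "$bam")"
    samtools sort "$bam" --no-PG -o "$sorted" || return 1
    samtools index "$sorted" || return 1
    # The input here is "$sorted", the file that was just indexed.
    umi_tools count --method=unique --per-contig \
        -I "$sorted" -S "$prefix.txt" -L "$prefix.log"
}
```

It could then be called as run_count "/path with spaces/sample.bam" "/path with spaces/sample_counts"; the same change of input would apply to the umi_tools dedup step.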

bgphd commented 3 years ago

No, sorry, that is not the answer. I have tried shuffling the file names many, many times:

(base) bgold@Berts-MacBook-Pro PRE_norm_001 % ls -1
PRE_norm_001-UMIextraction-fromrawreads.log
PRE_norm_001-bowtie-genomeaftermiRNA-beststratam1.log
PRE_norm_001-bowtie-maturemiRNA-beststratam1.log
PRE_norm_001-deduplicate-genomeaftermiRNA-uniquemethod.log
PRE_norm_001-deduplicate-matureMirna-uniquemethod-beststratam1.log
PRE_norm_001-directUMIextracted-min18max30L.fastq
PRE_norm_001-directUMIextracted-readlengthfilter-cutadapt.log
PRE_norm_001-directUMIextracted.fastq
PRE_norm_001-genomeaftermiRNA-aligned-bowtie1-beststratam1.bam
PRE_norm_001-maturemiRNA-aligned-bowtie1-beststratam1.bam
PRE_norm_001-maturemiRNA-aligned-bowtie1-beststratam1.sam
PRE_norm_001-maturemiRNA-aligned-bowtie1-beststratam1.sorted.bai
PRE_norm_001-maturemiRNA-aligned-bowtie1-beststratam1.sorted.bam
PRE_norm_001-maturemiRNA-unalignedReads-bowtie1-beststratam1.fastq
PRE_norm_001-tagged.bam
PRE_norm_001_counts-finaloutput-uniquemethod-maturemiRNA.txt
PRE_norm_001_counts-uniquemethod-maturemiRNA.log
PRE_norm_001_deduplicated-genomeaftermiRNA-uniquemethod.bam
PRE_norm_001_deduplicated-matureMirna-uniquemethod-beststratam1.bam

I have also tried renaming the index:

mv PRE_norm_001-maturemiRNA-aligned-bowtie1-beststratam1.sorted.bai PRE_norm_001-maturemiRNA-aligned-bowtie1-beststratam1.sorted.bam.bai

With no joy.

The script I am using is from this paper:

A bioinformatics approach to microRNA-sequencing analysis. Pratibha Potla, Shabana Amanda Ali, Mohit Kapoor. Osteoarthritis and Cartilage Open 3 (2021) 100131

I never ask questions until I have struggled for days...

The script from the paper is here: https://www.dropbox.com/s/2t2gqvkqsr58mjv/PotlaP_miRNA_pipeline.zip?dl=0

TomSmithCGAT commented 3 years ago

OK, well the error you posted is perfectly well explained by the absence of any index file for the input you specified. It may not be the complete answer to the problem you have been facing but it does explain the error posted and you provided literally no context in the original issue. It's always helpful if you give some more information about what you have tried.

I note you have spaces in the file path, but that shouldn't be an issue and a quick check manually suggests that's not the problem.

Could you please re-run umi_tools count on the correct input, e.g. the sorted BAM PRE_norm_001-maturemiRNA-aligned-bowtie1-beststratam1.sorted.bam, which has an index, and include the log here?
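
A pre-flight check along these lines could catch the missing-index case before umi_tools is run. This is only a sketch, assuming the index sits next to the BAM and is named either <name>.bam.bai or <name>.bai (the two spellings tried in this thread); find_bam_index is a hypothetical helper, not a samtools or umi_tools command.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the path of an index found next to the given BAM,
# or print a message to stderr and return non-zero if none is found.
# Assumption: the index is named <file>.bam.bai or <file>.bai.
find_bam_index() {
    local bam="$1" candidate
    for candidate in "$bam.bai" "${bam%.bam}.bai"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no index found for: $bam" >&2
    return 1
}
```

Run against the files listed earlier in the thread, a check like this would be expected to fail for the unsorted [...]-beststratam1.bam and succeed for [...]-beststratam1.sorted.bam, which is the file that should be given to umi_tools with -I.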

bgphd commented 3 years ago

Wow! It worked. I am supposing the spaces in the file directory names ARE the problem!