alexfields / ORF-RATER

Regression-based annotation of protein-coding sequences from ribosome profiling data
BSD 3-Clause "New" or "Revised" License
29 stars 12 forks source link

Prune_transcripts.py error #12

Open marcasriv opened 5 years ago

marcasriv commented 5 years ago

Hi Alex,

Thanks for making the ORF-RATE pipeline public and accessible.

When running prune_transcripts.py with the following parameters in Python 2.7.15: prune_transcripts.py --inbed annot.bed --minlen 27 --maxlen 32 genome.fa test.bam -p 10 --verbose

I am getting the following error :

[2019-09-30 11:59:51] Reading transcriptome and genome [2019-09-30 12:00:05] Parsing sequence and count information

Traceback (most recent call last): File "prune_transcripts.py", line 179, in % (len(tid_summary), lowreads_dropped, len(tid_summary)-lowreads_dropped, opts.minreads, opts.peakfrac))

ValueError: All 46526 transcripts dropped due to too few reads (46526) or too many reads coming from one position (0). Consider decreasing MINREADS (currently 64) or increasing PEAKFR(currently 0.200000), or check validity of input BAM file.

I have done as suggested in the ValueError and decreased MINREADS and increased PEAKFR, but I keep getting the same error no matter what parameters I use. I've checked the input BAM file and it seems okay too. I am quite sure that the problem isn't low coverage. When doing samtools flagstat of the BAM file I get:

203414522 + 0 in total (QC-passed reads + QC-failed reads) 100572976 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 203414522 + 0 mapped (100.00% : N/A) 0 + 0 paired in sequencing 0 + 0 read1 0 + 0 read2 0 + 0 properly paired (N/A : N/A) 0 + 0 with itself and mate mapped 0 + 0 singletons (N/A : N/A) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)

Please find attached a sample of the structure of my input files: genome.fa.txt annot.bed.txt

Thank you so much,

Marina

alexfields commented 5 years ago

Apologies for the issues.

Is it possible that your input BAM file is aligned to a different reference than your BED file? For example, is the alignment in transcriptome coordinates? ORF-RATER expects genome coordinates.

Could you send a few lines from samtools view test.bam?