dincarnato / RNAFramework

RNA structure probing and post-transcriptional modifications mapping high-throughput data analysis
http://www.rnaframework.com
GNU General Public License v3.0
31 stars 11 forks source link

minimap2 BAM compatibility? - Undefined value as an ARRAY reference : Line 614 #34

Closed physnano closed 1 year ago

physnano commented 1 year ago

Hi Danny,

New user to RNAFramework, thanks for developing. I am trying to run on (long) reads that have been aligned/sorted using minimap2 and am encountering the following issues:

[+] Making output directory... [+] Guessing file types:

Sample Type 5'-end trimming

1_align-sorted BAM Ignored 2_align-sorted BAM Ignored

[+] Getting transcripts from reference, and building count table base structure... [+] Inspecting SAM/BAM file headers... [+] Assuming that provided SAM/BAM files are already sorted. Skipping sorting... [+] Calculating per-base mutation counts and coverage. This may take a while...

[-] Processing sample "1_align-sorted" (PID: 9203) [-] Processing sample "2_align-sorted" (PID: 9205)

[+] Statistics:

[!] Error: Unable to copy counts table structure for sample "2_align-sorted" (No such file or directory)

It seems to be failing on line 614: Can't use an undefined value as an ARRAY reference at .../bin/RNAFramework/rf-count line 614.

My guess would be it doesn't like the BAM files from minimap2. Any assistance much appreciated.

Thanks Will

dincarnato commented 1 year ago

Hi Will,

Can you help me reproduce the issue by sharing the command line, BAM file (or part of it) and the reference FASTA file?

Thanks! Best,

Danny

physnano commented 1 year ago

Hi Danny,

I was able to narrow the issue down to a -om "T2C" option I was passing. Which is related to an additional issue I am now having. I now am running run rf-count with the -orc option however when I look at the raw_counts, I am only seeing insertions in the output. For context the data input is direct RNA nanopore reads, so I would expect many substitutions as well. I have attached the reference, and downsampled fastqs, and sam/bam files from minimap2. rf_test_data.zip

Thanks for your help

-Will

dincarnato commented 1 year ago

Can you give me the exact command you are executing?

dincarnato commented 1 year ago

Will,

I realized the issue. It looks like your BAM files miss one tag, the "MD" tag, which provides information on where the substitutions reside. You might want to run a samtools calmd --output-fmt=BAM 2ds_align-sorted.bam rep_mrnas.fasta > 2ds_align-sorted_reMD.bam prior to running rf-count. If you do that, the issue should be gone.

I have now also added to rf-count a warning to indicate whether the MD tag is missing. Please issue a git pull.

Let me know if this fixes. All the best,

Danny

physnano commented 1 year ago

Hi Danny,

Thanks so much everything looks great now!

Best,

Will