dohlee / metheor

:comet: Ultrafast DNA methylation heterogeneity calculation from bisulfite alignments (Lee et al., PLOS Computational Biology. 2023)
GNU General Public License v3.0
41 stars 8 forks source link

thread 'main' panicked at 'index out of bounds: the len is 403 but the index is 403', src/qfdrp.rs:52:13 #10

Closed rnbatra closed 1 year ago

rnbatra commented 1 year ago

Dear Lee,

When I tested metheor qfdrp using a target bisulfite sequencing data, I encountered the following error:

metheor mhl -i test.bam -d 20 -o test.qfdrp_20.tsv

00:04:43 Processed 13170000 reads, found 8842124 valid reads.
thread 'main' panicked at 'index out of bounds: the len is 403 but the index is 403', src/qfdrp.rs:52:13

Thanks in advance.

dohlee commented 1 year ago

Hi, thanks for the report. I'll see what's going on here!

One question, is the command you executed metheor qfdrp -i test.bam -d 20 -o test.qfdrp_20.tsv (note qfdrp), but not metheor mhl -i test.bam -d 20 -o test.qfdrp_20.tsv (note mhl)?

It would be greatly helpful if you can provide the small dataset (BAM slice or even single alignment that causes the error) for testing whether the issue is resolved. Maybe the following command will create BAM slice by taking 13160000~13180000th reads from the alignment file, and it would be great if you can share that file.

$ samtools view -H [BAM] > result.sam
$ samtools view [BAM] | head -n 13180000 | tail -n 20000 >> test.sam
$ samtools view -Sb test.sam > test.bam

Thanks, Dohoon Lee

rnbatra commented 1 year ago

Thanks for your quick replyApologies, yes it is

metheor qfdrp -i test.bam -d 20 -o test.qfdrp_20.tsv

I include the result.sam and the test.bam in the test.zip file.

Thanks for having a look!

test.zip

dohlee commented 1 year ago

Hi, thank you for sharing the files, but the alignment file seems empty, it's totally my bad!

The command should have been like:

$ samtools view -H [BAM] > test.sam
$ samtools view [BAM] | head -n 13180000 | tail -n 20000 >> test.sam  # Should be '>>', please note
$ samtools view -Sb test.sam > test.bam

Could you share the alignments again if you don't mind?

By the way, I guess that this issue may have been relevant to read length, since currently the maximum read length allowed by qfdrp is 201. Are you dealing with sequencing reads longer than 201nt?

Best, Dohoon Lee

rnbatra commented 1 year ago

Sure,

Here you go.

test.bam.zip

As far as I am aware, we should go up to 151.

Thanks and best

Rajbir

dohlee commented 1 year ago

Hi, would you update metheor to v0.1.6 with conda install -c dohlee metheor=0.1.6 and try metheor qfdrp -i test.bam -d 20 -o test.qfdrp_20.tsv again? Please let me know if it still throws error!

rnbatra commented 1 year ago

Hi Dohoon, thanks for looking into this. Unfortunately, it does not work still. I also had a closer look at the test.bam that I sent earlier. Unfortunately, It did not include the offending reads.

I have corrected this. Could I request you to have a look at the included test.bam and check the issue please. Thanks so much for your help :)

test.bam.zip

dohlee commented 1 year ago

Thank you for providing the BAM file. Finally, I figured out what the issue was.

The alignment shown below was throwing an error: image

This is because the maximum allowed read length (reference span, to be specific) for (q)fdrp calculation is currently 201, but the alignment is a gapped alignment with large deletion (54D!), which makes the read span 86 + 54 + 63 = 203bp (>201) on the reference genome.

I provisonally made fdrp and qfdrp skip those reads whose reference span is larger than MAX_READ_LEN and verified it works with test.bam you gave. Do you think it is reasonable to exclude those reads (covering large deletions) from the analysis? If not, I'll try increasing MAX_READ_LEN instead.

Thank you for reporting a nice edge case. Dohoon

dohlee commented 1 year ago

Please update to version 0.1.7 and check if it works!

rnbatra commented 1 year ago

Agree, it is reasonable to skip these reads for now. Thanks a lot for your working on a quick solution :)

So good news, is that it works for this offending read but it gets stuck on another read. It throws the following error. I attach a new test.bam with the offending read.

thread 'main' panicked at 'index out of bounds: the len is 403 but the index is 18446744073709551614', src/qfdrp.rs:61:13

test.bam.zip

dohlee commented 1 year ago

Thanks for the report. Version 0.1.8 will solve this. Please check it out!

rnbatra commented 1 year ago

Thanks a lot for your efforts and the solution! Works perfectly so far!