MBoemo / DNAscent

Software for detecting regions of BrdU and EdU incorporation in Oxford Nanopore reads.
https://www.boemogroup.org/
GNU General Public License v3.0

DNAscent: src/event_handling.cpp:547: void normaliseEvents(DNAscent::read&, bool): Assertion `et.n > 0' failed. #76

Open ZHOUXY-QH99 opened 1 week ago

ZHOUXY-QH99 commented 1 week ago

Hi, the bam file and reference load fine when I run DNAscent detect, but this error occurs at runtime:

Importing reference... ok.
Opening bam file... ok.
Scanning bam file...ok.
DNAscent: src/event_handling.cpp:547: void normaliseEvents(DNAscent::read&, bool): Assertion `et.n > 0' failed.
Aborted (core dumped)

Any suggestions on what this could be? Thank you!

MBoemo commented 1 week ago

Any chance you've overwritten the minimum mapping length?

ZHOUXY-QH99 commented 1 week ago

Thank you for your reply! This is my test code. I have not overwritten the minimum mapping length:

~/software/Dorado/dorado-0.8.2-linux-x64/bin/dorado aligner ~/data/13_lignin_2024_10_31/07_dorado_flye/02_flye_contig/contig_96.fa ~/data/13_lignin_2024_10_31/011_dorado/barcode01/calls_2024-11-11_T03-20-56.fastq > barcode01_aligned.bam

~/software/DNAscent/bin/DNAscent index -f ~/data/13_lignin_2024_10_31/00_pod5/barcode01 -o barcode01_index.dnascent

~/software/DNAscent/bin/DNAscent detect -b barcode01_aligned.bam -r ~/data/13_lignin_2024_10_31/07_dorado_flye/02_flye_contig/contig_96.fa -i barcode01_index.dnascent -o barcode01.detect -t 20 --GPU 0

MBoemo commented 1 week ago

Can I check which commit you're using? There were a few early problems with v4.0.3 and bam parsing but these have since been fixed and the singularity image updated.

ZHOUXY-QH99 commented 1 week ago

Sorry, I don't understand what a commit is. How can I find it? I am using DNAscent v4.0.3 and the latest version of Dorado.

When running detect, my output file barcode01.detect is not empty.

Importing reference... ok.
Opening bam file... ok.
Scanning bam file...ok.
[> ] 0% 18/2179 2hr53min25sec failed: 2

The program stops at this point with the error message:

DNAscent: src/event_handling.cpp:547: void normaliseEvents(DNAscent::read&, bool): Assertion `et.n > 0' failed.
Aborted (core dumped)

ThDef commented 6 days ago

Hi,

I just hit the same issue using v4.0.3, pod5 files, and Dorado-basecalled data. The DNAscent version used was built from scratch today and the .bam file was produced by Minimap2.

[=================> ] 51% 44/86 0hr 0min14sec failed: 20
[=================> ] 51% 44/86 0hr 0min14sec failed: 20
[==================> ] 52% 45/86 0hr 0min14sec failed: 20
[==================> ] 52% 45/86 0hr 0min14sec failed: 20
DNAscent: src/event_handling.cpp:547: void normaliseEvents(DNAscent::read&, bool): Assertion `et.n > 0' failed.

/path/DNAscent_4.0.3/bin/DNAscent detect -b alignment/chr13_header.sorted.bam -r /path/genome.fasta -i DNAscent_output/index.dnascent -o DNAscent_output/chr13.detect -t 10' died with <Signals.SIGABRT: 6>

MBoemo commented 5 days ago

Shouldn't have failed so I'll take a look, but on v4.0.3 I'd recommend generating the bam file with Dorado unless there's a very good reason not to.

ZHOUXY-QH99 commented 4 days ago

Thanks for your reply! I solved this problem by changing the basecalling model. I found that DNAscent's workflow recommends the Dorado basecalling model dna_r10.4.1_e8.2_400bps_fast@v5.0.0 for v4.0.3, so I switched to that model and detect ran successfully!

MBoemo commented 4 days ago

I'm surprised that fixed it - which model were you using before?

ZHOUXY-QH99 commented 4 days ago

I was also surprised that this solved the problem. Before, I used the latest version of Dorado with the high-accuracy (hac) model, as follows:

dorado basecaller hac pod5s > calls.bam

I suspected it was a coincidence, but I had no way to verify it.

ThDef commented 4 days ago

"Shouldn't have failed so I'll take a look, but on v4.0.3 I'd recommend generating the bam file with Dorado unless there's a very good reason not to."

The .bam was not generated with Dorado because the analysis pipeline is a custom Snakemake workflow that was originally written for R9 data and v3 of DNAscent. It was later modified to accommodate the command-line changes in v4, but the alignment step was left untouched and still uses Minimap2.

Using Dorado to do the alignment is also not ideal when your data is only part of a barcoded run.

On my side, the basecalling was done with the sup (super-accuracy) model because part of the run was for assembly purposes. Since my .bam file was then generated with Minimap2, I don't know whether the model is relevant.

MBoemo commented 4 days ago

Unfortunately I haven't yet been able to reproduce this. @ThDef which version of Dorado are you using and which basecalling model?

ThDef commented 4 days ago

I used the dna_r10.4.1_e8.2_400bps_sup@v5.0.0 model with the version 0.7.2 of Dorado for the basecalling.

ZHOUXY-QH99 commented 2 days ago

Regarding the output of detect, I noticed that the probability that a thymidine is actually BrdU is very low. What probability should be considered a BrdU call?

ZHOUXY-QH99 commented 2 days ago

"I used the dna_r10.4.1_e8.2_400bps_sup@v5.0.0 model with the version 0.7.2 of Dorado for the basecalling."

I recommend writing the basecalling output as a bam file.

ThDef commented 1 day ago

I tried switching from Minimap2 to Dorado as the alignment tool on my already basecalled data (sup model), but the same problem occurred. I will try data already basecalled with the fast model and then try generating the .bam during basecalling to see if anything changes.

If only the latter works, it is not ideal for use on a cluster: basecalling needs a GPU, and GPUs are often available only on specific nodes, which makes pipelines and workflows trickier to set up.

samim21 commented 3 hours ago

Hi, I've been getting the same error and I believe I've found the issue. I noticed that if there are any read IDs in my bam file that are not found in my pod5 file, detect will fail as soon as it gets to that read. I was getting this error with the previous version of DNAscent as well. This has been a problem because Dorado does not currently have an option to disable read splitting, so during basecalling with any model, reads will sometimes be split and new read IDs will be generated. These read IDs are only present in the bam file (or the fastq file, if you output a fastq with Dorado) and are not in the pod5s. If it is possible to have DNAscent detect skip reads that are found in the bam file but not in the pod5 file, rather than terminate completely, that would be really useful. Otherwise, I've found that it is fairly easy to first use DNAscent index to generate a list of read IDs found in the pod5 file, then subset the fastq file I generate with Dorado to remove any read IDs that were created by read splitting, before performing alignment and DNAscent detect.
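
The subsetting step described above could be sketched roughly as follows. This is only an illustration, not part of DNAscent or Dorado. It assumes that each line of the index written by DNAscent index starts with the read ID followed by a tab (check your own index file before relying on this) and that the fastq uses plain four-line records. The function names `load_index_ids`, `filter_fastq` and `subset_fastq` are hypothetical.

```python
from typing import Iterable, Iterator, Set


def load_index_ids(index_lines: Iterable[str]) -> Set[str]:
    """Collect read IDs from index lines.

    Assumption: the read ID is the first tab-separated field of each line.
    Blank lines and lines starting with '#' are skipped.
    """
    ids = set()
    for line in index_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ids.add(line.split("\t")[0])
    return ids


def filter_fastq(lines: Iterable[str], keep_ids: Set[str]) -> Iterator[str]:
    """Yield four-line fastq records whose read ID is in keep_ids.

    Assumption: every record is exactly four lines (no wrapped sequences).
    """
    it = iter(lines)
    for header in it:
        seq = next(it, "")
        plus = next(it, "")
        qual = next(it, "")
        # The read ID is the first whitespace-delimited token after '@'.
        parts = header[1:].split()
        read_id = parts[0] if parts else ""
        if read_id in keep_ids:
            yield header + seq + plus + qual


def subset_fastq(index_path: str, fastq_in: str, fastq_out: str) -> int:
    """Write only reads present in the index to fastq_out; return how many were kept."""
    with open(index_path) as idx:
        keep = load_index_ids(idx)
    kept = 0
    with open(fastq_in) as fin, open(fastq_out, "w") as fout:
        for record in filter_fastq(fin, keep):
            fout.write(record)
            kept += 1
    return kept
```

Calling `subset_fastq` with the index, the Dorado fastq, and an output path would produce a fastq that contains only reads whose IDs appear in the index, which can then be aligned and passed to DNAscent detect.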

MBoemo commented 2 hours ago

Thanks all - I've been away for a few days but I should have time to work on this tomorrow. That's interesting @samim21 but very useful to know. v4.0.3 should be able to gracefully deal with split reads and we haven't had any problems with that but I'll look into it and see what I can do.