run DNAscent at read-level?

MBoemo / DNAscent

Software for detecting regions of BrdU and EdU incorporation in Oxford Nanopore reads.

https://www.boemogroup.org/

GNU General Public License v3.0

run DNAscent at read-level? #37

Closed GeoMicroSoares closed 1 year ago

GeoMicroSoares commented 1 year ago

Hi @MBoemo - what would you make of running DNAscent at read-level? Calling BrdU on reads could have a range of really interesting applications on my side - just wondering about how the software would 'react' to this. One could index the FAST5 files, then call DNAscent on the FASTA version of the reads. Do you think this is seriously flawed or do you see potential in this approach? Thanks in advance!

MBoemo commented 1 year ago

Not quite sure I understand - do you mean you want to use the basecall as the only input rather than signal + basecall + alignment as is done currently? Things like that have been done where you train on systematic errors made by the basecalling software, but I've always steered away from that as you'd have to update and retrain every time the basecaller model was updated.

GeoMicroSoares commented 1 year ago

Hi again - since DNAscent requires all of those inputs, I wouldn't try to skip them. Instead, I would index the reads as usual, align the reads against themselves, and pass that alignment, together with the FASTA version of the reads as the reference, into DNAscent detect. How does that sound?
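For concreteness, the "FASTA version of the reads" step could look something like the minimal Python sketch below. It assumes plain four-line FASTQ records as input; `read_fastq` and `fastq_to_fasta` are hypothetical helper names, not part of DNAscent, and the indexing, alignment, and detection steps are deliberately left out.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if header == "":
            return  # end of input
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected a FASTQ header, got: {header!r}")
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored (FASTA has no qualities)
        # Keep only the read ID, dropping any description after whitespace.
        yield header[1:].split()[0], sequence


def fastq_to_fasta(fastq_text: str, width: int = 60) -> str:
    """Convert FASTQ text to FASTA text, wrapping sequences at `width`."""
    lines = []
    for read_id, sequence in read_fastq(io.StringIO(fastq_text)):
        lines.append(f">{read_id}")
        lines.extend(
            sequence[i:i + width] for i in range(0, len(sequence), width)
        )
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    example = "@read1 runid=x\nACGTAC\n+\nIIIIII\n"
    print(fastq_to_fasta(example, width=4), end="")
```

Each read becomes its own FASTA record, so a self-alignment would map every read onto a "reference" that is just its own basecall.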

MBoemo commented 1 year ago

Oh I see now. In its current state, I wouldn't use DNAscent that way. It uses the reference as a prior in order to correct for inaccuracies in the basecalling due to analogue incorporation (which, as a proportion of the code base, is actually most of the backend). If you're using a read rather than a reference, you're definitely going to get a lot more inaccuracies, although how much more is hard to say (and will be situational). It will essentially be undefined behaviour. The good news (possibly?) is that in times of strife, DNAscent tends to undercall rather than overcall, so hopefully you wouldn't see false positives everywhere but you may see a lot of strange behaviour in the analogue tracks. Might be worth revisiting when we release the R10 version due to the increase in basecalling accuracy, but my two cents is that I wouldn't go for it on R9.
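A toy Python sketch of the point about the reference acting as a prior. This is not how DNAscent works internally; it is an ungapped, position-by-position comparison on made-up sequences, and `mismatch_positions` is a hypothetical helper. It only illustrates that a read compared against itself can never disagree with its "reference", so whatever the basecaller got wrong stays hidden.

```python
from typing import List


def mismatch_positions(read: str, reference: str) -> List[int]:
    """Return 0-based positions where two equal-length sequences differ."""
    if len(read) != len(reference):
        raise ValueError("toy comparison needs equal-length sequences")
    return [i for i, (r, t) in enumerate(zip(read, reference)) if r != t]


# Made-up example: the basecaller miscalls two thymidine positions,
# as might happen where an analogue has been incorporated.
true_reference = "ACGTTTACGTTA"
basecalled_read = "ACGTCTACGCTA"

# Against an independent reference, the suspicious positions are visible.
print(mismatch_positions(basecalled_read, true_reference))

# Against itself, the read agrees everywhere, so nothing can be corrected.
print(mismatch_positions(basecalled_read, basecalled_read))
```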

GeoMicroSoares commented 1 year ago

This is great insight - thank you so much for taking the time to explain, this will help us! We're all looking forward to the R10 version of DNAscent! :)