MatthewRalston / kmerdb

Python bioinformatics CLI for k-mer counts and de Bruijn graphs
https://matthewralston.github.io/kmerdb
Apache License 2.0

Solution engineering for disease model mutational model discrete probabilities. What... #144

Closed MatthewRalston closed 4 months ago

MatthewRalston commented 4 months ago

Okay, so it would be smart to pick a few diseases to develop expertise in, and it has to be at multiple levels, covering both amino acid and nucleic acid sequences. I have experience that looks good in the alignment-free space, and I don't really have a plan to make an assembler. But I would play around with how the assembly solution gets chosen, audited, and judged to be sufficiently covered, so that it is resolved with as much complexity as short read lengths buy us.

People say "well, what about nanopore and read-length trends?", but it isn't just about that. It's about coverage: reliable, random, voluminous coverage. That's the gravy. If you can believe that, then once you're at a coverage maximum that is proportional to read length by virtue of local read complexity, you're stuck at a few hundred. And if your measurement, protocol, or disease model of variance lets that baseline level occur with or without adequate controls for RNA quality, ERCC ladders, and other models of sensitivity, then you have to have reliable coverage throughout the contig space you're able to navigate in the data.
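To put rough numbers on the coverage point, here is a minimal sketch of the usual back-of-the-envelope arithmetic. It assumes reads are sampled uniformly at random (the Lander-Waterman style mean coverage, C = N * L / G), which real RNA-seq data will not satisfy exactly. The function names are hypothetical and are not part of kmerdb.

```python
def read_coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Mean per-base coverage, C = N * L / G, assuming uniform random sampling."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return num_reads * read_length / target_size


def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Mean per-k-mer coverage implied by a given per-base coverage.

    Each read of length L contributes only L - k + 1 k-mers, so k-mer
    coverage is always lower than base coverage for k > 1.
    """
    if not 1 <= k <= read_length:
        raise ValueError("k must be between 1 and read_length")
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    # Illustrative numbers only: 1M reads of 150 bp over a 5 Mbp target.
    c = read_coverage(1_000_000, 150, 5_000_000)
    print(f"base coverage: {c:.1f}x")
    print(f"31-mer coverage: {kmer_coverage(c, 150, 31):.1f}x")
```

The second function is why read length and coverage interact: at a fixed base coverage, shorter reads leave fewer k-mers per read, so the effective coverage of the graph you are trying to walk drops as k approaches the read length.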