Open rcedgar opened 4 years ago
I'm 100% on board with doing this analysis. We'll have a nice data-set to do this on.
Kraken2 can do this, pretty sure!
On June 21, 2020 8:46:07 AM PDT, Artem Babaian notifications@github.com wrote:
I'm 100% on board with doing this analysis. We'll have a nice data-set to do this on.
We'll probably want something closer to Figure 2 in the Pangolin paper.
@taltman Can you point me at the relevant documentation? As I understand it, the Kraken2 index doesn't store coordinates of k-mers, only taxonomy ids, so this would need to be a special feature somewhere.
Edit -- oh hang on, I see, you're suggesting we could run Kraken2 separately on each window. That might work.
@ababaian As I understand it, that figure was made by the sliding window method followed by manual (i.e. visual) analysis to identify the discontinuities. That's fine for a single genome, but not amenable to high-throughput. We could show one or two examples like that should we be successful in implementing a method.
Well, we can use the inflection points between two lines to predict recombination windows, right? :) If we can do it by eye, we can teach a computer to do it at high throughput.
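A minimal sketch of that idea, assuming we already have two per-window percent-identity curves for the same query (one per candidate parent genome). The function name `crossover_windows` and the input lists are hypothetical, and a real tool would need smoothing and a significance test on top of this.

```python
def crossover_windows(ident_a, ident_b):
    """Return window indices where the closer of two references switches.

    ident_a and ident_b are per-window percent identities of the query
    against reference A and reference B (hypothetical inputs).
    Tied windows are skipped, so a crossing is reported at the first
    window where the other reference clearly takes the lead.
    """
    crossings = []
    prev_sign = 0
    for i, (a, b) in enumerate(zip(ident_a, ident_b)):
        diff = a - b
        sign = (diff > 0) - (diff < 0)  # +1 if A is closer, -1 if B, 0 if tied
        if sign == 0:
            continue
        if prev_sign != 0 and sign != prev_sign:
            crossings.append(i)
        prev_sign = sign
    return crossings


if __name__ == "__main__":
    a = [95, 94, 93, 80, 78]
    b = [85, 86, 88, 92, 93]
    print(crossover_windows(a, b))
```

Each reported index is a candidate breakpoint window; noisy curves would produce many spurious crossings, which is why this is only a starting point.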
Yes, exactly -- the question was whether we/I need to write a new tool for this. If someone else would like to tackle this one, great!
Kraken reports the LCA of each kmer along the length of the read / contig. I can help make a custom DB of Coronavirus sequences
See this:
https://github.com/DerrickWood/kraken2/wiki/Manual
A space-delimited list indicating the LCA mapping of each k-mer in the sequence(s). For example, "562:13 561:4 A:31 0:1 562:3" would indicate that:
the first 13 k-mers mapped to taxonomy ID #562
the next 4 k-mers mapped to taxonomy ID #561
the next 31 k-mers contained an ambiguous nucleotide
the next k-mer was not in the database
the last 3 k-mers mapped to taxonomy ID #562
Note that paired read data will contain a "|:|" token in this list to indicate the end of one read and the beginning of another.
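As a rough illustration of how that field could be used on a contig, the sketch below parses the LCA string into runs and lists the positions where the assigned taxon changes. The function names and the `min_kmers` threshold are my own assumptions, not part of Kraken2; only the field format comes from the manual text quoted above.

```python
def parse_lca_field(field):
    """Parse a Kraken2 k-mer LCA string into (label, start, end) runs.

    Positions are k-mer indices along the sequence, end exclusive.
    The paired-read separator "|:|" is skipped, so this sketch is only
    meaningful for single sequences such as contigs.
    """
    runs = []
    pos = 0
    for token in field.split():
        if token == "|:|":
            continue
        label, count = token.rsplit(":", 1)
        n = int(count)
        runs.append((label, pos, pos + n))
        pos += n
    return runs


def taxon_switches(runs, min_kmers=3):
    """List (from_taxon, to_taxon, kmer_position) for each change of taxon.

    Runs labelled "A" (ambiguous) or "0" (not in database) and runs shorter
    than min_kmers (a hypothetical noise threshold) are ignored.
    """
    confident = [r for r in runs
                 if r[0] not in ("A", "0") and r[2] - r[1] >= min_kmers]
    return [(a[0], b[0], b[1])
            for a, b in zip(confident, confident[1:])
            if a[0] != b[0]]


if __name__ == "__main__":
    runs = parse_lca_field("562:13 561:4 A:31 0:1 562:3")
    print(taxon_switches(runs))
```

With a custom database where each coronavirus genome has its own taxonomy ID, a switch between two IDs along a contig would be a candidate recombination breakpoint.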
On June 21, 2020 9:52:28 AM PDT, Robert Edgar notifications@github.com wrote:
@taltman Can you point me at the relevant documentation? As I understand it, the Kraken2 index doesn't store coordinates of k-mers, only taxonomy ids, so this would need to be a special feature somewhere.
We're plotting percent identity (%) in a sliding window; doesn't that kind of defeat the purpose of a k-mer?
In theory, k-mers could work because k-mer identity correlates quite well with alignment identity. With Kraken2 specifically I doubt it will work, because it indexes only a small subset of the k-mers.
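To make the k-mer versus alignment identity relationship concrete, here is a small sketch that scores a query against a reference by the fraction of query k-mers found in the full reference k-mer set. This uses every k-mer rather than an indexed subset, so it is not how Kraken2 works; `kmer_fraction` and the toy sequences are hypothetical.

```python
def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_fraction(query, ref, k):
    """Fraction of query k-mers that also occur anywhere in ref."""
    q = kmers(query, k)
    if not q:
        return 0.0
    ref_set = set(kmers(ref, k))
    return sum(km in ref_set for km in q) / len(q)


if __name__ == "__main__":
    ref = "ACGTTGCA"
    query = "ACGTAGCA"  # one substitution relative to ref
    print(kmer_fraction(query, ref, k=3))
```

A single substitution removes up to k k-mers, so the shared fraction falls faster than per-base identity does; the two measures track each other but are not on the same scale.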
Gideon Mordecai suggests RDP4 for recombination? https://academic.oup.com/ve/article/1/1/vev003/2568683
Rob Lanfear suggests PhiPack and 3SEQ are both good for recombination detection too. Different approaches, both powerful in their own ways.
https://www.maths.otago.ac.nz/~dbryant/software/phimanual.pdf
https://mol.ax/content/media/2018/02/3seq_manual.20180209.pdf
@ababaian mentioned that no one has been tackling this, so I'mma make an attempt if you guys don't mind!
I believe recombination events can be detected by sliding a window down a genome (or contig) to find the most similar known genomes for each window. Discontinuities in this list of top hits and their identities indicate a recombination. I think we should implement something like this for all known genomes and for new assemblies. Possibly I could implement such a tool if needed. I haven't checked for existing tools that might be able to do this; if someone could look into this and add comments here, that would be great.
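A minimal sketch of the sliding-window idea, under a strong simplifying assumption: the query and all references are rows of one multiple alignment, so the same column range can be compared directly. The function names, window size, and step are hypothetical; a real implementation would search each window against a database instead.

```python
def window_identity(query, ref, start, end):
    """Percent identity over alignment columns [start, end), ignoring gaps."""
    pairs = [(q, r) for q, r in zip(query[start:end], ref[start:end])
             if q != "-" and r != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == r for q, r in pairs) / len(pairs)


def top_hits(query, refs, window=500, step=100):
    """Return (window_start, best_reference, identity) for each window.

    refs maps reference name to its aligned sequence. Ties go to the
    reference that appears first in the dict.
    """
    hits = []
    last_start = max(len(query) - window, 0)
    for start in range(0, last_start + 1, step):
        scores = {name: window_identity(query, seq, start, start + window)
                  for name, seq in refs.items()}
        best = max(scores, key=scores.get)
        hits.append((start, best, scores[best]))
    return hits


def discontinuities(hits):
    """Return (window_start, previous_top_hit, new_top_hit) at each change."""
    return [(cur[0], prev[1], cur[1])
            for prev, cur in zip(hits, hits[1:])
            if prev[1] != cur[1]]


if __name__ == "__main__":
    refs = {"A": "A" * 20, "B": "C" * 20}
    query = "A" * 10 + "C" * 10  # toy recombinant: first half A, second half B
    print(discontinuities(top_hits(query, refs, window=4, step=2)))
```

Reporting only the top hit is the simplest reading of the proposal; keeping the full ranked list per window, with identities, would let a later step judge how strong each discontinuity is.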