Discussion points - GitHub issues

From #5:

next steps: can we use kaa-mer abundance information to guess whether there are one or more strains present at a given time/in a given sample?
even using k = 51, in the strain plots we see a substantial fraction of sequences falling in the "other" category. most of this is of the same genus as the species that is plotted, but it's not clear what it is. It could be HGT, or we could be capturing things beyond the 95% similarity threshold. so a next step here is to determine the organization of the assembly graphs and understand what is really being returned

Other:

For strain dynamics, it is unclear how many strains are present at a given time: whether these are single strains whose accessory elements change over time, or whether multiple strains are present. Resolving this would require long reads or culture, but in principle the same analysis could be applied to these new data types.
However, it is interesting that for some samples, multiple bins of the same species were recovered. Potential next step -- try to predict the number of strains in a sample using amino acid k-mer abundances.
One benefit of our method is that we recover variation attributable to a species even when it is present at levels too low to bin, i.e. we recover variation even when the species doesn't bin.
- example -- not that many E. bolteae bins.
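
A rough sketch of the strain-count idea above, assuming per-sample kaa-mer abundances are already available as a plain dict (k-mer -> abundance), however they were obtained. The function names, the smoothing window, and the `min_height` cutoff are all hypothetical choices made for illustration, not something taken from the issues. The heuristic is simply to count separated peaks in the abundance histogram: strains present at different coverage might show up as distinct peaks. Multi-copy genes, shared core sequence, and strains at similar coverage would all confound it, so at best it gives a first guess.

```python
from collections import Counter


def abundance_histogram(kmer_abundances):
    """Map each abundance value to the number of distinct k-mers that have it."""
    return Counter(kmer_abundances.values())


def count_abundance_peaks(histogram, smooth=1, min_height=1.0):
    """Count peaks in a moving-average-smoothed abundance histogram.

    smooth and min_height are illustrative parameters (assumptions),
    not tuned values.
    """
    if not histogram:
        return 0
    max_abund = max(histogram)
    # Dense counts for abundances 0..max_abund, padded so that the
    # smoothing window is fully defined at the upper end.
    counts = [histogram.get(a, 0) for a in range(max_abund + smooth + 1)]
    width = 2 * smooth + 1
    smoothed = []
    for i in range(len(counts)):
        lo, hi = max(0, i - smooth), min(len(counts), i + smooth + 1)
        # Fixed divisor: positions outside the histogram count as zero.
        smoothed.append(sum(counts[lo:hi]) / width)

    # A peak is a rise followed by a fall; a plateau is counted once.
    peaks = 0
    rising = False
    prev = 0.0
    for value in smoothed + [0.0]:
        if value > prev:
            rising = True
        elif value < prev:
            if rising and prev >= min_height:
                peaks += 1
            rising = False
        prev = value
    return peaks


# Toy data (made up): k-mers clustered around abundance 10 and around 30.
toy_abundances = [9] * 2 + [10] * 6 + [11] * 2 + [29] * 2 + [30] * 6 + [31] * 2
toy_kmers = {f"kmer{i}": a for i, a in enumerate(toy_abundances)}
n_peaks = count_abundance_peaks(abundance_histogram(toy_kmers))
```

On the toy data the two abundance clusters are well separated, so the sketch reports two peaks. Real metagenome abundances would be much noisier, and low-abundance error k-mers would likely need to be filtered out before a peak count like this means anything.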

dib-lab / 2021-paper-metapangenomes