Closed olgabot closed 2 years ago
Got about 1.5 figures in so far, will continue soon!
Questions/comments from inline HTML comments:
Hi!
Hi!
Hahah hello! Didn't mean to tag you there, sorry for the spam. Meant to be referencing a particular figure in the paper 😅
Made some more comments! Here are all the comments/questions/suggestions, listed out below:
Ok, and some answers to your inline comments!
Question: Does the "size" of a pangenome mean number of sequences, number of genes, number of species, or something else?
That's a really good question. I think it could mean the number of distinct sequences observed across all the genomes that were looked at, but typically it refers to the number of genes.
Question: is an "open" pangenome just all the genomes in the world? Why is it that "closed" pangenomes don't increase in size?
A pangenome can be considered all of the genomes in the world (although it is not really possible to exhaustively sample all genomes in the world), but the openness or closed-ness of a pangenome is a property of the eco-evo strategy of that group of organisms. Closed pangenomes are usually associated with organisms that have a very small niche breadth.
Is "genome" the preferred name of taxonomic rank over "strain" in GTDB?
Eh...it's sort of unclear. Genome is a bit more exact, as strain can have many definitions.
What does "other" mean in figure 4D? The text states that BIOML-A27 was the only strain of B. uniformis, but that does not seem to be true if "other" is present.
Good catch -- it's other strains from different species that get scooped in because assembly graph queries retrieve things down to about 93% ANI. I updated the language to make this clearer.
You may get dinged for not quantifying how "fast" k-mers are relative to the other option
This is probably true...I'm going to leave it for now, and if necessary I'll go back and benchmark.
The legend for metap_sfig would be clearer if shown as a table, like a 2x2 contingency table
Love this idea! Will update.
Also note I made issues #13, #14, and #15 to account for the (potentially) missing figures. There's a chance I may preprint without these figures, and then will think about adding them to the supplement before submitting for publication.
Lastly, I will either merge this branch and then update the figures (add lines of fit, t-test results, and a contingency table for the legend) and add more details to some sections of the methods, or I will push more changes to this branch.
Thank you again so much, Olga! Your comments were 💯
Is "genome" the preferred name of taxonomic rank over "strain" in GTDB?
Eh...it's sort of unclear. Genome is a bit more exact, as strain can have many definitions.
Yes, this is something that I think Taylor and Tessa came up with and just started using, and it's SO MUCH CLEARER than using "strain"! 🎉
Like "is it a different strain or not?" Well who knows, but it's definitely a different genome sequence, so 🤷