dib-lab / 2021-paper-metapangenomes

olgabot/2022 jun comments #12

Closed olgabot closed 2 years ago

olgabot commented 2 years ago

Got about 1.5 figures in so far, will continue soon!

Questions/comments from inline HTML comments:

  1. Does the "size" of a pangenome mean number of sequences, number of genes, number of species, or something else?
  2. Is an "open" pangenome just all the genomes in the world? Why is it that "closed" pangenomes don't increase in size?
  3. Maybe add a citation for Mantel test? I had to look it up
  4. Can you add linear fit lines with $R^2 = 0.12$ and $R^2 = 0.87$ to @fig:panmers_fig A? I think it would make your point more clear. (oh wow GitHub renders LaTeX now!)
  5. Add the t-test statistic values to Figure @fig:panmers_fig B
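To illustrate comment 4, here is a minimal sketch of how a least-squares line and its $R^2$ could be computed before drawing the fit on a scatter plot. It assumes plain Python with made-up points; the function name `linear_fit_r2` and the sample data are hypothetical and are not taken from the paper's code or figures.

```python
def linear_fit_r2(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2.

    Hypothetical helper for illustration only; not from the paper's code.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2


# Made-up points, purely for demonstration
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r2 = linear_fit_r2(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r2:.2f}")
```

The returned slope and intercept can be used to draw the line, and the $R^2$ value can go in the panel label or legend.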

fig commented 2 years ago

Hi!

olgabot commented 2 years ago

> Hi!

Hahah hello! Didn't mean to tag you there, sorry for the spam. Meant to be referencing a particular figure in the paper 😅

olgabot commented 2 years ago

Made some more comments! Here are all the comments/questions/suggestions, listed out below:

  1. Is it possible to compute distance by mya from GTDB-tk?
  2. Is there a figure for noncoding reads in pseudogenes?
  3. Is there a figure for read error rates in coding vs noncoding?
  4. Is "genome" the preferred name of taxonomic rank over "strain" in GTDB?
  5. The legend for metap_sfig would be clearer if shown as a table, like a 2x2 contingency table
  6. What does "other" mean in Figure 4D? The text states that BIOML-A27 was the only strain of B. uniformis, but it seems like that is not true, since "other" is present
  7. You may be dinged on quantifying how k-mers are "fast" relative to the other option

taylorreiter commented 2 years ago

Ok, and some answers to your inline comments!

> Question: Does the "size" of a pangenome mean number of sequences, number of genes, number of species, or something else?

That's a really good question. I think it could mean the number of distinct sequences observed across all the genomes that were looked at, but typically it refers to the number of genes.

> Question: Is an "open" pangenome just all the genomes in the world? Why is it that "closed" pangenomes don't increase in size?

A pangenome can be considered all of the genomes in the world (although it's not really possible to exhaustively sample all genomes in the world), but the openness or closed-ness of a pangenome is a property of the eco-evo strategy of that group of organisms. Closed pangenomes are usually associated with organisms that have a very small niche breadth.

> Is "genome" the preferred name of taxonomic rank over "strain" in GTDB?

Eh...it's sort of unclear. Genome is a bit more exact, as strain can have many definitions.

> What does "other" mean in Figure 4D? The text states that BIOML-A27 was the only strain of B. uniformis, but it seems like that is not true, since "other" is present

Good catch -- it's other strains from different species that get scooped in because assembly graph queries retrieve things down to about 93% ANI. I updated the language to make this more clear.

> You may be dinged on quantifying how k-mers are "fast" relative to the other option

This is probably true...I'm going to leave it for now, and if necessary I'll go back and benchmark.

> The legend for metap_sfig would be clearer if shown as a table, like a 2x2 contingency table

Love this idea! Will update.

Also note I made issues #13, #14, and #15 to account for the (potentially) missing figures. There's a chance I may preprint without these figures, and then will think about adding them to the supplement before submitting for publication.

Lastly, I will either merge this branch and then update the figures (add lines of fit, t-test results, and a contingency table for the legend) and add more details to some sections of the methods, or I will push more changes to this branch.

Thank you again so much, Olga! Your comments were 💯

ctb commented 2 years ago

> Is "genome" the preferred name of taxonomic rank over "strain" in GTDB?

> Eh...it's sort of unclear. Genome is a bit more exact, as strain can have many definitions.

Yes, this is something that I think Taylor and Tessa came up with and just started using, and it's SO MUCH CLEARER than using "strain"! 🎉

Like "is it a different strain or not?" Well who knows, but it's definitely a different genome sequence, so 🤷