Love the paper, and this script is amazing. It is a really fascinating (and truthful) way of thinking about pangenomes!
I wanted to probe a bit deeper into the frequency table to reduce the impact of lineage size. You mention this in your paper, but would the following workflow make sense to you?
1. Subsample within each group in `groups_to_keep`, repeat this n times, and calculate gene frequencies for each subsample.
2. Average these values across the n subsamples.
3. Output the averages as the frequencies.csv table.
Does this fit within your method for within-lineage frequency?
If so, I am thinking it could be implemented either inside the `## create a vector of frequencies for each group` loop, or just before it, with the averaged frequencies passed in as an input.
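To make the idea concrete, here is a minimal Python sketch of the subsampling steps I have in mind. It is not taken from your script and is only meant to illustrate the logic. Apart from `groups_to_keep`, every name here is hypothetical: the function name, the `presence` and `groups` inputs, `sample_size`, `n_reps`, and `seed` are my own assumptions.

```python
import random
from collections import defaultdict


def subsampled_frequencies(presence, groups, groups_to_keep,
                           sample_size, n_reps, seed=0):
    """Average within-group gene frequencies over repeated subsamples.

    presence: dict mapping genome ID -> set of genes it carries (assumed input)
    groups:   dict mapping genome ID -> group/lineage label (assumed input)
    Returns a dict: group -> {gene: mean frequency across n_reps subsamples}.
    """
    rng = random.Random(seed)

    # Collect the genomes belonging to each group we want to keep.
    members = defaultdict(list)
    for genome, group in groups.items():
        if group in groups_to_keep:
            members[group].append(genome)

    # Every gene seen in any genome, in a stable order.
    genes = sorted(set().union(*presence.values()))

    freqs = {}
    for group, genomes in members.items():
        # Groups smaller than sample_size are used in full.
        k = min(sample_size, len(genomes))
        totals = dict.fromkeys(genes, 0.0)
        for _ in range(n_reps):
            sample = rng.sample(genomes, k)  # draw without replacement
            for gene in genes:
                totals[gene] += sum(gene in presence[g] for g in sample) / k
        # Mean frequency of each gene across the n_reps subsamples.
        freqs[group] = {gene: totals[gene] / n_reps for gene in genes}
    return freqs
```

The returned averages would then take the place of the per-group frequencies that are currently written to frequencies.csv. Drawing the same `sample_size` from every group is my assumption about how to keep large and small lineages comparable.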