ghoresh11 / twilight

All scripts used for the analysis of the twilight of the pan-genome, and scripts available for others to apply similar analysis
GNU General Public License v3.0

Subsampling to correct for lineage size #4

Open bananabenana opened 2 years ago

bananabenana commented 2 years ago

Hi,

Love the paper, and this script is amazing. It's a really fascinating (and truthful) way of thinking about pan-genomes!

I wanted to probe a bit deeper into the frequency table to reduce the impact of lineage size. You mention this in your paper. Would the following workflow make sense to you?

  1. Subsample the isolates within each group in groups_to_keep, repeat this n times, and calculate gene frequencies for each subsample
  2. Average these values across the n repeats
  3. Output the averages as the frequencies.csv table
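
The three steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not code from the repository: the function name subsampled_frequencies, the dictionary-based inputs, and the sample_size and n_reps parameters are all hypothetical, and using a lineage in full when it is smaller than sample_size is just one possible choice.

```python
import random
from collections import defaultdict


def subsampled_frequencies(presence, lineage_of, sample_size, n_reps, seed=0):
    """Average within-lineage gene frequencies over repeated subsamples.

    presence:   dict mapping gene -> set of isolate ids carrying that gene
    lineage_of: dict mapping isolate id -> lineage (group) label
    Returns a dict: lineage -> {gene: mean frequency across n_reps subsamples}.
    All names here are hypothetical and only illustrate the idea.
    """
    rng = random.Random(seed)

    # Collect the isolates belonging to each lineage.
    members = defaultdict(list)
    for isolate, lineage in sorted(lineage_of.items()):
        members[lineage].append(isolate)

    totals = {lineage: defaultdict(float) for lineage in members}
    for _ in range(n_reps):
        for lineage, isolates in members.items():
            # Step 1: draw a subsample without replacement.
            # Assumption: lineages smaller than sample_size are used in full.
            k = min(sample_size, len(isolates))
            chosen = set(rng.sample(isolates, k))
            for gene, carriers in presence.items():
                totals[lineage][gene] += len(carriers & chosen) / k

    # Step 2: average over the repeats. Step 3 would write this table to disk.
    return {
        lineage: {gene: total / n_reps for gene, total in genes.items()}
        for lineage, genes in totals.items()
    }


if __name__ == "__main__":
    # Toy data: lineage "L1" has four isolates, lineage "L2" has two.
    toy_presence = {"geneA": {"i1", "i2", "i3", "i4"}, "geneB": {"i1"}}
    toy_lineages = {"i1": "L1", "i2": "L1", "i3": "L1", "i4": "L1",
                    "i5": "L2", "i6": "L2"}
    result = subsampled_frequencies(toy_presence, toy_lineages,
                                    sample_size=2, n_reps=100)
    for lineage, genes in sorted(result.items()):
        print(lineage, dict(genes))
```

Drawing the same number of isolates from every lineage means each within-lineage frequency rests on the same sample size, and averaging over the repeats smooths out the noise from any single draw.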

Does this fit within your method for calculating within-lineage frequencies? If so, I am thinking it could be implemented inside the "## create a vector of frequencies for each group" loop, or just before it as an input.

Thoughts?