Closed by dw1227 2 months ago
We need to control for uneven sampling of species. Our previous analysis suggested that a decent number of species have 5 or more genome sequences. Let's see how the results change when we randomly sample 5 genomes from each species. The number of genomes sampled should be easy to change in case we want to use something different (e.g. 2).
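A minimal sketch of the subsampling step, in Python for illustration only. The function name `subsample_genomes`, the `genome_to_species` mapping, and the `n_genomes` parameter are hypothetical names, not part of the existing analysis. The sketch also assumes that species with fewer than `n_genomes` genomes are dropped, which is a choice the project would need to confirm.

```python
import random
from collections import defaultdict


def subsample_genomes(genome_to_species, n_genomes=5, seed=None):
    """Randomly pick n_genomes genomes from each species.

    genome_to_species: dict mapping a genome id to its species name
    (hypothetical input format). Species with fewer than n_genomes
    genomes are skipped (an assumption, not a confirmed rule).
    """
    rng = random.Random(seed)

    # Group genome ids by species.
    by_species = defaultdict(list)
    for genome, species in genome_to_species.items():
        by_species[species].append(genome)

    sampled = {}
    for species, genomes in by_species.items():
        if len(genomes) >= n_genomes:
            # Sort first so a fixed seed gives a reproducible draw;
            # sample() draws without replacement.
            sampled[species] = rng.sample(sorted(genomes), n_genomes)
    return sampled
```

Keeping `n_genomes` as an argument makes it straightforward to rerun the analysis with a different threshold such as 2.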
We can get lucky or unlucky with the 5 genomes that we pick for the analysis described above. Therefore, let us repeat the analysis 1000 or more times to get a more robust estimate of the overlap, along with 95% confidence intervals.
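One way to sketch the repetition, in Python for illustration only: run the analysis many times with fresh random draws and summarize the resulting estimates. The names `repeat_with_interval` and `statistic` are hypothetical. Using the 2.5th and 97.5th percentiles of the repeated estimates as the 95% interval is an assumption about how the interval would be defined.

```python
import random
import statistics


def repeat_with_interval(statistic, n_iterations=1000, seed=0):
    """Run statistic(rng) n_iterations times and summarize the results.

    statistic: a callable that takes a random.Random instance, performs
    one random subsample plus analysis, and returns a single number
    (hypothetical interface). Needs n_iterations >= 2.
    Returns (mean, lower, upper), where lower and upper are the 2.5th
    and 97.5th percentiles of the repeated estimates.
    """
    rng = random.Random(seed)
    estimates = [statistic(rng) for _ in range(n_iterations)]

    # n=40 splits the data into 2.5% steps, so the first and last
    # cut points are the 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(estimates, n=40, method="inclusive")
    return statistics.mean(estimates), cuts[0], cuts[-1]
```

Passing the random number generator into `statistic` keeps every iteration on one seeded stream, so the whole run can be reproduced from a single seed.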
If I have an ASV, what's the probability that it is also found in another taxonomic group from the same rank? For example, if I have an ASV from Bacillus subtilis, what's the probability that it is also found in Bacillus cereus? Of course, a Bacillus subtilis ASV is more likely to be found in a closely related organism like Bacillus cereus than in a distant one like E. coli. We may adjust/control for relatedness later, but for now let us answer the general question for any two taxa from the same rank.
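A minimal sketch of one possible reading of this question, in Python for illustration only: the fraction of distinct ASVs that are observed in more than one taxon at a given rank. The function name `shared_asv_fraction` and the `(asv, taxon)` input format are hypothetical, and weighting every ASV equally (rather than by copy number or genome count) is an assumption.

```python
from collections import defaultdict


def shared_asv_fraction(observations):
    """Fraction of distinct ASVs found in more than one taxon.

    observations: iterable of (asv, taxon) pairs, with all taxa taken
    from the same rank (hypothetical input format). Repeated pairs are
    counted once.
    """
    # Collect the set of taxa in which each ASV was observed.
    taxa_per_asv = defaultdict(set)
    for asv, taxon in observations:
        taxa_per_asv[asv].add(taxon)

    if not taxa_per_asv:
        raise ValueError("no observations supplied")

    shared = sum(1 for taxa in taxa_per_asv.values() if len(taxa) > 1)
    return shared / len(taxa_per_asv)


# Made-up toy data, for illustration only: asv1 occurs in two species,
# asv2 and asv3 in one each.
toy = [
    ("asv1", "Bacillus subtilis"),
    ("asv1", "Bacillus cereus"),
    ("asv2", "Bacillus subtilis"),
    ("asv3", "Escherichia coli"),
    ("asv3", "Escherichia coli"),
]
toy_fraction = shared_asv_fraction(toy)  # one of three ASVs is shared
```

Running the same function on taxon labels from a different rank (genus, family, and so on) would give the corresponding estimate for that rank.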