Open ctb opened 2 years ago
maybe: "longest hash chain" at that taxon?
OK, got a better way to do a breakdown of the longest hash chain for specific taxa.
Per the table above, for d__Bacteria the longest hash chain is 364 hashes.
This hash chain is entirely part of GCA_001894475 (d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli), which shares it across 165 partners. A breakdown of the top 10 partners and their overlap is below.
| | partner_ident | partner_lin | n_hashes |
|---|---|---|---|
| 0 | GCF_001481655 | d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Flavobacterium;s__Flavobacterium odoratimimum | 46 |
| 1 | GCF_012102505 | d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Vagococcaceae;g__Vagococcus;s__Vagococcus fluvialis | 20 |
| 2 | GCF_003039915 | d__Bacteria;p__Firmicutes;c__Bacilli;o__Staphylococcales;f__Staphylococcaceae;g__Staphylococcus;s__Staphylococcus cohnii | 15 |
| 3 | GCF_009020275 | d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides uniformis | 9 |
| 4 | GCA_900758605 | d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides sp900552405 | 9 |
| 5 | GCF_013009155 | d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__Streptococcus suis_W | 8 |
| 6 | GCF_003311455 | d__Bacteria;p__Firmicutes;c__Bacilli;o__Staphylococcales;f__Staphylococcaceae;g__Staphylococcus;s__Staphylococcus aureus | 7 |
| 7 | GCF_007293315 | d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales_H;f__Bacillaceae_D;g__Alkalihalobacillus_A;s__Alkalihalobacillus_A sp007293315 | 7 |
| 8 | GCF_001865835 | d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Flavobacterium;s__Flavobacterium odoratimimum | 6 |
| 9 | GCF_009648365 | d__Bacteria;p__Firmicutes;c__Bacilli;o__Staphylococcales;f__Staphylococcaceae;g__Staphylococcus;s__Staphylococcus epidermidis | 6 |
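
For anyone wanting to reproduce a partner breakdown like this, here is a minimal sketch. It assumes you already have the set of hashes in the chain and a mapping from each hash to the genomes containing it. The names `partner_breakdown`, `chain_hashes`, and `hash_to_idents` are hypothetical, and the toy data is made up; this is not necessarily how the table above was generated.

```python
from collections import Counter

def partner_breakdown(source_ident, chain_hashes, hash_to_idents, top=10):
    """Count, per partner genome, how many hashes of the chain it shares.

    hash_to_idents (hypothetical structure): dict mapping hash -> set of
    genome identifiers that contain that hash.
    """
    counts = Counter()
    for h in chain_hashes:
        for ident in hash_to_idents.get(h, ()):
            if ident != source_ident:  # the source genome is not its own partner
                counts[ident] += 1
    # most_common sorts partners by descending shared-hash count
    return counts.most_common(top)

# Toy data (made up): genome "A" holds a 4-hash chain.
hash_to_idents = {
    1: {"A", "B"},
    2: {"A", "B", "C"},
    3: {"A", "C"},
    4: {"A", "B"},
}
breakdown = partner_breakdown("A", {1, 2, 3, 4}, hash_to_idents)
print(breakdown)
```

Joining the resulting identifiers against a lineage spreadsheet would then give the `partner_lin` column.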
I haven't figured out what to call this, but the table below is an incomplete answer to the question:
what’s the largest collection of hashes present in a single genome that leaves you in doubt as to what taxonomic unit it comes from, per given taxon?
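
One way to read that question, sketched in code: assign each shared hash to the lowest common ancestor (LCA) of the genomes that contain it, then for each LCA taxon report the single genome holding the most such hashes. This is only my interpretation, treating a "chain" as a plain set of hashes; the function names and the toy data are hypothetical.

```python
from collections import defaultdict

def lca(lineages):
    """Longest common prefix of semicolon-separated lineage strings."""
    split = [lin.split(";") for lin in lineages]
    common = []
    for ranks in zip(*split):
        if len(set(ranks)) != 1:
            break
        common.append(ranks[0])
    return ";".join(common) or "root"

def longest_chains(hash_to_idents, ident_to_lineage):
    """For each LCA taxon, return (genome, n_hashes) for the genome
    holding the most hashes whose LCA is that taxon."""
    per_taxon = defaultdict(lambda: defaultdict(set))
    for h, idents in hash_to_idents.items():
        if len(idents) < 2:
            continue  # assumption: unshared hashes are unambiguous, skip them
        taxon = lca([ident_to_lineage[i] for i in idents])
        for ident in idents:
            per_taxon[taxon][ident].add(h)
    result = {}
    for taxon, by_genome in per_taxon.items():
        # ties are broken by identifier so the output is deterministic
        ident, hashes = max(by_genome.items(), key=lambda kv: (len(kv[1]), kv[0]))
        result[taxon] = (ident, len(hashes))
    return result

# Toy data (made up, truncated lineages).
ident_to_lineage = {
    "G1": "d__Bacteria;p__Proteobacteria",
    "G2": "d__Bacteria;p__Firmicutes",
    "G3": "d__Archaea;p__Halobacteriota",
    "G4": "d__Bacteria;p__Firmicutes",
}
hash_to_idents = {
    10: {"G1", "G2"},
    11: {"G1", "G4"},
    12: {"G1", "G3"},
    13: {"G1"},
}
chains = longest_chains(hash_to_idents, ident_to_lineage)
print(chains)
```

In the toy data, hashes 10 and 11 are shared across phyla within d__Bacteria, so G1 carries a 2-hash set at that taxon, while hash 12 is shared across domains and lands at the root.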
For example, from the table below: d__Bacteria.
I actually can't figure out what its partner is that is in a different class than E. coli, so let me go to a different row to illustrate the partner aspect: d__Archaea and d__Bacteria.
In this case I'd guess it's contamination, but some of the others in the table below might not be.
Anyway, enjoy!