Open bluegenes opened 3 years ago
Doing protein-level clustering of all 266k genomes --
time python find-founders.py -k 10 --moltype protein --siglist /home/ntpierce/2021-virus-exploration/output.protein-pigeon/compare/pigeon1.0.prodigal.siglist.txt --prefix pigeon1.0.protein-k10.mc0.05 --threshold 0.05
time: ~2.4 days real 3411m14.916s
Found 92,280 founders
92280 pigeon1.0.protein-k10.mc0.05.founders.siglist.csv
174525 pigeon1.0.protein-k10.mc0.05.members.siglist.csv
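For anyone reading along, here is a rough sketch of the kind of greedy founder/member split the counts above come from. This is not the code in find-founders.py: the function names, the plain k-mer sets (instead of real signatures), the direction of the containment, and the `>=` comparison against the threshold are all assumptions made for illustration.

```python
# Hypothetical sketch of greedy "founder" clustering by max containment.
# Assumption: a genome becomes a member of the founder it is most contained
# in when that containment reaches the threshold; otherwise it becomes a
# new founder. The real script may differ in every one of these details.

def kmers(seq: str, k: int) -> set[str]:
    """All overlapping k-mers of seq, as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query: set[str], target: set[str]) -> float:
    """Fraction of the query's k-mers that also occur in the target."""
    if not query:
        return 0.0
    return len(query & target) / len(query)


def find_founders(sketches: dict[str, set[str]], threshold: float):
    """Single greedy pass over the sketches, in insertion order."""
    founders: list[str] = []
    members: dict[str, str] = {}  # member name -> founder name
    for name, sketch in sketches.items():
        best_name, best = None, 0.0
        for founder in founders:
            c = containment(sketch, sketches[founder])
            if c > best:
                best_name, best = founder, c
        if best_name is not None and best >= threshold:
            members[name] = best_name
        else:
            founders.append(name)
    return founders, members


if __name__ == "__main__":
    toy = {
        "a": kmers("ACGTACGTAC", 4),
        "b": kmers("ACGTACGTAC", 4),  # identical to "a"
        "c": kmers("TTTTTTTTTT", 4),  # shares nothing with "a"
    }
    founders, members = find_founders(toy, threshold=0.05)
    print(founders, members)
```

Under these assumptions every input ends up as either a founder or a member, which matches the founders and members files summing to the full input set.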
batched rarefaction:
dna-level clustering (k21) killed after 5 days (srun issue) -- dropping mc threshold to 0.01 and restarting.
DNA k21 with mc 0.01:
time python ../find-founders.py -k 21 --moltype DNA --siglist /group/ctbrowngrp/virus-references/pigeon/dna-input/pigeon1.0.signatures.txt --prefix pigeon1.0.dna-k21.mc0.01 --threshold 0.01
time ~3.5 days real 5030m2.497s
Found 108,355 founders
108355 pigeon1.0.dna-k21.mc0.01.founders.siglist.csv
158450 pigeon1.0.dna-k21.mc0.01.members.siglist.csv
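As a check on the "~days" figures, the `real` values reported by `time` above can be converted with a few lines of Python. The helper name `real_to_days` is made up for this note, and it only handles the `<minutes>m<seconds>s` form shown in this thread.

```python
import re


def real_to_days(real: str) -> float:
    """Convert a bash `time` value like '3411m14.916s' to days."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", real.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {real!r}")
    minutes = int(match.group(1))
    seconds = float(match.group(2))
    return (minutes * 60 + seconds) / 86400  # 86400 seconds per day


if __name__ == "__main__":
    for label, real in [
        ("protein k10, mc 0.05", "3411m14.916s"),
        ("DNA k21, mc 0.01", "5030m2.497s"),
    ]:
        print(f"{label}: {real_to_days(real):.2f} days")
```

Both results land close to the rounded figures quoted with the runs (about 2.4 and 3.5 days).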
Whoops, forgot to add info on dayhoff clustering, full pigeon database:
dayhoff k=19, max containment 0.05
85967 pigeon1.0.dayhoff-k19.mc0.05.founders.siglist.csv
180838 pigeon1.0.dayhoff-k19.mc0.05.members.siglist.csv
Not virus, BUT, I ran this on the gtdb representative set as well, dropping here for now:
dayhoff k=19, max containment 0.1
22674 gtdbr95rep.dayhoff-k19.mc0.1.members.siglist.csv
9236 gtdbr95rep.dayhoff-k19.mc0.1.founders.siglist.csv
protein k=10, max containment 0.05
6097 gtdbr95rep.protein-k10.mc0.05.founders.siglist.csv
25813 gtdbr95rep.protein-k10.mc0.05.members.siglist.csv
protein k=10, max containment 0.1
10814 gtdbr95rep.protein-k10.mc0.1.founders.siglist.csv
21096 gtdbr95rep.protein-k10.mc0.1.members.siglist.csv
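To compare the runs so far, a small script can turn the founder and member line counts above into totals and founder fractions. The counts are copied from the `wc`-style listings in this thread; the dictionary layout and the `founder_fraction` helper are just one convenient way to hold them.

```python
# (founders, members) line counts copied from the listings above.
RUNS = {
    "pigeon1.0 protein k10 mc0.05": (92280, 174525),
    "pigeon1.0 dna k21 mc0.01": (108355, 158450),
    "pigeon1.0 dayhoff k19 mc0.05": (85967, 180838),
    "gtdbr95rep dayhoff k19 mc0.1": (9236, 22674),
    "gtdbr95rep protein k10 mc0.05": (6097, 25813),
    "gtdbr95rep protein k10 mc0.1": (10814, 21096),
}


def founder_fraction(founders: int, members: int) -> float:
    """Share of all input signatures that ended up as founders."""
    return founders / (founders + members)


if __name__ == "__main__":
    for name, (founders, members) in RUNS.items():
        total = founders + members
        frac = founder_fraction(founders, members)
        print(f"{name}: total={total} founders={founders} ({frac:.1%})")
```

The totals also serve as a sanity check: every pigeon1.0 run should sum to the same input size, and likewise for the gtdb representative set.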
Using a random subset of 21,562 protein signatures from prodigal translation of pigeon1.0 --
time python find-founders.py -k 7 --moltype protein --siglist test-sigs.siglist.txt --batch-size 5000 --prefix test.prot7.mc0.1 --threshold 0.1
clusters:
time:
real 84m25.249s
time python find-founders.py -k 7 --moltype protein --siglist test-sigs.siglist.txt --batch-size 5000 --prefix test.prot7.mc0.05 --threshold 0.05
real 42m6.182s
clusters:
time python find-founders.py -k 7 --moltype protein --siglist test-sigs.siglist.txt --batch-size 10000 --prefix test.prot7.mc0.05_bs10000 --threshold 0.05
time:
real 42m50.027s
clusters:
time python find-founders.py -k 10 --moltype protein --siglist test-sigs.siglist.txt --batch-size 5000 --prefix test.prot10.mc0.05 --threshold 0.05
time:
real 55m14.529s
clusters: