The only special "natural language-like" prompts are the lineage strings, and they apply to the 131k model only.
You can take a look at bac120_taxonomy.tsv in GTDB for a list of strings.
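If it helps, here is a minimal sketch for pulling the distinct lineage strings out of that file. It assumes bac120_taxonomy.tsv is a headerless, two-column, tab-separated file (accession, then lineage); check this against the release you download. The function names and the accessions in the sample are made up for illustration.

```python
import csv
import io


def read_lineages(handle):
    """Yield (accession, lineage) pairs from a GTDB-style taxonomy TSV.

    Assumes two tab-separated columns with no header row.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        yield row[0], row[1]


def unique_lineages(handle):
    """Return distinct lineage strings in first-seen order."""
    seen = set()
    ordered = []
    for _, lineage in read_lineages(handle):
        if lineage not in seen:
            seen.add(lineage)
            ordered.append(lineage)
    return ordered


# Tiny in-memory sample; the accessions are placeholders, not real GTDB IDs.
SAMPLE = (
    "ACC_0001\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella;"
    "s__Klebsiella pneumoniae\n"
    "ACC_0002\td__Bacteria;p__Bacillota;c__Bacilli;o__Staphylococcales;"
    "f__Staphylococcaceae;g__Staphylococcus;s__Staphylococcus aureus\n"
    "ACC_0003\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella;"
    "s__Klebsiella pneumoniae\n"
)

lineages = unique_lineages(io.StringIO(SAMPLE))
```

For the real file, pass an open file handle instead of the `io.StringIO` sample, e.g. `with open("bac120_taxonomy.tsv", newline="") as fh: lineages = unique_lineages(fh)`.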
Some example species prompts to get started are:
|d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia||
|d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella;s__Klebsiella pneumoniae||
|d__Bacteria;p__Bacillota;c__Bacilli;o__Staphylococcales;f__Staphylococcaceae;g__Staphylococcus;s__Staphylococcus aureus||
|d__Bacteria;p__Tenericutes;c__Mollicutes;o__Mycoplasmatales;f__Mycoplasmataceae;g__Mycoplasma;s__Mycoplasma genitalium||
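The examples above all share one shape: a leading `|`, the seven `d__` through `s__` ranks joined by `;`, and a trailing `||`. A rough sketch for wrapping a lineage string that way, assuming the pipes are part of the prompt exactly as shown (the helper names are hypothetical and not part of any released API):

```python
# Rank prefixes in the order they appear in the example prompts.
RANK_PREFIXES = ("d", "p", "c", "o", "f", "g", "s")


def has_expected_ranks(lineage: str) -> bool:
    """Check that the lineage lists the seven ranks, domain to species, in order."""
    prefixes = tuple(part.partition("__")[0] for part in lineage.split(";"))
    return prefixes == RANK_PREFIXES


def lineage_to_prompt(lineage: str) -> str:
    """Wrap a lineage string as in the examples: one leading pipe, two trailing."""
    lineage = lineage.strip()
    if not has_expected_ranks(lineage):
        raise ValueError(f"unexpected rank layout: {lineage!r}")
    return f"|{lineage}||"


prompt = lineage_to_prompt(
    "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella;"
    "s__Klebsiella pneumoniae"
)
```

The rank check is only a guard against pasting a truncated string; it does not confirm that the lineage is one the model has seen.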
Could you provide more documentation on the prompt scheme beyond the example Colab notebook, which uses the "greengenes-style lineage strings"? Are there other possible natural language-like prompts, or even the possibility of zero-shot prompting?