evo-design / evo

Biological foundation modeling from molecular to genome scale
Apache License 2.0

more prompt scheme documentation? #65

Closed: pan-genome closed this issue 5 months ago

pan-genome commented 5 months ago

Could you provide more documentation on prompt schemes beyond the one in the example Colab notebook, which uses the "greengenes-style lineage strings"? Are there other possible natural-language-like prompts? Or is zero-shot prompting a possibility?

brianhie commented 5 months ago

The only special "natural-language-like" prompts are the lineage strings, and those apply to the 131k model only.

You can take a look at bac120_taxonomy.tsv in GTDB for a list of strings.

Some example species prompts to get started:

|d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia||
|d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella;s__Klebsiella pneumoniae||
|d__Bacteria;p__Bacillota;c__Bacilli;o__Staphylococcales;f__Staphylococcaceae;g__Staphylococcus;s__Staphylococcus aureus||
|d__Bacteria;p__Tenericutes;c__Mollicutes;o__Mycoplasmatales;f__Mycoplasmataceae;g__Mycoplasma;s__Mycoplasma genitalium||
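
For anyone who wants to build prompts like these in bulk, here is a minimal sketch. It is not from the Evo codebase. It assumes the taxonomy file is tab-separated with an accession in the first column and the lineage string in the second, and it assumes the `|<lineage>||` wrapping can be inferred from the examples above. The helper names (`lineage_to_prompt`, `iter_prompts`) and the accessions in the sample data are made up for illustration.

```python
import csv
import io


def lineage_to_prompt(lineage: str) -> str:
    """Wrap a lineage string as |<lineage>||, mirroring the examples above."""
    return f"|{lineage.strip()}||"


def iter_prompts(tsv_file, genus=None):
    """Yield unique prompts from a taxonomy TSV (assumed: accession<TAB>lineage).

    If `genus` is given, keep only lineages containing that g__ entry.
    """
    seen = set()
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        lineage = row[1]
        if genus is not None and f"g__{genus};" not in lineage:
            continue
        if lineage not in seen:  # many genomes share one lineage
            seen.add(lineage)
            yield lineage_to_prompt(lineage)


# In-memory stand-in for the taxonomy file; the accessions are placeholders.
KLEB = (
    "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;"
    "f__Enterobacteriaceae;g__Klebsiella;s__Klebsiella pneumoniae"
)
STAPH = (
    "d__Bacteria;p__Bacillota;c__Bacilli;o__Staphylococcales;"
    "f__Staphylococcaceae;g__Staphylococcus;s__Staphylococcus aureus"
)
sample = io.StringIO(
    f"ACCESSION_1\t{KLEB}\nACCESSION_2\t{KLEB}\nACCESSION_3\t{STAPH}\n"
)

prompts = list(iter_prompts(sample, genus="Klebsiella"))
for p in prompts:
    print(p)
```

To run this against the real file, replace the `io.StringIO` stand-in with `open("bac120_taxonomy.tsv", newline="")`. Check the column layout of your download first, since the two-column assumption is not confirmed in this thread.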