ersilia-os / pharmacogx-embeddings

Pharmacogenomics knowledge graph embeddings and related analyses
GNU General Public License v3.0

Prompts for BioGPT embeddings #5

Closed miquelduranfrigola closed 12 months ago

miquelduranfrigola commented 1 year ago

We need to define prompts for BioGPT. Embedding extraction is already in place.

We can enumerate prompts and produce embeddings correspondingly. For example:

I suggest that we do a maximum of 5 prompts.
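As a rough illustration of enumerating prompts and embedding each one, here is a minimal sketch. The template wording, the `build_prompts` and `embed_prompts` helpers, and the `toy_embed` stand-in are all hypothetical and are not taken from the repository. In practice `embed_fn` would wrap the existing BioGPT embedding extraction.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical prompt templates: the actual wording is still to be defined.
PROMPT_TEMPLATES: List[str] = [
    "The genetic variant {variant} is associated with",
    "The gene {gene} is involved in",
    "Variant {variant} in gene {gene} affects the response to",
]

# Cap suggested in this issue.
MAX_PROMPTS = 5


def build_prompts(
    fields: Dict[str, str],
    templates: Sequence[str] = PROMPT_TEMPLATES,
    max_prompts: int = MAX_PROMPTS,
) -> List[str]:
    """Fill every template with the given fields, enforcing the prompt cap."""
    if len(templates) > max_prompts:
        raise ValueError(f"At most {max_prompts} prompts are allowed")
    return [template.format(**fields) for template in templates]


def embed_prompts(
    prompts: Sequence[str],
    embed_fn: Callable[[str], Sequence[float]],
) -> List[List[float]]:
    """Return one embedding per prompt, in the same order as the prompts."""
    return [list(embed_fn(prompt)) for prompt in prompts]


def toy_embed(text: str) -> List[float]:
    """Placeholder embedder (character and word counts), NOT BioGPT."""
    return [float(len(text)), float(len(text.split()))]


if __name__ == "__main__":
    # Illustrative variant/gene pair only.
    prompts = build_prompts({"variant": "rs4149056", "gene": "SLCO1B1"})
    for prompt, vector in zip(prompts, embed_prompts(prompts, toy_embed)):
        print(prompt, vector)
```

Keeping the embedder as a plain callable means the prompt list can be revised without touching the extraction code.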

Potentially, we could do the same for drug molecules, although I am not sure about this, since I doubt that BioGPT will capture SMILES strings successfully.

miquelduranfrigola commented 12 months ago

I have now explored BioGPT sufficiently to have a sense of its validity. For now, we are using it at the variant embedding stage, which is much less risky than using it to produce training sets.

Depending on the predictive value of these embeddings (obtained, for instance, from traits in the GWAS Catalog), we will consider using BioGPT more intensively.

As a side note, we have incorporated BioGPT in the Ersilia Model Hub, with identifier eos1xje.

I am closing this issue for now, but please feel free to reopen it anytime.