facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

Adding the embedders to bio_embeddings #6

Closed: sacdallago closed this issue 3 years ago

sacdallago commented 4 years ago

Hey folks :)

Great work! As I mentioned on Twitter, it'd be nice to add your models to bio_embeddings. The purpose of the pipeline is to make it easy for less tech-savvy biologists and bioinformaticians to use protein LMs. Since you use torch, this should be quite straightforward, as we already have some transformer models.

Out of the box, you get the whole pipeline: FASTA input, reproducible runs, projection, visualization, and embedding-based annotation transfer (goPredSim). Edit: oh, and the auto-batching of large sequence files between GPU and CPU (which is not at all intuitive for the average user), plus per-sequence vs. per-AA representations (looking through your closed issues, e.g. #2).
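To illustrate what "auto-batching" means here, below is a minimal sketch that greedily packs sequences into batches under a residue budget. The function name `batch_by_residues` and the `max_residues` parameter are hypothetical; this is not bio_embeddings' actual implementation, just the general idea.

```python
from typing import Iterator, List, Tuple

# Hypothetical helper, not bio_embeddings' real code: packs (id, sequence)
# pairs into batches whose summed length stays within max_residues.
def batch_by_residues(
    sequences: List[Tuple[str, str]], max_residues: int
) -> Iterator[List[Tuple[str, str]]]:
    batch: List[Tuple[str, str]] = []
    batch_len = 0
    for seq_id, seq in sequences:
        # Flush the current batch if adding this sequence would exceed the budget.
        if batch and batch_len + len(seq) > max_residues:
            yield batch
            batch, batch_len = [], 0
        # A sequence longer than the budget ends up alone in its own batch;
        # that is the case where a pipeline might fall back to the CPU.
        batch.append((seq_id, seq))
        batch_len += len(seq)
    if batch:
        yield batch


seqs = [("a", "MKT"), ("b", "MKTAYIAK"), ("c", "MK"), ("d", "M" * 20)]
for b in batch_by_residues(seqs, max_residues=10):
    print([seq_id for seq_id, _ in b])
# ['a']
# ['b', 'c']
# ['d']
```

Summing lengths ignores padding: a batch tensor is padded to its longest member, so sorting sequences by length before batching keeps the padded size closer to the budget.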

I noticed you have some variant prediction code; maybe it makes sense to include that as a pipeline step?
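For context, one zero-shot approach used with masked protein language models scores a substitution by the log-odds ratio between the mutant and wild-type amino acid at the mutated position. This is not necessarily what the ESM variant prediction code referenced above does. The sketch below assumes per-position log-probabilities are already available; the function names and the toy numbers are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def parse_mutation(mutation: str) -> Tuple[str, int, str]:
    """Split a substitution like 'A3G' into (wild type, 1-based position, mutant)."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]


def log_odds_score(
    sequence: str, mutation: str, log_probs: List[Dict[str, float]]
) -> float:
    """Return log p(mutant) - log p(wild type) at the mutated position.

    log_probs[i] maps amino acids to log-probabilities for residue i (0-based);
    in practice these would come from a language model's output.
    """
    wt, pos, mt = parse_mutation(mutation)
    idx = pos - 1
    # Guard against a mutation string that does not match the sequence.
    if sequence[idx] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {sequence[idx]}")
    return log_probs[idx][mt] - log_probs[idx][wt]


# Toy log-probabilities for a 3-residue sequence (made-up numbers, not model output).
toy_log_probs = [
    {"M": math.log(0.9), "A": math.log(0.1)},
    {"K": math.log(0.6), "R": math.log(0.3)},
    {"A": math.log(0.5), "G": math.log(0.25)},
]
# log(0.25) - log(0.5), i.e. about -0.69: the model prefers the wild type.
print(log_odds_score("MKA", "A3G", toy_log_probs))
```

A negative score means the model assigns the wild-type residue a higher probability than the mutant.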

I'll link this to our issue for integration so that we can cross-follow the status: https://github.com/sacdallago/bio_embeddings/issues/62

tomsercu commented 4 years ago

Thanks Christian, we'll be happy to work with you folks to make this happen! The team is a bit swamped right now, but we will look into this soon!

konstin commented 3 years ago

I've added ESM to bio_embeddings: https://github.com/sacdallago/bio_embeddings/blob/0b7d5f32ec7743b5fbdfcbede5bf2f0e1cbdb52b/bio_embeddings/embed/esm_embedder.py. It would be great if you could check whether the implementation looks correct to you.
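One detail worth checking in a review like this is the token offset. In the example in the ESM README, token 0 is a beginning-of-sequence token, so residues start at index 1, and per-sequence representations average positions 1 through len(seq). The pure-Python sketch below mirrors that slicing on plain lists. The helper names are hypothetical and not taken from either codebase.

```python
from typing import List


def per_residue(token_embeddings: List[List[float]], seq_len: int) -> List[List[float]]:
    """Keep only residue positions: index 0 is the beginning-of-sequence token,
    and anything after seq_len (an end-of-sequence token or padding) is dropped."""
    return token_embeddings[1 : seq_len + 1]


def per_protein(token_embeddings: List[List[float]], seq_len: int) -> List[float]:
    """Average the per-residue vectors into one vector per sequence."""
    residues = per_residue(token_embeddings, seq_len)
    dim = len(residues[0])
    return [sum(vec[d] for vec in residues) / len(residues) for d in range(dim)]


# BOS token, three residues, then a trailing special token (toy 2-d embeddings).
tokens = [[100.0, 100.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [-100.0, -100.0]]
print(per_protein(tokens, seq_len=3))  # [3.0, 4.0]
```

Forgetting the offset would silently average in the special token and drop the last residue, which is easy to miss because the output shape is unchanged.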

Could you publish ESM on PyPI? I've seen that you have a proper Python package (thanks for that!), and I could verify with TestPyPI that the following works: python setup.py sdist bdist_wheel && pip install twine && twine upload dist/*

sacdallago commented 3 years ago

Also: @tomsercu is your blog's Gmail address still the best one to reach you? :) I sent you something ~2 days ago, just wondering :)

sacdallago commented 3 years ago

I think this is pretty much done, except we'll now also add the new model :P Congrats on that one, too.