fhalab / MLDE

A machine-learning package for navigating combinatorial protein fitness landscapes.

single site mutation data #8

Closed xrz14 closed 1 month ago

xrz14 commented 12 months ago

A large portion of my experimental data is single-site mutation data. Should I include it when running execute_mlde.py? In addition to the single-site mutation data, I also have some data for five-site mutations.

brucejwittmann commented 10 months ago

Hi @xrz14, sorry for the slow reply. I don't check this repo much anymore and for some reason I didn't get an email from GitHub telling me you'd raised an issue.

This may not be relevant for you anymore, but I would include the single-site mutation data. Note that if you primarily have single-site data, we would expect the embeddings from the MSA transformer to be the most useful encoding method. Encodings like Georgiev and onehot have no way of transferring information from one site to another. That being said, in my experience, the ML methods used in the MLDE package are no more effective than a naive recombinatorial approach (i.e., just make a variant with all your positive single-site mutations; that one is probably going to work well).
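
For anyone who wants to try the naive recombinatorial baseline mentioned above, here is a minimal sketch. It is not part of the MLDE package; the function name `recombine_best_singles`, the data layout (a dict keyed by 0-indexed position and amino acid), and the example fitness values are all assumptions made for illustration. It keeps, for each site, the single mutation with the highest measured fitness above the parent and combines them into one variant.

```python
from typing import Dict, Tuple


def recombine_best_singles(
    parent: str,
    single_fitness: Dict[Tuple[int, str], float],
    parent_fitness: float,
) -> str:
    """Combine the best beneficial single-site mutation at each position.

    parent          -- parent sequence (or just the targeted residues)
    single_fitness  -- hypothetical layout: (0-indexed position, amino acid) -> fitness
    parent_fitness  -- measured fitness of the parent, used as the cutoff
    """
    best = {}  # position -> (amino acid, fitness) of the best beneficial mutation
    for (pos, aa), fitness in single_fitness.items():
        if fitness <= parent_fitness:
            continue  # skip neutral and deleterious mutations
        if pos not in best or fitness > best[pos][1]:
            best[pos] = (aa, fitness)

    # Apply the selected mutations to the parent sequence.
    variant = list(parent)
    for pos, (aa, _) in best.items():
        variant[pos] = aa
    return "".join(variant)


# Made-up example data, for illustration only.
example_singles = {
    (0, "W"): 1.8,
    (0, "F"): 1.2,
    (1, "L"): 0.4,  # worse than the parent, so position 1 stays unchanged
    (2, "K"): 2.5,
}
print(recombine_best_singles("ACDE", example_singles, parent_fitness=1.0))
```

This assumes the beneficial single-site effects are roughly additive. If there is strong epistasis between the sites, the combined variant may not behave as the single-site data suggest, so it is worth measuring it directly.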