xrz14 closed this issue 1 month ago.
Hi @xrz14, sorry for the slow reply. I don't check this repo much anymore and for some reason I didn't get an email from GitHub telling me you'd raised an issue.
This may no longer be relevant for you, but I would include the single-mutation data. Note that if most of your data are single mutations, we would expect embeddings from the MSA Transformer to be the most useful encoding, because encodings like Georgiev and one-hot have no way of transferring information from one site to another. That said, in my experience the ML methods in the MLDE package are no more effective than a naive recombinatorial approach (i.e., make a variant that combines all of your beneficial single-site mutations; that variant will probably work well).
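The naive recombinatorial approach can be sketched in a few lines. This is a minimal illustration, not part of the MLDE package: the input format (tuples of position, wild-type residue, mutant residue, fitness normalized so wild type = 1.0) and the function name `recombine_positive_singles` are hypothetical, and the example values are made up. At each site it keeps the single best beneficial mutation, which implicitly assumes the beneficial effects roughly add up when combined.

```python
# Hypothetical input format: (position, wild_type_aa, mutant_aa, fitness),
# where fitness is normalized so that the wild type scores 1.0.

def recombine_positive_singles(singles, wt_fitness=1.0):
    """Return the mutations for one combined variant: the best beneficial mutation at each site."""
    best_per_site = {}
    for pos, wt, mut, fitness in singles:
        if fitness <= wt_fitness:
            continue  # skip neutral or deleterious mutations
        current = best_per_site.get(pos)
        if current is None or fitness > current[2]:
            best_per_site[pos] = (wt, mut, fitness)
    # Sort by position and format each mutation as e.g. "V10A"
    return [f"{wt}{pos}{mut}" for pos, (wt, mut, _) in sorted(best_per_site.items())]


# Made-up example values, for illustration only
singles = [
    (10, "V", "A", 1.4),
    (10, "V", "L", 1.1),
    (25, "D", "G", 0.7),
    (31, "G", "W", 1.8),
    (47, "V", "Y", 1.2),
]
print(recombine_positive_singles(singles))
```

With these values the combined variant is `V10A/G31W/V47Y`: `V10L` loses to `V10A` at the same site, and `D25G` is dropped because it is worse than wild type.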
If a large portion of my experimental data is single-site mutation data, should I use that data when running execute_mlde.py? In addition to the single-site mutation data, I also have some data for five-site mutations.