HeliXonProtein / proximal-exploration

PyTorch implementation for our paper "Proximal Exploration for Model-guided Protein Sequence Design"
Apache License 2.0

Why did the TAPE-E4B-weight landscape not work well? #1

Open huishao007 opened 2 years ago

huishao007 commented 2 years ago

Hello author, thank you very much for sharing your code. When I was reproducing these results, I encountered some problems and I hope you will give me the right guidance. The main points of these problems are as follows:

  1. Why is the starting sequence of E4B "RQSQLAQDERVSRSYLALATETVDMFHILTKQVQKPFLRPELGPRLAAMLNFNLQQLCGPKCRDLKVTNPEKYGFEPKKLLDQLTDIYLQLDCARFAKAIAD"? This sequence is completely different from the wild type (WT) of E4B, which is "IEKFKLLAEKVEEIVAKNARAEIDYSDAPDEFRDPLMDTLMTDPVRLPSGTVMDRSIILRHLLNSPTDPFNRQMLTESMLEPVPELKEQIQAWMREKQSSDH". Why did the authors use such a sequence?
  2. When I use TAPE-E4B-Weight to score newly generated sequences, the results are very poor. Even on the original training data, the oracle does not fit well.

I hope you can give me an answer. Thank you very much!

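The mismatch between the two sequences quoted above can be checked with a simple position-wise comparison. This is a minimal sketch and not part of the repository: the function name `positional_identity` is hypothetical, and it compares residues position by position without any alignment, so it is only meaningful when the two sequences have the same length.

```python
# Minimal sketch (assumption: no alignment, plain position-by-position comparison).
# `positional_identity` is a hypothetical helper, not a function from this repository.

def positional_identity(a: str, b: str) -> float:
    """Return the fraction of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length for a position-wise comparison")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


# The two sequences quoted in the question above.
START_SEQ = (
    "RQSQLAQDERVSRSYLALATETVDMFHILTKQVQKPFLRPELGPRLAAMLNFNLQQLCGPKCRDLKVTNP"
    "EKYGFEPKKLLDQLTDIYLQLDCARFAKAIAD"
)
WT_SEQ = (
    "IEKFKLLAEKVEEIVAKNARAEIDYSDAPDEFRDPLMDTLMTDPVRLPSGTVMDRSIILRHLLNSPTDPF"
    "NRQMLTESMLEPVPELKEQIQAWMREKQSSDH"
)

if __name__ == "__main__":
    print(f"start length: {len(START_SEQ)}, WT length: {len(WT_SEQ)}")
    print(f"position-wise identity: {positional_identity(START_SEQ, WT_SEQ):.3f}")
```

A low identity here would only confirm that the two strings differ position by position; it does not by itself say which one the landscape was trained around.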
Stilwell-Git commented 2 years ago

Hello. We have updated the download_landscape.sh script. The issue should be fixed now.