ai4protein / ProSST

Code for ProSST: A Pre-trained Protein Sequence and Structure Transformer with Disentangled Attention.
GNU General Public License v3.0

How to use with sequence-only dataset #1

Open thanhtvt opened 3 months ago

thanhtvt commented 3 months ago

Thank you for contributing this repository to the community!

I want to use ProSST on my own dataset. Unfortunately, the dataset contains only protein sequences (e.g., "MEAIAKYDFKATADDE"); there are no PDB files or any other source of structural information. Is there another way to use your repository?

Thank you very much!

ginnm commented 3 months ago

Sorry, our model has to take both structure tokens and residue tokens as input. You could use AlphaFold to predict the structures for your sequences.

hz7-github commented 1 month ago

I understand that the model requires both structure tokens and residue tokens as input. I plan to use AlphaFold to predict the protein structures from my sequences. However, I am unsure how to generate the required residue tokens from these predicted structures.

Could you please provide guidance or instructions on how to generate residue tokens from the AlphaFold predicted structures? Any assistance or reference to relevant documentation would be greatly appreciated.

Thank you very much for your help!
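
As a rough illustration of the question above, here is a minimal sketch that recovers the one-letter amino-acid sequence from a predicted structure in PDB format. It assumes that "residue tokens" refers to the amino-acid letters of the sequence, which this thread does not confirm. The function name `sequence_from_pdb_text`, the default chain `A`, and the use of `X` for unknown residues are illustrative choices, not part of ProSST. Producing structure tokens is not covered here.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_pdb_text(pdb_text: str, chain_id: str = "A") -> str:
    """Return the one-letter sequence of one chain, read from CA ATOM records.

    Hypothetical helper: it relies on the fixed-column layout of PDB ATOM
    records and maps non-standard residue names to 'X'.
    """
    letters = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        res_name = line[17:20].strip()    # columns 18-20
        chain = line[21]                  # column 22
        res_key = (line[22:26], line[26])  # residue number + insertion code
        if atom_name != "CA" or chain != chain_id:
            continue
        if res_key in seen:
            continue  # keep one CA per residue (skips alternate locations)
        seen.add(res_key)
        letters.append(THREE_TO_ONE.get(res_name, "X"))
    return "".join(letters)
```

One possible use is a sanity check: read the predicted file as text, call `sequence_from_pdb_text` on it, and compare the result with the original input sequence to confirm that the structure and the sequence line up residue by residue.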