zhuzihan728 closed this 1 year ago.
Short answer: yes, that's technically possible. Long answer: the quality of the embeddings will depend heavily on the length of your fragments. For example, some PDB sequences are, length-wise, closer to fragments than to full proteins, but even those are usually still around ~50 residues long. If you are talking about fragments of length 5, I am skeptical about how much information you can still get from them.
But I guess it would be best to just try it, as the numbers I gave above (5 and 50) are only examples and we have no precise idea how short a fragment can be and still give reasonable embeddings. If you get some results on this, it would be great if you could share them here at some point :)
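One way to "just try it" is to embed a fragment on its own and compare it with the same residues taken from the embedding of the full-length protein, then repeat this for several fragment lengths. Below is a minimal sketch of that comparison, assuming a model that returns one vector per residue. `embed_residues` is a placeholder for whatever model call you actually use, mean pooling plus cosine similarity is just one simple choice of comparison, and `toy_embed_residues` is a hypothetical stand-in included only so the snippet runs. It is not a protein language model, and its scores say nothing about real embeddings.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def mean_pool(per_residue: Sequence[Vector]) -> Vector:
    """Average per-residue vectors into a single fixed-size vector."""
    dim = len(per_residue[0])
    n = len(per_residue)
    return [sum(v[i] for v in per_residue) / n for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fragment_agreement(
    seq: str,
    start: int,
    length: int,
    embed_residues: Callable[[str], List[Vector]],
) -> float:
    """Compare a fragment embedded on its own with the same residues
    taken from the embedding of the full-length sequence."""
    full = embed_residues(seq)                         # one vector per residue, full context
    in_context = full[start:start + length]            # fragment residues, seen in context
    alone = embed_residues(seq[start:start + length])  # fragment seen in isolation
    return cosine(mean_pool(alone), mean_pool(in_context))


AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_embed_residues(seq: str, window: int = 7, dim: int = 16) -> List[Vector]:
    """HYPOTHETICAL stand-in for a real protein language model: each residue's
    vector is the average of fixed random vectors for itself and its neighbours
    within `window` positions, so short fragments lose context near their ends."""
    rng = random.Random(0)  # fixed seed, so every call uses the same lookup table
    table = {aa: [rng.gauss(0.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}
    out = []
    for i in range(len(seq)):
        lo, hi = max(0, i - window), min(len(seq), i + window + 1)
        out.append(mean_pool([table[aa] for aa in seq[lo:hi]]))
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    protein = "".join(rng.choice(AMINO_ACIDS) for _ in range(300))
    # Sweep fragment lengths (5 and 50 as in the discussion, plus the full length).
    for length in (5, 50, 300):
        score = fragment_agreement(protein, 0, length, toy_embed_residues)
        print(f"fragment length {length:3d}: cosine to in-context embedding = {score:.3f}")
```

With a real model, you would replace `toy_embed_residues` with a function that returns per-residue embeddings and sweep over both fragment lengths and start positions; a whole-sequence "fragment" should always score 1.0, which makes a useful sanity check.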
Will the resulting embeddings make sense if the model only sees fragments of a protein sequence instead of the whole sequence?