RosettaCommons / protein_generator

Joint sequence and structure generation with RoseTTAFold sequence space diffusion
https://huggingface.co/spaces/merle/PROTEIN_GENERATOR
MIT License

AA compositional bias #17

AlexWindels opened 9 months ago

AlexWindels commented 9 months ago

Hi all,

I am currently exploring protein_generator on the HuggingFace Space. I am trying out the AA compositional bias conditioning, and I ran the following example, 'W0.2,E0.1', with 40 diffusion steps and a protein length of 250 residues. This resulted in the following protein sequence:

AAPPPAAAVAAAAAAAPPAPAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPAAAAAAAAAAAAAAAAAAAAAAAPAAAALAAAAPAPAAAAAAAPAAAVAAAAAAAAAAAAAAAAAAAAAAAPAAAPAAAAAAAAAAAAAVAAAAAAAAAAAAPAAVPAAAAAAAAAAAAAAAAAAAAAPAAAAAAAAAAAPAAAAPAAAAAAAAAAAAPAAAAAAAAAALAAAAAAAAAVA

As you can see, the sequence is composed almost exclusively of alanines, and no tryptophans or glutamic acids occur, even though the run was explicitly conditioned on these residues. When I change the residues and/or the bias, the results are similar, and I never obtain a sequence that comes close to the requested composition.
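To put numbers on the mismatch, a short check can compare the requested fractions with the fractions actually present in a generated sequence. This is a minimal sketch that assumes the bias string consists of comma-separated entries, each a one-letter amino acid code followed by a target fraction (as in 'W0.2,E0.1'). `parse_bias_spec` and `compare_composition` are hypothetical helpers written for illustration, not functions from protein_generator, and they may not match how the Space parses the string internally.

```python
from collections import Counter


def parse_bias_spec(spec: str) -> dict[str, float]:
    """Parse a spec like 'W0.2,E0.1' into {'W': 0.2, 'E': 0.1}.

    Assumption: each comma-separated entry is a one-letter residue code
    followed by its target fraction. This mirrors the string in this issue,
    not necessarily protein_generator's own parser.
    """
    targets = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas
        targets[entry[0].upper()] = float(entry[1:])
    return targets


def compare_composition(seq: str, spec: str) -> dict[str, tuple[float, float]]:
    """Return {residue: (target_fraction, observed_fraction)} for each biased residue."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    return {
        aa: (target, counts[aa] / n if n else 0.0)
        for aa, target in parse_bias_spec(spec).items()
    }


if __name__ == "__main__":
    toy = "AAWEAAWAAE"  # hypothetical 10-residue sequence, for illustration only
    for aa, (target, observed) in compare_composition(toy, "W0.2,E0.1").items():
        print(f"{aa}: target {target:.2f}, observed {observed:.2f}")
```

Substituting the generated sequence from this issue for `toy` reports an observed fraction of 0.00 for both W and E, since that sequence contains only A, P, V and L.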

Can you check whether something is going wrong here?

Best,

Alex

0merle0 commented 9 months ago

Hey Alex,

I would try a smaller number of amino acids (100 aa) or more steps (100 steps). The network can often struggle to generate cohesive sequence and structure pairs at larger lengths. If there's a more specific application you are going for here, let me know and I am happy to discuss more!