Is there an example on how to score a set of sequences w/ a given backbone (PDB)?

dauparas / ProteinMPNN

Code for the ProteinMPNN paper

MIT License


Is there an example on how to score a set of sequences w/ a given backbone (PDB)? #9

Closed: seyonechithrananda closed this issue 1 year ago

seyonechithrananda commented 2 years ago

Hey ProteinMPNN team! Big fan of the work and thank you for the excellent guides that are runnable in Colab.

I'm trying to work through the code base and the utils file to better understand how to score sequences (the negative log probability of a sequence given a backbone). Given a set of sequences and a PDB file for my backbone, how can I (a) featurize the backbone and (b) compute the negative log probability of each sequence with respect to that backbone?

For reference, here is a similar example of how to do so with ESM-IF1.

Thanks for your help!

dauparas commented 2 years ago

Hey!

The easiest way would be to create many PDB files that contain just the backbone coordinates, each with a different sequence, and then run the main script with the --score_only 1 option.
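A minimal sketch of the "one PDB per sequence" idea, assuming the residue names in the ATOM records are what carry the sequence: keep only the backbone atoms (N, CA, C, O) and overwrite each residue name with the three-letter code of the new sequence. The helper name `thread_sequence` is hypothetical and not part of ProteinMPNN, and for multi-chain files the sequence is assumed to be the chains concatenated in file order.

```python
# Backbone atom names kept in the threaded PDB.
BACKBONE_ATOMS = {"N", "CA", "C", "O"}

# One-letter to three-letter codes for the 20 standard amino acids.
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def thread_sequence(pdb_lines, sequence):
    """Return backbone-only ATOM lines relabelled with `sequence`.

    Hypothetical helper (not part of ProteinMPNN). Relies on the
    fixed-column PDB layout: atom name in columns 13-16, residue name
    in 18-20, chain ID in 22, residue number + insertion code in 23-27.
    """
    out = []
    res_index = -1
    last_key = None
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() not in BACKBONE_ATOMS:
            continue  # drop side-chain atoms
        key = (line[21], line[22:27])  # chain ID, residue number + icode
        if key != last_key:
            res_index += 1  # first atom of a new residue
            last_key = key
        if res_index >= len(sequence):
            raise ValueError("sequence is shorter than the backbone")
        # Swap the residue name, leave every other column untouched.
        out.append(line[:17] + ONE_TO_THREE[sequence[res_index]] + line[20:])
    if res_index + 1 != len(sequence):
        raise ValueError("sequence is longer than the backbone")
    return out
```

Each list returned by `thread_sequence` could be written to its own .pdb file, and each of those files then scored by running the main script with --score_only 1.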

Alternatively, I can add a feature to score sequences given in a .fasta file against the backbone given in a .pdb file. This would only work smoothly if the sequences in the fasta file match the number of residues in the pdb file for every chain, and the chain ordering is given in the fasta file.
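The length requirement above can be checked before scoring. The sketch below counts residues per chain from the CA atoms of the PDB file and compares them with each FASTA record. It assumes, as an illustration only, that chains inside one record are separated by "/" and listed in alphabetical order of chain ID; the function names are hypothetical.

```python
def chain_lengths(pdb_lines):
    """Count residues per chain ID, using one CA atom per residue."""
    seen = set()
    lengths = {}
    for line in pdb_lines:
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        key = (line[21], line[22:27])  # chain ID, residue number + icode
        if key in seen:
            continue  # skip alternate locations of the same residue
        seen.add(key)
        lengths[line[21]] = lengths.get(line[21], 0) + 1
    return lengths


def read_fasta(text):
    """Parse FASTA text into a dict of record name -> sequence string."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def find_mismatches(records, lengths):
    """Return names of records whose per-chain lengths differ from the PDB.

    Assumption: chains are "/"-separated and sorted by chain ID.
    """
    expected = [lengths[chain] for chain in sorted(lengths)]
    bad = []
    for name, seq in records.items():
        if [len(part) for part in seq.split("/")] != expected:
            bad.append(name)
    return bad
```

Records reported by `find_mismatches` would need to be fixed or dropped before scoring, since they cannot be aligned residue by residue with the backbone.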

seyonechithrananda commented 2 years ago

Thanks for your helpful reply!! I think the feature to score multiple sequences in a single FASTA against a backbone in a PDB would be incredibly helpful!! (as long as the sequences match the # of residues as you mentioned)

dauparas commented 1 year ago

There is a flag called --path_to_fasta that allows you to do this.
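A hedged sketch of how the pieces from this thread might fit together: format the candidate sequences as FASTA text and build a command line that passes it via --path_to_fasta together with --score_only 1. The script name `protein_mpnn_run.py`, the --pdb_path and --out_folder flags, and the "/" chain separator are assumptions not stated in this thread, and all file names are placeholders; check the script's --help output before relying on them.

```python
import shlex


def format_fasta(sequences):
    """Render a dict of record name -> sequence as FASTA text."""
    return "".join(">%s\n%s\n" % (name, seq) for name, seq in sequences.items())


def fasta_scoring_command(pdb_path, fasta_path, out_folder):
    """Build (but do not run) a scoring command as an argument list.

    --path_to_fasta and --score_only come from this thread; the script
    name and the other two flags are assumptions.
    """
    return [
        "python", "protein_mpnn_run.py",
        "--pdb_path", pdb_path,
        "--path_to_fasta", fasta_path,
        "--score_only", "1",
        "--out_folder", out_folder,
    ]


if __name__ == "__main__":
    # Placeholder sequences for a hypothetical two-chain backbone.
    print(format_fasta({"variant_1": "WK/G", "variant_2": "AK/G"}), end="")
    # Print the command so it can be inspected before running it.
    print(shlex.join(fasta_scoring_command("backbone.pdb", "variants.fasta", "scores")))
```

Building the command as a list keeps paths with spaces intact if it is later handed to `subprocess.run`.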