dauparas / ProteinMPNN

Code for the ProteinMPNN paper
MIT License

Retrieve per-position scores or score a chain in the context of another #84

Open · amin-sagar opened this issue 7 months ago

amin-sagar commented 7 months ago

Hello. I am trying to score multiple small binder sequences against a large target. I have used the example scoring script as follows:

path_to_pdb="./Structure.pdb"
path_to_fasta="./sequences.fasta"   # each record: chain sequences joined by "/", e.g. AAAAA/BBBBBB

output_dir="./outputs/"
mkdir -p "$output_dir"   # -p also makes this a no-op if the folder already exists

#path_for_parsed_chains="$output_dir/parsed_pdbs.jsonl"
#path_for_assigned_chains="$output_dir/assigned_pdbs.jsonl"
chains_to_design="A B"   # space-separated chain IDs, passed to --pdb_path_chains

# --score_only 1: score the input backbone-sequence pairs instead of designing new sequences
python /home/amin/softwares/protein-design/ProteinMPNN/protein_mpnn_run.py \
        --path_to_fasta "$path_to_fasta" \
        --pdb_path "$path_to_pdb" \
        --pdb_path_chains "$chains_to_design" \
        --out_folder "$output_dir" \
        --score_only 1

The sequences are in the format AAAAA/BBBBBB. The scores that I get are all very similar. I think this is because the target is so much larger than the binder that the binder positions contribute very little to the mean score. I have two questions.

1) If I change

chains_to_design="A B"

to

chains_to_design="B"

Does this mean that ProteinMPNN will score chain B in isolation, computing the probability of sequence B folding into the structure of chain B with no consideration of chain A? Or is chain A still taken into account, with only the scores for chain B being accumulated?

2) Is there a way to get per-position scores? That would allow me to calculate the mean score over chain B only.

I would really appreciate any suggestions.

Best,
Amin
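A minimal shell sketch of the two averages discussed in this issue: a mean restricted to chain B versus a mean over every position of the complex. It assumes the per-position scores are already available as a plain-text table with one row per residue, `chain residue_index score`. That file layout and the helper names `chain_mean` and `overall_mean` are hypothetical (not part of ProteinMPNN), and the scores are made-up toy values.

```shell
#!/usr/bin/env bash
# Sketch: average per-position scores over one chain vs. over all positions.
# ASSUMPTION: scores were exported to a plain-text table with one row per
# residue, "<chain> <residue_index> <score>". This layout is hypothetical;
# ProteinMPNN does not write this file.

# Mean score over the rows whose first column matches the given chain ID.
chain_mean() {
  awk -v c="$1" '
    $1 == c { sum += $3; n++ }
    END {
      if (n == 0) { print "no positions for chain " c > "/dev/stderr"; exit 1 }
      printf "%.4f\n", sum / n
    }' "$2"
}

# Mean score over every row, i.e. one average across the whole complex.
overall_mean() {
  awk '{ sum += $3; n++ } END { if (n == 0) exit 1; printf "%.4f\n", sum / n }' "$1"
}

# Toy data (made up): a 500-residue target scoring 1.20 at every position
# and a 50-residue binder scoring 2.00 at every position.
scores_file="$(mktemp)"
trap 'rm -f "$scores_file"' EXIT
awk 'BEGIN {
  for (i = 1; i <= 500; i++) print "A", i, "1.20"
  for (i = 1; i <= 50;  i++) print "B", i, "2.00"
}' > "$scores_file"

chain_mean B "$scores_file"    # binder positions only
overall_mean "$scores_file"    # dominated by the 500 target positions
```

With these toy numbers the binder-only mean is 2.0000 while the overall mean is 1.2727; had the binder scored 1.20 like the target, the overall mean would be 1.2000. A 0.80 shift in the binder therefore moves the whole-complex average by only about 0.07, which is the dilution effect described in the question.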