dauparas / ProteinMPNN

Code for the ProteinMPNN paper
MIT License
934 stars, 284 forks

Why the difficult interface to specify residues? #22

Open jadolfbr opened 1 year ago

jadolfbr commented 1 year ago

I haven't yet had a chance to start editing the code (and maybe I could streamline this), but why is this needed as opposed to putting these options into the single final run script?

python ../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains

python ../helper_scripts/assign_fixed_chains.py --input_path=$path_for_parsed_chains --output_path=$path_for_assigned_chains --chain_list "$chains_to_design"

python ../helper_scripts/make_fixed_positions_dict.py --input_path=$path_for_parsed_chains --output_path=$path_for_fixed_positions --chain_list "$chains_to_design" --position_list "$fixed_positions"

python ../protein_mpnn_run.py \
        --jsonl_path $path_for_parsed_chains \
        --chain_id_jsonl $path_for_assigned_chains \
        --fixed_positions_jsonl $path_for_fixed_positions \
        --out_folder $output_dir \
        --num_seq_per_target 2 \
        --sampling_temp "0.1" \
        --seed 37

zhoubay commented 1 year ago

From another point of view, I think the split is a good thing, since the separate scripts can be reused for other tasks. For instance, parse_multiple_chains.py could be used to process PDB files in a more understandable way.

But I'm just a user rather than a developer of this project, so I may be off the mark.

dauparas commented 1 year ago

The initial idea was to process inputs and save them into dictionaries in case they need to be run multiple times with different models, or with different settings. Moreover, one can easily check if parsing was done correctly for the inputs by looking at jsonl files. There is a more streamlined way to use the code too, e.g.

path_to_PDB="../inputs/PDB_complexes/pdbs/3HTN.pdb"

output_dir="../outputs/example_3_outputs"
if [ ! -d $output_dir ]
then
    mkdir -p $output_dir
fi

chains_to_design="A B"

python ../protein_mpnn_run.py \
        --pdb_path $path_to_PDB \
        --pdb_path_chains "$chains_to_design" \
        --out_folder $output_dir \
        --num_seq_per_target 2 \
        --sampling_temp "0.1" \
        --seed 37 \
        --batch_size 1
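
The point above about checking the parsed inputs by looking at the jsonl files can be done by eye, but a few lines of Python also work. The following is only a minimal sketch: the function name `summarize_jsonl` is made up for illustration, and it deliberately assumes nothing about which keys the helper scripts write. It just reports the top-level keys and value types found on each line.

```python
import json
from typing import Dict, List


def summarize_jsonl(path: str) -> List[Dict[str, str]]:
    """Return, for each non-empty line, a mapping of top-level key -> value type name.

    Hypothetical helper: it does not know the ProteinMPNN schema, it only
    verifies that every line is valid JSON and shows what is in it.
    """
    summaries: List[Dict[str, str]] = []
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: not valid JSON ({exc})") from exc
            if isinstance(record, dict):
                summaries.append({key: type(value).__name__ for key, value in record.items()})
            else:
                # A line that is valid JSON but not an object is reported as-is.
                summaries.append({"<root>": type(record).__name__})
    return summaries
```

Calling `summarize_jsonl` on the file written to `$path_for_parsed_chains` and printing the result is a quick way to see whether every input structure produced a record and whether the expected chains show up among the keys.
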
jadolfbr commented 1 year ago

That makes sense, especially for testing. Is there a streamlined way to do positions? I wonder if a simple solution could be to run those steps optionally in the mpnn run script, turn the helper scripts into imported functions, and go from there. Things like positions to design are common, so doing that in a single script would be more streamlined, at least for production-level stuff. Right now I have a script that basically runs through all the other scripts, passing outputs along. I'm also working on an easier wrapper to do it in-code. It's half-baked on features, but it lets me change positions and bias AA in a few lines. Looks like Sergey has some cool new colab stuff that will make that easier as well.
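
A script that "runs through all the other scripts, passing outputs along" could look roughly like the sketch below. It only assembles the four command lines from the first post, using the flags shown there. The function name `build_pipeline_commands`, its parameters, and the intermediate file names (`parsed_pdbs.jsonl`, `assigned_pdbs.jsonl`, `fixed_pdbs.jsonl`) are assumptions made for illustration, not part of ProteinMPNN.

```python
import sys
from pathlib import Path
from typing import List


def build_pipeline_commands(
    pdb_folder: str,
    work_dir: str,
    out_folder: str,
    chains_to_design: str,
    fixed_positions: str,
    repo_dir: str = "..",
    num_seq_per_target: int = 2,
    sampling_temp: str = "0.1",
    seed: int = 37,
) -> List[List[str]]:
    """Build the argv lists for the three helper scripts plus the final run.

    Intermediate file names are hypothetical; only the script names and
    flags are taken from the commands quoted in this thread.
    """
    repo = Path(repo_dir)
    helpers = repo / "helper_scripts"
    work = Path(work_dir)
    parsed = str(work / "parsed_pdbs.jsonl")      # assumed file name
    assigned = str(work / "assigned_pdbs.jsonl")  # assumed file name
    fixed = str(work / "fixed_pdbs.jsonl")        # assumed file name
    python = sys.executable  # reuse the interpreter running this wrapper

    return [
        [python, str(helpers / "parse_multiple_chains.py"),
         f"--input_path={pdb_folder}", f"--output_path={parsed}"],
        [python, str(helpers / "assign_fixed_chains.py"),
         f"--input_path={parsed}", f"--output_path={assigned}",
         "--chain_list", chains_to_design],
        [python, str(helpers / "make_fixed_positions_dict.py"),
         f"--input_path={parsed}", f"--output_path={fixed}",
         "--chain_list", chains_to_design,
         "--position_list", fixed_positions],
        [python, str(repo / "protein_mpnn_run.py"),
         "--jsonl_path", parsed,
         "--chain_id_jsonl", assigned,
         "--fixed_positions_jsonl", fixed,
         "--out_folder", out_folder,
         "--num_seq_per_target", str(num_seq_per_target),
         "--sampling_temp", sampling_temp,
         "--seed", str(seed)],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order. The `chains_to_design` and `fixed_positions` arguments are passed through unchanged, so they take whatever strings would otherwise go into `$chains_to_design` and `$fixed_positions`.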

(Sent by email in reply to Justas Dauparas's comment above, dated Mon, Oct 17, 2022.)

ajasja commented 1 year ago

@jadolfbr I feel this problem too. We wrap the execution in a separate script that also deletes the intermediate files.
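
A wrapper of the kind described here, one that runs the steps and then deletes the intermediate files, might be sketched as follows. This is not the script mentioned above, only an illustration under stated assumptions: `run_and_clean` and the `make_commands` callback are hypothetical names, and the commands themselves are whatever the caller supplies.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def run_and_clean(make_commands: Callable[[Path], Sequence[Sequence[str]]]) -> Path:
    """Run commands in order inside a scratch directory, then delete it.

    `make_commands` receives the scratch directory and returns the argv
    lists to execute; intermediate files should be written there.
    Returns the scratch path, which no longer exists after the call.
    """
    with tempfile.TemporaryDirectory(prefix="mpnn_scratch_") as tmp:
        scratch = Path(tmp)
        for command in make_commands(scratch):
            # check=True stops the pipeline at the first failing step;
            # the scratch directory is still removed on the way out.
            subprocess.run(list(command), check=True)
    return scratch
```

The final output folder should live outside the scratch directory, otherwise the designed sequences are deleted along with the intermediate jsonl files.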