Structurebiology-BNL / ESMBind

Deep learning + physical modeling for 3D protein metal ion binding prediction

No ion is added by `3D_modeling/src/main.py`, caused by `add_ions.py>get_initial_ion_placements` function. #2

Closed alchemistcai closed 2 months ago

alchemistcai commented 2 months ago

I used 3D_modeling/src/main.py to generate the 3D structure, but the number of initial placements for ZN is 0.

python -u ~/git_develop/ESMBind/3D_modeling/src/main.py --ion ZN --restraint_force_constant 41840 --prediction_result multi_modal_binding/results/inference/2024-09-11-18-21/predictions.pkl --pdb-dir ./ --output-dir ./

After adding two print statements in add_ions.py, it seems that the probabilities in predictions.pkl need to be converted into candidate residue IDs, but src/main.py does not do this. residue.id[1] is an int, so it can never be found in predictions[id], which is a float array.

# in src/add_ions.py>get_initial_ion_placements function:
placements = []
print(predicted_residues) # my statement
print([residue.id[1] for residue in chain]) # my statement
for residue in chain:
    if residue.id[1] in predicted_residues:
        ...

# in src/main.py>main function:
with open(prediction_result, "rb") as f:
    predictions = pickle.load(f)
temp_dir_path = "./" if debug else None
predictions = predictions[ion]
list_of_ids = list(predictions.keys())
for id in list_of_ids:
    ...
    ion_to_group = process_pdb_with_ions(
        id,
        "A",
        predictions[id],
        ion,
        pdb_directory=pdb_dir,
        temp_dir=temp_dir,
        output_file=output_file,
    )

# output:
# processing sea
# Processing PDB files for sea...
# [7.1468279e-02 2.0773427e-03 2.9276037e-03 1.1450856e-03 6.8995019e-04
#  1.4301955e-03 9.6425862e-04 7.1392243e-04 6.9298875e-04 1.6153414e-03
# ...
#  1.2617731e-03 6.9831833e-03]
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, ..., 254, 255, 256]
# Initial placements for ZN: 0
# Final # of ions for ZN: 0
# No ions were added for sea. Skipping energy minimization.
# processing see
# Processing PDB files for see...
# [5.30340560e-02 1.63454900e-03 4.48116381e-03 1.26271509e-03
# ...
#  3.97717813e-03 5.04405703e-04 1.04554314e-02 2.40233028e-03
#  1.13734277e-02]
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, ..., 254, 255, 256]
# Initial placements for ZN: 0
# Final # of ions for ZN: 0
# No ions were added for see. Skipping energy minimization.
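
The mismatch visible in the output above can be reproduced in isolation. This is a minimal sketch, not code from the repository: a plain Python list stands in for the float array, and only the first few printed values are used.

```python
# Stand-in data: a plain list replaces the float array printed above.
probabilities = [7.1468279e-02, 2.0773427e-03, 2.9276037e-03, 1.1450856e-03]
residue_numbers = [1, 2, 3, 4]  # the integers that residue.id[1] yields

# The membership test compares integer residue numbers against probability
# values, so nothing matches and no placement is ever created.
matches = [n for n in residue_numbers if n in probabilities]
print(matches)  # []
```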
alchemistcai commented 2 months ago

Other source code remains unchanged.

empyriumz commented 2 months ago

Yes, main.py expects binding residue indices instead of probabilities. I forgot to add the parsing script in the initial commit; you can now find it here: parse_dl_results.py, and I have updated the README as well. Thanks for catching the error! Let me know if you run into any other issues.
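
To illustrate the kind of conversion described here, the sketch below turns per-residue probabilities into residue indices using a fixed cutoff. It does not reproduce the logic of parse_dl_results.py. The helper name `probabilities_to_residue_ids`, the `threshold` value, and the sample probabilities are all assumptions made for this example, as is the premise that residues are numbered contiguously from 1.

```python
from typing import Dict, List, Sequence


def probabilities_to_residue_ids(
    probabilities: Sequence[float], threshold: float = 0.5
) -> List[int]:
    """Return 1-based positions whose probability meets the cutoff.

    Hypothetical helper; assumes residue numbering starts at 1 with no gaps.
    """
    return [i for i, p in enumerate(probabilities, start=1) if p >= threshold]


# Made-up probabilities; only the keys mirror the IDs used in this thread.
raw: Dict[str, List[float]] = {
    "sea": [0.07, 0.91, 0.002, 0.64],
    "see": [0.05, 0.001, 0.88, 0.01],
}
parsed = {key: probabilities_to_residue_ids(values) for key, values in raw.items()}
print(parsed)  # {'sea': [2, 4], 'see': [3]}
```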

alchemistcai commented 2 months ago

After generating the parsed pkl file, I ran src/main.py and it raised a KeyError.

python $esm_bind_path/3D_modeling/src/parse_dl_results.py ZN multi_modal_binding/results/inference/2024-09-11-18-21/predictions.pkl --lower_factor 0.6

python -u $esm_bind_path/3D_modeling/src/main.py --ion ZN --restraint_force_constant 41840 --prediction_result parsed_result_predictions_ZN_lower_factor_0.60.pkl --pdb-dir ./ --output-dir ./

# Traceback (most recent call last):
#  File "/home/regen/git_develop/ESMBind/3D_modeling/src/main.py", line 140, in <module>
#   main(
#  File "/home/regen/git_develop/ESMBind/3D_modeling/src/main.py", line 25, in main
#   predictions = predictions[ion]
#                   ~~~~~~~~~~~^^^^^
# KeyError: 'ZN'

I added print statements in src/main.py; predictions is {'sea': [211, 249], 'see': [138, 211, 249]}.

The main function in src/main.py needs its interface and implementation updated to handle this format.
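
One possible way to accept both layouts seen in this thread is sketched below: the nested form that `predictions[ion]` expects, and the flat form found in the parsed file. `select_predictions` is a hypothetical helper written for illustration, not part of ESMBind.

```python
from typing import Any, Dict


def select_predictions(predictions: Dict[str, Any], ion: str) -> Dict[str, Any]:
    """Return the per-structure mapping for `ion` (hypothetical helper).

    Nested input ({ion: {id: values}}) is unwrapped; flat input
    ({id: values}) is returned unchanged.
    """
    candidate = predictions.get(ion)
    if isinstance(candidate, dict):
        return candidate
    return predictions


flat = {"sea": [211, 249], "see": [138, 211, 249]}  # parsed file from this thread
nested = {"ZN": flat}  # layout implied by the original predictions[ion] lookup

assert select_predictions(flat, "ZN") == flat
assert select_predictions(nested, "ZN") == flat
```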

empyriumz commented 2 months ago

I just updated main.py to remove this line, so it should be fixed. Note, however, that in my test case the keys of the predictions have the structure pdbid_chainid, e.g., 7LCI_A. To make your example work, you may need to modify the code here https://github.com/Structurebiology-BNL/ESMBind/blob/6b31680b967b1881e0375c0d0aca57ea327640ad/3D_modeling/src/main.py#L45 and in subsequent places in https://github.com/Structurebiology-BNL/ESMBind/blob/6b31680b967b1881e0375c0d0aca57ea327640ad/3D_modeling/src/add_ions.py#L460 where pdb_id and chain_id are used.
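
As a rough sketch of the key handling described here, the snippet below splits a pdbid_chainid key into its two parts and falls back to a default chain for keys such as "sea" that carry no chain suffix. `split_prediction_key` is a hypothetical helper, and the default of "A" is an assumption taken from the chain hard-coded in the snippet quoted earlier in this thread.

```python
from typing import Tuple


def split_prediction_key(key: str, default_chain: str = "A") -> Tuple[str, str]:
    """Split 'pdbid_chainid' into (pdb_id, chain_id) (hypothetical helper).

    Keys without an underscore are returned with `default_chain`.
    """
    pdb_id, sep, chain_id = key.rpartition("_")
    if not sep:
        return key, default_chain
    return pdb_id, chain_id


print(split_prediction_key("7LCI_A"))  # ('7LCI', 'A')
print(split_prediction_key("sea"))     # ('sea', 'A')
```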