dauparas / LigandMPNN

feat: pip installable and w/ hydra #16

Closed YaoYinYing closed 5 months ago

YaoYinYing commented 8 months ago

Hi, this PR brings the following to LigandMPNN:

  1. a pip-installable package:

    # from github
    pip install git+https://github.com/YaoYinYing/LigandMPNN@pip-installable
    pip install -e  'git+https://github.com/YaoYinYing/LigandMPNN@pip-installable#egg=ligandmpnn[openfold]'
    
    # or from cloned repo
    pip install .
    pip install '.[openfold]'

  2. using hydra for config management: ligandmpnn/config/ligandmpnn.yaml.

    # design
    ligandmpnn input.pdb="./inputs/1BC8.pdb" output.folder="./test/default"
    
    # scoring
    ligandmpnn \
        runtime.mode.use='score' \
        model_type.use='ligand_mpnn' \
        input.pdb="./outputs/ligandmpnn_default/backbones/1BC8_1.pdb" \
        output.folder="./test/scorer" \
        scorer.use_sequence=False \
        sampling.number_of_batches=10 \
        runtime.force_cpu=True

  3. downloading the pretrained weight file automatically the first time it is used:

    ligandmpnn input.pdb="./inputs/1BC8.pdb" output.folder="./test/default" model_type.use='ligand_mpnn' checkpoint.ligand_mpnn.use='ligandmpnn_v_32_020_25'
    Seed: 42
    Device:cpu: None
    Downloading data from 'https://files.ipd.uw.edu/pub/ligandmpnn//ligandmpnn_v_32_020_25.pt' to file '/Users/yyy/Documents/protein_design/LigandMPNN/model_params/ligandmpnn_v_32_020_25.pt'.
    100%|█████████████████████████████████████| 10.5M/10.5M [00:00<00:00, 21.0GB/s]
    Mode: design
    Designing protein from this path: ./inputs/1BC8.pdb
    [2024-03-24 01:10:35,668][.prody][DEBUG] - 1356 atoms and 1 coordinate set(s) were parsed in 0.01s.
    These residues will be redesigned:  ...
  4. slightly deduplicating run.py/score.py without changing the original libraries (*_utils)
  5. sidechain solving
  6. customized checkpoint file/url
    
    #34
    mkdir -p customized_weight_dir_local
    curl 'https://files.ipd.uw.edu/pub/ligandmpnn/proteinmpnn_v_48_002.pt' -o customized_weight_dir_local/customized_proteinmpnn_v_48_002.pt
    ls customized_weight_dir_local
    ligandmpnn \
        sampling.seed=111 \
        weight_dir="customized_weight_dir_local" \
        checkpoint.customized.file=customized_weight_dir_local/customized_proteinmpnn_v_48_002.pt \
        input.pdb="./inputs/1BC8.pdb" \
        output.folder="./outputs/default_customozed_weight_local"

    #35
    mkdir -p customized_weight_dir_remote
    ligandmpnn \
        sampling.seed=111 \
        weight_dir="customized_weight_dir_remote" \
        checkpoint.customized.url='https://files.ipd.uw.edu/pub/ligandmpnn/proteinmpnn_v_48_020.pt' \
        input.pdb="./inputs/1BC8.pdb" \
        output.folder="./outputs/customized_weight_dir_remote"
    ls customized_weight_dir_remote

    #36
    mkdir -p customized_weight_dir_remote_hash
    ligandmpnn \
        sampling.seed=111 \
        weight_dir="customized_weight_dir_remote_hash" \
        checkpoint.customized.url='https://files.ipd.uw.edu/pub/ligandmpnn/proteinmpnn_v_48_010.pt' \
        checkpoint.customized.known_hash='md5:4255760493a761d2b6cb0671a48e49b7' \
        input.pdb="./inputs/1BC8.pdb" \
        output.folder="./outputs/customized_weight_dir_remote_hash"
    ls customized_weight_dir_remote_hash

  7. CI tests:

  - all test cases are passing in CI runners (py3.9-3.11, ubuntu): https://github.com/YaoYinYing/LigandMPNN/actions/runs/8519719844
  - all tests w/o sidechain modeling passed on ubuntu/Windows/MacOS(M1&Intel): https://github.com/YaoYinYing/LigandMPNN/actions/runs/8793510182 
  - Windows tests with sidechain packing failed inside OpenFold code:
    ```text
    Packing side chains...
    Traceback (most recent call last):
      File "D:\a\LigandMPNN\LigandMPNN\scripts\run.py", line 25, in main
        magician.design_proteins()
      File "C:\Users\runneradmin\miniconda3\envs\ligandmpnn\lib\site-packages\ligandmpnn\__init__.py", line 668, in design_proteins
        self.design_proteins_single(pdb=pdb)
      File "C:\Users\runneradmin\miniconda3\envs\ligandmpnn\lib\site-packages\ligandmpnn\__init__.py", line 763, in design_proteins_single
        X_stack_list,X_m_stack_list,b_factor_stack_list=self.sampling_sc(S_list)
      File "C:\Users\runneradmin\miniconda3\envs\ligandmpnn\lib\site-packages\ligandmpnn\__init__.py", line 643, in sampling_sc
        sc_dict = pack_side_chains(
      File "C:\Users\runneradmin\miniconda3\envs\ligandmpnn\lib\site-packages\ligandmpnn\sc_utils.py", line 67, in pack_side_chains
        torsion_dict = make_torsion_features(feature_dict, repack_everything)
      File "C:\Users\runneradmin\miniconda3\envs\ligandmpnn\lib\site-packages\ligandmpnn\sc_utils.py", line 212, in make_torsion_features
        xyz14_noised = feats.frames_and_literature_positions_to_atom14_pos(
      File "C:\Users\runneradmin\miniconda3\envs\ligandmpnn\lib\site-packages\openfold\utils\feats.py", line 268, in frames_and_literature_positions_to_atom14_pos
        group_mask = nn.functional.one_hot(
    RuntimeError: one_hot is only applicable to index tensor.
    ```

Feel free to edit if you think this PR is cool!
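
For anyone who wants to check a `checkpoint.customized.known_hash` value by hand: it is an algorithm-prefixed digest of the form `md5:<hex>`. Below is a minimal sketch of such a check using only Python's standard `hashlib`. The helper names `file_digest` and `matches_known_hash` are hypothetical and are not part of this PR, and this is not necessarily how the package verifies downloads internally.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return '<algorithm>:<hexdigest>' for the file at `path` (hypothetical helper)."""
    hasher = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        # Read in chunks so large checkpoint files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return f"{algorithm}:{hasher.hexdigest()}"


def matches_known_hash(path, known_hash):
    """Compare a file against an 'algorithm:hexdigest' string (hypothetical helper)."""
    algorithm, _, expected = known_hash.partition(":")
    # Hex digests are case-insensitive, so normalise before comparing.
    return file_digest(path, algorithm) == f"{algorithm}:{expected.lower()}"
```

Assuming the downloaded file keeps its URL basename, a call such as `matches_known_hash("customized_weight_dir_remote_hash/proteinmpnn_v_48_010.pt", "md5:4255760493a761d2b6cb0671a48e49b7")` would report whether the local copy matches the hash passed on the command line.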

paoslaos commented 5 months ago

@dauparas are you considering merging this? This is a big improvement in terms of usability and probably many people would benefit from it!

@YaoYinYing Having an entry point in the toml would make it even nicer (I did not check but just looked at your example above!). Does your code allow using custom checkpoints?

Thanks for the great work.

Sincerely, P.

YaoYinYing commented 5 months ago

@paoslaos Hi, I have updated the code to add the feature you need and validated the change via CI tests.

Thanks for such nice advice!

paoslaos commented 5 months ago

@YaoYinYing This is truly incredible and good engineering work, thank you!

Edit: to clarify what I mean, because I don't see the changes (yet?):

In the toml, a single entry like:

    [project.scripts]
    ligandmpnn = "ligandmpnn.scripts.run:main"

would be nice. This would require the scripts dir to be under ligandmpnn, which would break existing code usage. Slight duplication would not be too bad. Alternatively, the scripts file in the root could be slightly refactored to use this entry point as well. What do you think?
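
For context, a `[project.scripts]` value has the form `module.path:callable`: on install, a small launcher is generated that imports the module and calls the named object. Here is a rough sketch of that lookup. It is not the installer's actual launcher code, `resolve_entry_point` is a hypothetical name, and the demo resolves a standard-library target so it runs without `ligandmpnn` installed.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve a 'package.module:attr' string to the object it names (hypothetical helper)."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:callable', got {spec!r}")
    target = importlib.import_module(module_name)
    # Walk dotted attributes, e.g. 'Class.method'.
    for name in attr_path.split("."):
        target = getattr(target, name)
    return target


if __name__ == "__main__":
    # Stand-in target from the standard library; with the package installed,
    # the spec would be "ligandmpnn.scripts.run:main" as proposed above.
    dumps = resolve_entry_point("json:dumps")
    print(dumps({"ok": True}))
```

This is why the target has to be importable as a package module: the lookup goes through the import system, so a `scripts` directory that sits outside the `ligandmpnn` package cannot be referenced this way.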

Edit2: (Happy to help with PR)

Sincerely, P.

YaoYinYing commented 5 months ago

@paoslaos Sorry for the misunderstanding. An inference command-line shortcut will be more elegant than cloning the repo and calling scripts. LGTM. CI tests look okay too: https://github.com/YaoYinYing/LigandMPNN/actions/runs/9397959040.

paoslaos commented 5 months ago

@YaoYinYing nice!

I think we had a misunderstanding, but the result is exactly what I meant! Very nice! Will definitely use this in my work!

LGTM

YaoYinYing commented 5 months ago

This PR is out of date, so I am closing it. Use the latest branch instead.