PaddlePaddle / PaddleHelix

Bio-Computing Platform Featuring Large-Scale Representation Learning and Multi-Task Deep Learning (“螺旋桨” bio-computing toolkit)

feat/docs/cases: Covalent Bond Input #329

Open YaoYinYing opened 3 months ago

YaoYinYing commented 3 months ago

This is one of the PRs split out from #321.

Full PR roadmap

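| id | purpose                                  | # commits | Affected           |
|----|------------------------------------------|-----------|--------------------|
| 1  | Hydra-Omegaconf and Pip module           | 6         | Code, Config, Doc  |
| 2  | BFD supports and MSA parallelism fixes   | 5         | Code, Config, Case |
| 3  | Small molecule inputs and covalent bonds | 4         | Code, Case, Doc    |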

Changelog

Added

Fixed

CLAassistant commented 2 months ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

alephreish commented 1 month ago

@YaoYinYing I had to bring back a couple of imports in order to make it work:

diff --git a/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py b/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py
index 603e8da..c994b0d 100644
--- a/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py
+++ b/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py
@@ -19,6 +19,7 @@ import os
 from pathlib import Path
 import pickle
 from typing import List, Mapping, Optional, Tuple
+import time

 import numpy as np
 import logging
@@ -28,7 +29,7 @@ from helixfold.data import pipeline_multimer
 from helixfold.data import pipeline_rna_multimer
 from helixfold.data import pipeline_conf_bonds, pipeline_token_feature, pipeline_hybrid
 from helixfold.data import label_utils
-
+from concurrent.futures import ProcessPoolExecutor, as_completed
 from helixfold.data.tools import utils

 from .preprocess import Entity, digit2alphabet
diff --git a/apps/protein_folding/helixfold3/inference.py b/apps/protein_folding/helixfold3/inference.py
index 429809b..9edc0a1 100644
--- a/apps/protein_folding/helixfold3/inference.py
+++ b/apps/protein_folding/helixfold3/inference.py
@@ -24,6 +24,7 @@ import pickle
 import pathlib
 import shutil
 import numpy as np
+import logging
 from helixfold.common import all_atom_pdb_save 
 from helixfold.data.pipeline_conf_bonds import load_ccd_dict
 from helixfold.model import config, utils
@@ -116,7 +117,7 @@ def resolve_bin_path(cfg_path: str, default_binary_name: str)-> str:

     raise FileNotFoundError(f"Could not find a proper binary path for {default_binary_name}: {cfg_path}.")

-def get_msa_templates_pipeline(cfg: DictConfig) -> Dict:
+def get_msa_templates_pipeline(cfg) -> Dict:
     use_precomputed_msas = True  # Assuming this is a constant or should be set globally

     template_searcher = hmmsearch.Hmmsearch(
diff --git a/apps/protein_folding/helixfold3/utils/model.py b/apps/protein_folding/helixfold3/utils/model.py
index 4a5b2d6..2ba6337 100644
--- a/apps/protein_folding/helixfold3/utils/model.py
+++ b/apps/protein_folding/helixfold3/utils/model.py
@@ -17,6 +17,7 @@
 import numpy as np
 import paddle
 import paddle.nn as nn
+import logging
 import io

 from helixfold.model import modules_all_atom
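
The inference.py hunk above drops the `DictConfig` annotation to avoid a `NameError` at import time. A possible alternative, sketched here under the assumption that `DictConfig` is the omegaconf class (the roadmap mentions Hydra-Omegaconf), is to keep the annotation as a string and import the type only for type checkers. The function name `get_msa_templates_pipeline_stub` and its return value are hypothetical stand-ins, not the PR's code.

```python
from typing import TYPE_CHECKING, Any, Dict

if TYPE_CHECKING:
    # Assumption: DictConfig comes from omegaconf. This import is only seen
    # by static type checkers, so the module still loads without omegaconf.
    from omegaconf import DictConfig


def get_msa_templates_pipeline_stub(cfg: "DictConfig") -> Dict[str, Any]:
    """Hypothetical stand-in for get_msa_templates_pipeline.

    The string annotation is never evaluated when the function is defined,
    so a missing DictConfig name cannot raise NameError here.
    """
    use_precomputed_msas = True  # mirrors the constant in the diff context
    return {"use_precomputed_msas": use_precomputed_msas, "cfg": cfg}


# Usage: any mapping-like config works at runtime.
pipeline = get_msa_templates_pipeline_stub({"bin_paths": {}})
```

Simply adding the missing import of `DictConfig` would work as well; the string form only matters if the import should stay optional.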

Also a side note: leave_atom_flag is currently ignored, so the user has to modify ccd_preprocessed_etkdg.pkl.gz to remove the atoms that leave upon formation of the corresponding covalent bond.
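
As a rough illustration of that manual workaround, the sketch below removes leaving atoms, and every bond that touches them, from a ligand record. The record layout (`atom_names` plus index-pair `bonds`) and the helper `drop_leaving_atoms` are hypothetical; the actual schema of ccd_preprocessed_etkdg.pkl.gz is not described in this thread and may differ.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical record layout, NOT the real schema of the CCD pickle:
# {"atom_names": ["C1", ...], "bonds": [(i, j), ...]} with index-pair bonds.
Record = Dict[str, list]


def drop_leaving_atoms(record: Record, leaving: Set[str]) -> Record:
    """Return a copy of the record without the leaving atoms and their bonds."""
    names: List[str] = record["atom_names"]
    # Indices of the atoms that stay after the covalent bond is formed.
    keep = [i for i, name in enumerate(names) if name not in leaving]
    # Map old atom indices to their new, compacted positions.
    remap = {old: new for new, old in enumerate(keep)}
    bonds: List[Tuple[int, int]] = [
        (remap[a], remap[b])
        for a, b in record["bonds"]
        if a in remap and b in remap  # drop bonds that touch a removed atom
    ]
    return {"atom_names": [names[i] for i in keep], "bonds": bonds}


# Toy example: O2 and H1 leave when the covalent bond forms.
ligand = {"atom_names": ["C1", "O1", "O2", "H1"], "bonds": [(0, 1), (0, 2), (2, 3)]}
trimmed = drop_leaving_atoms(ligand, {"O2", "H1"})
```

Reindexing the bonds after removal matters here because index-based bond lists would otherwise point at the wrong atoms.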

YaoYinYing commented 1 month ago

@alephreish Hi Andrey, this PR is a cherry-pick (which means it may contain some potential bugs) from a branch that is definitely outdated. If you are looking for a full-featured branch for your project, please consider this fork.

alephreish commented 1 month ago

@YaoYinYing I've seen the main branch in your fork. I personally like the current interface of helixfold: it's flexible enough, although switching to your new interface would not be a big deal. I have the feeling that this small PR does have a chance of being merged into PaddlePaddle/PaddleHelix:dev, since patch-hydra has diverged too much by now.