deepmodeling / Uni-Mol

Official Repository for the Uni-Mol Series Methods
MIT License
674 stars 119 forks

unimol_tools issue about "molecule property prediction" #242

Open Golden-proteogenomics opened 3 months ago

Golden-proteogenomics commented 3 months ago

Hello, I have a question about this code from the unimol_tools molecule property prediction example:

from unimol_tools import MolTrain, MolPredict
clf = MolTrain(task='classification',
               data_type='molecule',
               epochs=10,
               batch_size=16,
               metrics='auc',
               )
pred = clf.fit(data = data)
# currently support data with smiles based csv/txt file, and
# custom dict of {'atoms':[['C','C'],['C','H','O']], 'coordinates':[coordinates_1,coordinates_2]}

clf = MolPredict(load_model='../exp')
res = clf.predict(data = data)

This code is an API for using Uni-Mol, and it confuses me. My other question is about the "molecule property prediction" function: why are there several versions of the code for it, with no description of how they differ?

Naplessss commented 3 months ago

MolTrain is used for training models on different types of data, both SMILES-based and 3D-coordinate-based. For example, in bioactivity prediction you can use docking or FEP conformations as input, which is more suitable than SMILES-based input. MolPredict provides prediction using models trained with MolTrain. In other words, you train your model with MolTrain and then use MolPredict for inference.
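As a rough illustration of the 3D-coordinate input described above, the sketch below assembles the custom dict format quoted earlier in this thread (`'atoms'` and `'coordinates'`). The helper name `build_custom_data` and the toy geometries are hypothetical, and whether unimol_tools expects plain lists or NumPy arrays for the coordinates is not stated in the thread, so treat this as a shape check rather than a definitive loader.

```python
def build_custom_data(molecules):
    """Collect (atoms, coordinates) pairs into {'atoms': [...], 'coordinates': [...]}.

    Hypothetical helper: it only checks that each molecule has one
    3D point per atom before packing the two parallel lists.
    """
    atoms, coordinates = [], []
    for index, (symbols, coords) in enumerate(molecules):
        if len(symbols) != len(coords):
            raise ValueError(
                f"molecule {index}: {len(symbols)} atoms but {len(coords)} coordinates"
            )
        for xyz in coords:
            if len(xyz) != 3:
                raise ValueError(f"molecule {index}: each coordinate needs x, y, z")
        atoms.append(list(symbols))
        coordinates.append([[float(v) for v in xyz] for xyz in coords])
    return {"atoms": atoms, "coordinates": coordinates}


# Toy geometries in angstroms, for illustration only (not optimized structures).
water = (["O", "H", "H"], [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
hydrogen = (["H", "H"], [[0.0, 0.0, 0.0], [0.74, 0.0, 0.0]])

data = build_custom_data([water, hydrogen])
print(data["atoms"])  # [['O', 'H', 'H'], ['H', 'H']]
```

Checking the atom and coordinate counts up front tends to give a clearer message than a failure deep inside training. Any label column or key needed for training is not shown in this thread and is left out here.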

Golden-proteogenomics commented 3 months ago

I get an error when I use this code to predict on 'mol_test.csv'. The details are below. How can I fix it? [image]

python shi.py
2024-06-27 06:08:16.615493: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-27 06:08:16.662706: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "shi.py", line 22, in <module>
    clf = MolPredict(load_model='./weights')
  File "/sunjs/Softwares/Uni-Mol-main/unimol_tools/unimol_tools/predict.py", line 34, in __init__
    self.config = YamlHandler(config_path).read_yaml()
  File "/sunjs/Softwares/Uni-Mol-main/unimol_tools/unimol_tools/utils/config_handler.py", line 24, in __init__
    raise FileExistsError(OSError)
FileExistsError: <class 'OSError'>

Naplessss commented 3 months ago

You should load the model from your save_path, for example `MolPredict(load_model='./exp')`.
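The traceback above ends in `YamlHandler(config_path)`, which suggests MolPredict looks for a YAML config file inside the `load_model` directory. A minimal pre-flight check, assuming the file is called `config.yaml` (a guess based on the traceback, not confirmed in this thread) and using the hypothetical helper name `check_model_dir`, could look like this:

```python
import os


def check_model_dir(model_dir, config_name="config.yaml"):
    """Return a list of problems that would likely stop a model from loading.

    `config_name` is an assumption; adjust it to whatever file your
    training run actually wrote into its save path.
    """
    problems = []
    if not os.path.isdir(model_dir):
        problems.append(f"{model_dir!r} is not a directory")
        return problems
    if not os.path.isfile(os.path.join(model_dir, config_name)):
        problems.append(f"{model_dir!r} has no {config_name}")
    return problems


# Compare the two directories discussed in this thread.
for candidate in ("./weights", "./exp"):
    issues = check_model_dir(candidate)
    print(candidate, "looks loadable" if not issues else issues)
```

Running this before constructing MolPredict turns the opaque `FileExistsError: <class 'OSError'>` into a message that names the directory and the missing file.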

Golden-proteogenomics commented 2 months ago

Yes, "./weights" is my model directory: [image]. Would it be better to use the "./exp" directory, as you advise? Or is there anything else I haven't considered?

Naplessss commented 2 months ago

Use './weights' for the initial pretrained weights, which are the default weights provided by UniMol. For your fine-tuned model weights, use './exp'. If you only need to utilize the representation capabilities of UniMol, you can simply use UniMolRepr:

from unimol_tools import UniMolRepr
# single smiles unimol representation
clf = UniMolRepr(data_type='molecule', remove_hs=False)
smiles = 'c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O]'
smiles_list = [smiles]
unimol_repr = clf.get_repr(smiles_list, return_atomic_reprs=True)
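
To inspect what comes back, a small helper can summarize the returned object. This sketch assumes `get_repr` returns a dict with the keys `'cls_repr'` (one vector per molecule) and `'atomic_reprs'` (one vector per atom, per molecule); those key names are not shown in this thread, so verify them against your installed version. The helper name `describe_repr` is hypothetical, and the stand-in values at the bottom are fake data, not real Uni-Mol output.

```python
def describe_repr(unimol_repr):
    """Summarize the sizes found in a representation dict.

    Assumes the keys 'cls_repr' and 'atomic_reprs'; works with nested
    lists or NumPy arrays because it only relies on len().
    """
    cls_vectors = unimol_repr["cls_repr"]
    atomic_vectors = unimol_repr["atomic_reprs"]
    return {
        "n_molecules": len(cls_vectors),
        "cls_dim": len(cls_vectors[0]),
        "atoms_per_molecule": [len(per_atom) for per_atom in atomic_vectors],
    }


# Fake stand-in: one molecule, 3 atoms, 4-dimensional vectors.
fake_repr = {
    "cls_repr": [[0.0] * 4],
    "atomic_reprs": [[[0.0] * 4 for _ in range(3)]],
}
summary = describe_repr(fake_repr)
print(summary)  # {'n_molecules': 1, 'cls_dim': 4, 'atoms_per_molecule': [3]}
```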

If you want to train a model on your own dataset, the best practice is:

  1. fit your own data with MolTrain;
  2. predict with your trained model by loading it in MolPredict from your save path, such as the './exp' folder here.
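
The two steps above can be sketched as one function. The MolTrain and MolPredict arguments mirror those quoted earlier in the thread; the `save_path` keyword is an assumption based on the "save_path" mentioned above, and `train_then_predict` is a hypothetical wrapper name.

```python
def train_then_predict(train_data, test_data, save_path="./exp"):
    """Fine-tune on train_data, then reload from save_path and predict on test_data."""
    # Imported lazily so this file can be imported without unimol_tools installed.
    from unimol_tools import MolTrain, MolPredict

    trainer = MolTrain(
        task="classification",
        data_type="molecule",
        epochs=10,
        batch_size=16,
        metrics="auc",
        save_path=save_path,  # assumed keyword: where the fine-tuned model is written
    )
    trainer.fit(data=train_data)

    # Load from the same directory that training wrote to, not the
    # directory holding the pretrained weights.
    predictor = MolPredict(load_model=save_path)
    return predictor.predict(data=test_data)
```

Passing a single `save_path` to both steps keeps the training output directory and the `load_model` directory from drifting apart, which is the mismatch discussed in this thread.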

Golden-proteogenomics commented 2 months ago

Yes, I used that code:

from unimol_tools import UniMolRepr
# single smiles unimol representation
clf = UniMolRepr(data_type='molecule', remove_hs=False)
smiles = 'c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O]'
smiles_list = [smiles]
unimol_repr = clf.get_repr(smiles_list, return_atomic_reprs=True)

There is an error: [image]. Is this right, and how can I fix it?

Naplessss commented 2 months ago

It seems the SMILES is invalid for generating conformations.
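
One cheap way to narrow down such a failure is to check the string for obvious syntax damage before handing it to the model. The sketch below is a standard-library pre-check only, not a SMILES parser: it looks for unbalanced brackets or parentheses and for a `+` charge outside a bracket atom, which can happen when copy-paste strips the square brackets. The function name `smiles_syntax_problems` and the damaged example string are hypothetical.

```python
def smiles_syntax_problems(smiles):
    """Return a list of obvious syntax problems; an empty list means none were found."""
    problems = []
    depth = 0           # current nesting depth of branch parentheses
    in_bracket = False  # True while inside a [...] bracket atom
    for index, char in enumerate(smiles):
        if char == "[":
            if in_bracket:
                problems.append(f"nested '[' at index {index}")
            in_bracket = True
        elif char == "]":
            if not in_bracket:
                problems.append(f"unmatched ']' at index {index}")
            in_bracket = False
        elif char == "(" and not in_bracket:
            depth += 1
        elif char == ")" and not in_bracket:
            if depth == 0:
                problems.append(f"unmatched ')' at index {index}")
            else:
                depth -= 1
        elif char == "+" and not in_bracket:
            problems.append(f"charge '+' outside a bracket atom at index {index}")
    if in_bracket:
        problems.append("unclosed '['")
    if depth > 0:
        problems.append("unclosed '('")
    return problems


intact = "c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O]"
damaged = "c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)N+[O]"  # brackets lost, for illustration
print(smiles_syntax_problems(intact))   # []
print(smiles_syntax_problems(damaged))
```

A string that passes this check can still be chemically invalid or fail 3D embedding. A fuller check would parse it with a cheminformatics toolkit such as RDKit, where `Chem.MolFromSmiles` returns `None` for strings it cannot parse.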