ARY2260 / openpom

Replication of the Principal Odor Map paper by Brian K. Lee et al. (2023).
MIT License

how to run this project #21

Open wangfei-123 opened 6 months ago

wangfei-123 commented 6 months ago

Hello author, I am currently reproducing your project, but I am not sure which file is the entry point or which command to run in the terminal. I also see that the project includes a pretrained model; how should it be used for prediction (i.e., which commands should I enter)? Thank you!

ARY2260 commented 6 months ago

Hello there. We recently posted a tutorial on DeepChem about openpom usage. I hope it helps you get started:

https://deepchem.io/tutorials/predict-multi-label-odor-descriptors-using-openpom/

Feel free to ask anything else. Also, the benchmark scores come from an ensemble of 10 models. You can find the code for that in the examples section.
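For readers unfamiliar with ensembling, here is a minimal sketch of the general idea, assuming the ensemble simply averages per-descriptor probabilities across models. This is an assumption for illustration; the actual ensembling code is the one in the repository's examples section and may differ. The function name `average_ensemble` and all numbers are made up.

```python
from statistics import fmean
from typing import List, Sequence


def average_ensemble(
    per_model_probs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average predictions over models.

    per_model_probs[m][i][t] is the probability that model m assigns to
    task (odor descriptor) t for molecule i.
    """
    if not per_model_probs:
        raise ValueError("need predictions from at least one model")
    n_molecules = len(per_model_probs[0])
    n_tasks = len(per_model_probs[0][0]) if n_molecules else 0
    return [
        [fmean(model[i][t] for model in per_model_probs) for t in range(n_tasks)]
        for i in range(n_molecules)
    ]


# Toy input: 2 models, 2 molecules, 3 tasks (values are made up).
model_a = [[0.9, 0.1, 0.4], [0.2, 0.8, 0.6]]
model_b = [[0.7, 0.3, 0.2], [0.4, 0.6, 0.8]]
ensemble = average_ensemble([model_a, model_b])
```

With 10 trained models, the same function would take a list of 10 prediction tables, one per model.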

wangfei-123 commented 6 months ago

Thank you very much for your reply and answer. It has been a great help to me, and I will try my best to reproduce this project. Wishing you a happy life! Sincerely

wangfei-123 commented 6 months ago

Hello author, first of all, thank you very much for sharing the project tutorial a few days ago; it deepened my understanding of the project. However, I still have a few questions:

  1. I have found that this project (openpom-main) already includes a pretrained model (example_model.pt). I would like to use this model directly to predict on the data provided in the project. Which file should I run to make predictions, or which command should I enter on the command line?
  2. The 'inference' section of the openpom tutorial you shared is supposed to explain how to make predictions, but I did not fully understand it.

I am very much looking forward to your reply. I am a graduate student, and getting this project running is very important for my research. Thank you!

ARY2260 commented 6 months ago

I am sorry for the confusion and missing docs for inference. Hope this solves the issues.

To load the model, you need to initialize it with the same parameters that were used when the model was trained.

# imports (module paths as used in the openpom README)
from openpom.feat.graph_featurizer import GraphFeaturizer, GraphConvConstants
from openpom.models.mpnn_pom import MPNNPOMModel

# n_tasks, learning_rate and train_ratios are not defined in this snippet:
# set them to the same values that were used when the checkpoint was trained.

# initialize model
model = MPNNPOMModel(n_tasks=n_tasks,
                     batch_size=128,
                     learning_rate=learning_rate,
                     class_imbalance_ratio=train_ratios,
                     loss_aggr_type='sum',
                     node_out_feats=100,
                     edge_hidden_feats=75,
                     edge_out_feats=100,
                     num_step_message_passing=5,
                     mpnn_residual=True,
                     message_aggregator_type='sum',
                     mode='classification',
                     number_atom_features=GraphConvConstants.ATOM_FDIM,
                     number_bond_features=GraphConvConstants.BOND_FDIM,
                     n_classes=1,
                     readout_type='set2set',
                     num_step_set2set=3,
                     num_layer_set2set=2,
                     ffn_hidden_list=[392, 392],
                     ffn_embeddings=256,
                     ffn_activation='relu',
                     ffn_dropout_p=0.12,
                     ffn_dropout_at_input_no_act=False,
                     weight_decay=1e-5,
                     self_loop=False,
                     optimizer_name='adam',
                     log_frequency=32,
                     model_dir='./experiments',
                     device_name='cuda')

Then restore the model from the ".pt" checkpoint file: model.restore("some_model.pt")
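Since the constructor arguments at inference time have to match the ones used for training, one way to avoid drift is to save them to a file when training and read them back before restoring. The sketch below is only a suggestion, not something openpom does for you: the file name `hyperparams.json` and the helper names are hypothetical, and only a subset of the arguments is shown. Values that are not plain JSON (for example a learning-rate schedule object) would need separate handling.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical subset of the constructor arguments used for training.
hyperparams = {
    "batch_size": 128,
    "node_out_feats": 100,
    "edge_hidden_feats": 75,
    "num_step_message_passing": 5,
    "readout_type": "set2set",
    "ffn_hidden_list": [392, 392],
    "ffn_embeddings": 256,
    "ffn_dropout_p": 0.12,
}


def save_hyperparams(params: dict, path: Path) -> None:
    """Write the training-time arguments next to the checkpoint."""
    path.write_text(json.dumps(params, indent=2))


def load_hyperparams(path: Path) -> dict:
    """Read the arguments back before rebuilding the model."""
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "hyperparams.json"  # hypothetical file name
    save_hyperparams(hyperparams, config_path)
    restored = load_hyperparams(config_path)

# At inference time the dictionary could be unpacked into the constructor
# (together with the remaining arguments) before calling model.restore.
```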

Suppose this is the .csv file containing the SMILES strings for inference:

SMILES
CC(O)CN
CCC(=O)C(=O)O
O=C(O)CCc1ccccc1
OCc1ccc(O)cc1
O=Cc1ccc(O)cc1
O=C(O)c1ccc(O)cc1
CC(=O)O
CC=O
CC(=O)C(C)O
CC(C)=O
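Before featurizing, it can help to check that the file really has a `SMILES` header and no empty rows. Below is a small standard-library sketch of such a check; the helper `read_smiles` is hypothetical, and the CSV text is a shortened copy of the file above, inlined so the snippet runs on its own.

```python
import csv
import io

# Shortened copy of the example file, with one blank line added on purpose.
csv_text = """SMILES
CC(O)CN
CCC(=O)C(=O)O

CC=O
"""


def read_smiles(handle, column="SMILES"):
    """Return the non-empty entries of `column` from an open CSV handle."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"expected a {column!r} column, got {reader.fieldnames}")
    # Skip rows whose value is missing or only whitespace.
    return [row[column].strip() for row in reader
            if row[column] and row[column].strip()]


smiles = read_smiles(io.StringIO(csv_text))
```

For a real file, pass `open("infer_smiles.csv", newline="")` instead of the `io.StringIO` object.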

Now load the CSV file and predict using the model:

import deepchem as dc
import pandas as pd
from openpom.feat.graph_featurizer import GraphFeaturizer

inference_csv_filepath = "infer_smiles.csv"
df = pd.read_csv(inference_csv_filepath)

# Featurize the test SMILES
featurizer = GraphFeaturizer()
featurized_data = featurizer.featurize(df['SMILES'])  # 'SMILES' is the name of the column that contains the SMILES strings

# Get predictions from the trained model
prediction = model.predict(dc.data.NumpyDataset(featurized_data))
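To turn a row of predictions into readable labels, a sketch like the following may help. It assumes each row of `prediction` holds one score per odor descriptor, in the same order as the task names used for training; that layout is an assumption here, so check the shape of your own output. The descriptor names and scores below are invented for illustration.

```python
def top_k_descriptors(scores, task_names, k=3):
    """Return the k (descriptor, score) pairs with the highest scores."""
    if len(scores) != len(task_names):
        raise ValueError("scores and task_names must have the same length")
    ranked = sorted(zip(task_names, scores),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


task_names = ["fruity", "green", "sweet", "woody"]  # hypothetical labels
row = [0.12, 0.81, 0.55, 0.03]                      # made-up scores for one molecule
top2 = top_k_descriptors(row, task_names, k=2)
```

Applied to real output, `row` would be one row of `prediction` and `task_names` the list of descriptor columns from the training data.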

ARY2260 commented 6 months ago

The included pretrained model, example_model.pt, is just an example. It may not give good results.

wangfei-123 commented 6 months ago

Thank you very much for your answer. I have now configured the project and run it successfully. Next, I will study the project code carefully and adapt it to my own requirements. I hope to build on this project to complete my own research. Thank you again!
