ARY2260 / openpom

Replication of the Principal Odor Map paper by Brian K. Lee et al. (2023).
MIT License

Problems in the training and inference #23

Open Tiger2Wings opened 5 months ago

Tiger2Wings commented 5 months ago

Two problems, thank you!

1. Training problem

Training the model with `ensemble_benchmark.ipynb`, using `curated_GS_LF_merged_4983.csv` as the train dataset, I modified the line

```python
train_dataset, test_dataset = splitter.train_test_split(dataset, frac_train=0.8, train_dir='./splits/train_data', test_dir='./splits/test_data')
```

to use `frac_train=0.9` (instead of `frac_train=0.8`) and got this error:

```
Traceback (most recent call last):
  File "/home/ubuntu/openpom/examples/benchmark2.py", line 132, in <module>
    test_scores = model.evaluate(test_dataset, [metric])['roc_auc_score']
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/deepchem/models/models.py", line 219, in evaluate
    return evaluator.compute_model_performance(
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/deepchem/utils/evaluate.py", line 315, in compute_model_performance
    results = metric.compute_metric(
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/deepchem/metrics/metric.py", line 650, in compute_metric
    metric_value = self.compute_singletask_metric(
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/deepchem/metrics/metric.py", line 726, in compute_singletask_metric
    metric_value = self.metric(y_true_arr, y_pred_arr, **kwargs)
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/sklearn/utils/_param_validation.py", line 213, in wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/sklearn/metrics/_ranking.py", line 648, in roc_auc_score
    return _average_binary_score(
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/sklearn/metrics/_base.py", line 118, in _average_binary_score
    score[c] = binary_metric(y_true_c, y_score_c, sample_weight=score_weight)
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/sklearn/metrics/_ranking.py", line 382, in _binary_roc_auc_score
    raise ValueError(
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
```

How did `frac_train=0.9` cause this error?
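A likely explanation (my guess, not verified against this dataset): raising `frac_train` to 0.9 shrinks the test split, and for at least one rare odor label the smaller test split ends up containing only one class (all 0s or all 1s), so sklearn cannot compute that task's ROC AUC. A stdlib-only sketch of the check, with made-up toy labels (the task names and label matrix below are hypothetical, not from the real dataset):

```python
# Sketch: find multi-label tasks whose test-split column contains only one
# class, which makes per-task ROC AUC undefined (the error in the traceback).
# The toy labels below are invented for illustration.

def single_class_tasks(y, task_names):
    """Return names of tasks whose label column has fewer than two classes."""
    bad = []
    for j, name in enumerate(task_names):
        column = {row[j] for row in y}
        if len(column) < 2:  # only 0s or only 1s -> ROC AUC undefined
            bad.append(name)
    return bad

# Toy test split: 'musk' is a rare label that lost all its positive
# examples when the test fraction shrank from 0.2 to 0.1.
task_names = ["woody", "green", "musk"]
y_test = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
]

print(single_class_tasks(y_test, task_names))  # -> ['musk']
```

Running a check like this on the actual `test_dataset.y` before evaluating would confirm whether a rare label is the culprit.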

2. Inference problem

I trained the model with `n_models = 10` and `nb_epoch = 62`. (By the way, what value of `nb_epoch` is best?) Then I restored checkpoints from two different ensemble members:

```python
model.restore(f"./ensemble_models/experiments_1/checkpoint2.pt")
model.restore(f"./ensemble_models/experiments_10/checkpoint2.pt")
```

There is a significant difference between the inference results with `experiments_1/checkpoint2.pt` and `experiments_10/checkpoint2.pt`. For example, the 5 odors with the highest predicted values (out of 138) for the same SMILES:

```
OC12C3CC3C4CC(CCC41C)C2(C)C  ['woody', 'green', 'amber', 'camphoreous', 'dry']
OC12C3CC3C4CC(CCC41C)C2(C)C  ['spicy', 'earthy', 'herbal', 'woody', 'green']
```

Which model is better, and how can I get the best one? Is this an overfitting problem caused by a small training dataset?
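One note on the second question: individual ensemble members are expected to disagree, so the usual approach is not to pick one member's checkpoint but to average the predicted probabilities over all `n_models` members and rank odors on the mean. A minimal stdlib sketch of that averaging step (the probability vectors below are invented; in practice each row would be one restored checkpoint's predictions for the same molecule):

```python
# Sketch: average per-model probability vectors and take the top-k odors.
# In a real run, per_model_probs would hold one prediction vector per
# restored checkpoint (experiments_1 ... experiments_10).

def top_k_mean(per_model_probs, labels, k=5):
    """Average predictions over ensemble members; return top-k label names."""
    n_models = len(per_model_probs)
    mean = [sum(model[j] for model in per_model_probs) / n_models
            for j in range(len(labels))]
    ranked = sorted(range(len(labels)), key=lambda j: mean[j], reverse=True)
    return [labels[j] for j in ranked[:k]]

labels = ["woody", "green", "amber", "spicy", "earthy", "herbal", "dry"]
probs_model_1 = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6]  # invented values
probs_model_2 = [0.5, 0.7, 0.1, 0.9, 0.8, 0.6, 0.2]  # invented values

print(top_k_mean([probs_model_1, probs_model_2], labels, k=5))
# -> ['green', 'woody', 'spicy', 'earthy', 'herbal']
```

Averaging tends to smooth out member-to-member disagreement, which is the point of training an ensemble in the first place.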