I have two problems. Thank you!

1. Training problem

I trained the model with ensemble_benchmark.ipynb, using curated_GS_LF_merged_4983.csv as the training dataset, and modified the line

train_dataset, test_dataset = splitter.train_test_split(dataset, frac_train=0.8, train_dir='./splits/train_data', test_dir='./splits/test_data')

to use frac_train=0.9 instead of frac_train=0.8.
This raised the following error:
Traceback (most recent call last):
  File "/home/ubuntu/openpom/examples/benchmark2.py", line 132, in <module>
    test_scores = model.evaluate(test_dataset, [metric])['roc_auc_score']
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/deepchem/models/models.py", line 219, in evaluate
    return evaluator.compute_model_performance(
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/deepchem/utils/evaluate.py", line 315, in compute_model_performance
    results = metric.compute_metric(
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/deepchem/metrics/metric.py", line 650, in compute_metric
    metric_value = self.compute_singletask_metric(
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/deepchem/metrics/metric.py", line 726, in compute_singletask_metric
    metric_value = self.metric(y_true_arr, y_pred_arr, **kwargs)
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/sklearn/utils/_param_validation.py", line 213, in wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/sklearn/metrics/_ranking.py", line 648, in roc_auc_score
    return _average_binary_score(
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/sklearn/metrics/_base.py", line 118, in _average_binary_score
    score[c] = binary_metric(y_true_c, y_score_c, sample_weight=score_weight)
  File "/home/ubuntu/.conda/envs/openpom/lib/python3.9/site-packages/sklearn/metrics/_ranking.py", line 382, in _binary_roc_auc_score
    raise ValueError(
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
How did frac_train=0.9 cause this error?
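My guess is that with frac_train=0.9 the smaller test split leaves some of the 138 odor labels with only one class in y_true, which is exactly the case where sklearn's roc_auc_score is undefined. This is a quick check I could run to find the culprit labels (a sketch using numpy only; I am assuming test_dataset.y holds the 0/1 label matrix):

```python
import numpy as np

def single_class_tasks(y):
    """Return indices of tasks (columns) whose labels contain only one class.

    roc_auc_score raises "Only one class present in y_true" for exactly
    these tasks, so any such task in the test split crashes evaluate().
    """
    y = np.asarray(y)
    return [t for t in range(y.shape[1])
            if np.unique(y[:, t]).size < 2]

# Toy label matrix: task 1 has no positive examples at all.
y_test = np.array([[1, 0, 1],
                   [0, 0, 1],
                   [1, 0, 0]])
print(single_class_tasks(y_test))  # [1]
```

If this returns any indices when run on the real test_dataset.y, those are the labels that break the ROC AUC computation.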
2. Inference problem

I trained the model with n_models = 10 and nb_epoch = 62. (By the way, what value of nb_epoch works best?) I then ran inference after restoring each checkpoint:

model.restore("./ensemble_models/experiments_1/checkpoint2.pt")
model.restore("./ensemble_models/experiments_10/checkpoint2.pt")

There is a significant difference between the inference results from experiments_1/checkpoint2.pt and experiments_10/checkpoint2.pt. For example, here are the 5 odors with the highest inference values (out of the 138 predicted values) for the same molecule with each checkpoint:
experiments_1:  OC12C3CC3C4CC(CCC41C)C2(C)C ['woody', 'green', 'amber', 'camphoreous', 'dry']
experiments_10: OC12C3CC3C4CC(CCC41C)C2(C)C ['spicy', 'earthy', 'herbal', 'woody', 'green']
Which model is better, and how can I get the best one? Is this an overfitting problem caused by a small training dataset?
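For context, my understanding is that the n_models = 10 checkpoints are ensemble members meant to be combined rather than used individually, so individual members disagreeing may be expected. This is a sketch of the averaging I have in mind (numpy only; the prediction arrays and variable names are hypothetical, standing in for each checkpoint's model.predict output):

```python
import numpy as np

def ensemble_predict(per_model_preds):
    """Average per-model prediction arrays into one ensemble prediction.

    per_model_preds: list of arrays, each shaped (n_samples, n_tasks),
    one per restored checkpoint. The elementwise mean over members is
    usually more stable than any single member's output.
    """
    stacked = np.stack(per_model_preds)  # (n_models, n_samples, n_tasks)
    return stacked.mean(axis=0)

# Toy example: 2 "checkpoints", 1 molecule, 3 odor tasks.
preds_model_1 = np.array([[0.9, 0.1, 0.4]])
preds_model_10 = np.array([[0.3, 0.7, 0.4]])
ensemble = ensemble_predict([preds_model_1, preds_model_10])
print(ensemble)
```

Is averaging over all 10 checkpoints like this the intended way to get the final prediction, rather than picking one checkpoint?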