Low AUROC Value - BBB_Martins

Hi,

First of all, I would like to congratulate you on the work you have done.

I am attempting to achieve an AUROC value of 0.920 for the BBB_Martins dataset, as reported in the paper.

When I run the code below, the highest value I get is 0.564. Could you point out any errors, or tell me where my approach goes wrong?

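import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score
from tdc.benchmark_group import admet_group

import cfafunctions  # assumed import for the CFA4DD helper module; adjust to the repository layout

group = admet_group(path='data/')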

# Specify the dataset name
name = 'BBB_Martins'

# Load the BBB_Martins dataset
benchmark = group.get(name)
train_val, test = benchmark['train_val'], benchmark['test']
y_test = np.array(test.Y)

# Create dictionaries for y_train and y_valid for different seeds
y_train_dict = {}
y_valid_dict = {}

for seed in [1, 2, 3, 4, 5]:
    train, valid = group.get_train_valid_split(benchmark=name, split_type='default', seed=seed)
    y_train_dict[seed] = train.Y
    y_valid_dict[seed] = valid.Y

predictions_val_xgb = [np.random.rand(len(y_valid_dict[seed])) for seed in range(1, 6)]  # Replace with actual predictions
predictions_val_rf = [np.random.rand(len(y_valid_dict[seed])) for seed in range(1, 6)]   # Replace with actual predictions
predictions_val_svm = [np.random.rand(len(y_valid_dict[seed])) for seed in range(1, 6)]  # Replace with actual predictions

# Convert validation predictions for each model into a DataFrame
# Assuming predictions_val_xgb, predictions_val_rf, predictions_val_svm are lists of arrays for each seed
df_val_xgb = pd.DataFrame(predictions_val_xgb).transpose()  # Each column represents predictions for one seed
df_val_rf = pd.DataFrame(predictions_val_rf).transpose()
df_val_svm = pd.DataFrame(predictions_val_svm).transpose()

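# Convert test predictions for each model into a DataFrame
# Test predictions are single arrays since there's only one test set
# Note: predictions_test_xgb, predictions_test_rf and predictions_test_svm are not
# defined in this snippet; they are assumed to be created elsewhere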
df_test_xgb = pd.DataFrame(predictions_test_xgb, columns=['test'])
df_test_rf = pd.DataFrame(predictions_test_rf, columns=['test'])
df_test_svm = pd.DataFrame(predictions_test_svm, columns=['test'])

# Now, create val_dfs_list and test_dfs_list with these DataFrames
val_dfs_list = [df_val_xgb, df_val_rf, df_val_svm]
test_dfs_list = [df_test_xgb, df_test_rf, df_test_svm]

model_names = ['xgb', 'rf', 'svm']  # Model names
preds = cfafunctions.model_predictions(
    len(model_names),
    model_names,
    val_dfs_list=val_dfs_list,
    test_dfs_list=test_dfs_list
)

# Accessing the second element of the preds tuple for test predictions
test_predictions_dict = preds[1]

xgb_test_predictions = test_predictions_dict['predictions_test_xgb']
rf_test_predictions = test_predictions_dict['predictions_test_rf']
svm_test_predictions = test_predictions_dict['predictions_test_svm']

xgb_test_prob_positive = np.array(xgb_test_predictions[0])
rf_test_prob_positive = np.array(rf_test_predictions[0])
svm_test_prob_positive = np.array(svm_test_predictions[0])

# Calculating AUC for XGBoost
auc_xgb = roc_auc_score(y_test, xgb_test_prob_positive)
print(f'AUC for XGBoost: {auc_xgb:.3f}')

# Calculating AUC for Random Forest
auc_rf = roc_auc_score(y_test, rf_test_prob_positive)
print(f'AUC for Random Forest: {auc_rf:.3f}')

# Calculating AUC for SVM
auc_svm = roc_auc_score(y_test, svm_test_prob_positive)
print(f'AUC for SVM: {auc_svm:.3f}')
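
Note: the validation predictions above are placeholders drawn from np.random.rand, and the test predictions are not shown. If the scores fed into the pipeline carry no information about the labels, an AUROC close to 0.5 is what one would expect, which may explain values around 0.56. The sketch below illustrates this on synthetic data only. The labels, the class balance, and the "informative" scores are made up for the illustration and are not BBB_Martins data; the auroc helper is a plain-Python stand-in for roc_auc_score, not part of the repository.

```python
import random


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This is the pairwise (Mann-Whitney) formulation.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)

# Synthetic binary labels; the 75% positive rate is an arbitrary assumption
y_true = [1 if rng.random() < 0.75 else 0 for _ in range(2000)]

# Scores unrelated to the labels, analogous to the np.random.rand placeholders
random_scores = [rng.random() for _ in y_true]

# Scores that are shifted upward for positives, standing in for a fitted model
informative_scores = [0.3 * y + rng.random() for y in y_true]

auc_random = auroc(y_true, random_scores)
auc_informative = auroc(y_true, informative_scores)

print(f'AUROC with random scores:      {auc_random:.3f}')
print(f'AUROC with informative scores: {auc_informative:.3f}')
```

If this is the cause, replacing the placeholders with probabilities from fitted models (for example, the positive-class column of predict_proba) would be the first thing to check.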
