gifford-lab / DeepLigand

MIT License

Train and Test dataset partition #5

Open hezt opened 4 years ago

hezt commented 4 years ago

Hello,

Could you please explain how to partition the CV files (http://gerv.csail.mit.edu/deepligand_CVdata/) into training and test sets in order to reproduce the evaluation performance curves shown in your paper?

I'm trying to reproduce your training and evaluation procedure.

Thanks, Zitong

hezt commented 4 years ago

Hello,

Also, do you concatenate the predictions on each fold, where the model was trained on the other four folds, to draw the auROC and auPRC curves?

Best, Zitong

haoyangz commented 4 years ago

@hezt For each of the five folds, we trained one model on the other four folds and used it to predict on the held-out fold. The predictions from all five folds were then concatenated to calculate auROC and other metrics.
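A minimal sketch of the pooling procedure described above, for readers trying to reproduce it. It is not code from the DeepLigand repository: `pooled_cv_auroc`, `train_fn`, and the `folds` layout are hypothetical names chosen for illustration, labels are assumed to be binary (0/1), and auROC is computed with a plain rank-based (Mann-Whitney) formula rather than whatever implementation the authors used.

```python
from typing import Callable, List, Sequence, Tuple

Fold = Tuple[Sequence, Sequence[int]]  # (features, binary labels) for one fold


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based auROC (Mann-Whitney U); tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("auROC needs both positive and negative labels")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def pooled_cv_auroc(
    folds: List[Fold],
    train_fn: Callable[[list, list], Callable[[Sequence], Sequence[float]]],
) -> float:
    """Train on all folds but one, predict the held-out fold, pool, then score.

    train_fn is a hypothetical hook: it takes training features and labels
    and returns a function that maps features to prediction scores.
    """
    pooled_labels: List[int] = []
    pooled_scores: List[float] = []
    for held_out in range(len(folds)):
        train_x, train_y = [], []
        for idx, (x, y) in enumerate(folds):
            if idx != held_out:
                train_x.extend(x)
                train_y.extend(y)
        predict = train_fn(train_x, train_y)
        test_x, test_y = folds[held_out]
        pooled_scores.extend(predict(test_x))
        pooled_labels.extend(test_y)
    # One metric over the concatenated out-of-fold predictions,
    # not an average of per-fold metrics.
    return auroc(pooled_labels, pooled_scores)
```

Note that pooling the out-of-fold predictions before scoring can give a different number than averaging five per-fold auROC values, so the two should not be compared directly.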

KiAkize commented 3 years ago

Hello,

I am also trying to retrain the 5-fold CV models to reproduce the results.

Could you please explain what each column of the downloaded 5-fold CV data means?

In addition, besides renaming the MHC names to the format used in MHC_pseudo.dat, what else needs to be done before using preprocess.py to transform the training data?

Thank you!