QData / DeepChrome

Bioinformatics16: DeepChrome: Deep-learning for predicting gene expression from histone modifications
http://deepchrome.net
Apache License 2.0

AUROC and dataset #6

Closed: atefeh-f closed this issue 5 years ago

atefeh-f commented 6 years ago

Hello, I have two questions. First, do you have any other dataset similar to the toy dataset? I need one in the same format. Second, when I run DeepChrome on the toy dataset, the final output is:

==> time to learn 1 sample = 2.8497934341431ms
ConfusionMatrix:
[[ 5 0]   100.000% [class: 1]
 [ 0 5]]  100.000% [class: 2]

==> time to test 1 sample = 0.88992118835449ms
ConfusionMatrix:
[[ 1 1]   50.000% [class: 1]
 [ 8 0]]  0.000% [class: 2]

The AUROC is very low (0.4375). Why is that?

Best regards

rs3zz commented 6 years ago

The AUC score is so low because the toy dataset has only ~10 samples, so the model cannot train well on it.

We have provided details on generating data similar to the toy dataset in the paper as well as in the README. With more training samples, the AUC score will improve.
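
To illustrate why a score from such a small test set says little, here is a minimal sketch in plain Python (not DeepChrome code). It computes AUROC as the fraction of correctly ranked pairs, using the class sizes from the test confusion matrix above (2 samples of class 1, 8 of class 2). The scores are hypothetical, chosen only so the result matches the reported value, and treating class 1 as the "positive" class is an assumption; swapping the roles gives 1 minus the same value.

```python
from itertools import product


def auroc(pos_scores, neg_scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as half a win."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores (NOT taken from DeepChrome output).
# Class sizes follow the test confusion matrix: 2 vs. 8 samples.
pos = [0.75, 0.2]
neg = [0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.85]

score = auroc(pos, neg)

# With 2 x 8 = 16 pairs, AUROC can only move in steps of 1/16.
step = 1.0 / (len(pos) * len(neg))

print(score)  # 7 of 16 pairs ranked correctly
print(step)   # change in AUROC when a single pair flips
```

With only 16 pairs, 0.4375 is 7/16, so a single pair ranked differently would bring the score to 0.5, the chance level. A test set this small cannot distinguish a weak model from random guessing.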