Training a new model - Githubissues

forrest1988 commented 5 years ago

Hi!

Problem: I was trying to train a new model and I have the following issues (FYI, before answering check out my potential cause at the bottom):

After training, the output directories (specified with the --model_dir and --log_dir flags) are created, but are empty.

Best accuracy for all epochs is 0.0, i.e. the output looks like this:

================ epoch 0 best accuracy: 0.000, best accuracy: 0.000
================ epoch 1 best accuracy: 0.000, best accuracy: 0.000
================ epoch 2 best accuracy: 0.000, best accuracy: 0.000
================ epoch 3 best accuracy: 0.000, best accuracy: 0.000
================ epoch 4 best accuracy: 0.000, best accuracy: 0.000

Sometimes, but not always, even though I am using the same input files, I also get the following warning right before those 0.0 accuracy stats:

PATH/anaconda3/envs/deepsignal/lib/python3.6/site-packages/sklearn/metrics/classification.py:1143: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples.
  'precision', 'predicted', average, warn_for)

Commands Used to run training:

deepsignal train --train_file PATH/trainingSet.TOP1000.tsv --valid_file PATH/validationSet.TOP1000.tsv --model_dir PATH/models/testModel --log_dir PATH/models/testModel_log

Files used in the above example (download available up to two weeks from now): https://transfer.sh/PboqP/trainingSet.TOP1000.tsv https://transfer.sh/hINl0/validationSet.TOP1000.tsv

Possible cause of the problem: In the above example I used only the first 1000 lines of both the training and validation files, in order to see how the program behaves. The same issue occurs when the top 10k lines of each file are used. Of course this was only for testing, not for creating the best model. Nevertheless, I was expecting to get some results. Is it possible that the problems described at the beginning are related to the small number of samples? Are you able to reproduce it?

I would appreciate your help. Thanks, Wojciech

PengNi commented 5 years ago

Hi @forrest1988 ,

(1) The results you expected are not produced, because deepsignal only tries to save a model after every batch_size * display_step (512 * 100 = 51200 by default) samples, and 51200 is greater than both 1000 and 10000.
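The arithmetic in (1) can be checked with a short sketch. This is not deepsignal code: the helper name `saves_reached` is hypothetical, and it assumes a save is attempted once per completed interval of batch size times display step samples within a single pass over the training file. The actual counting logic in deepsignal may differ.

```python
# Hypothetical helper, not part of deepsignal.
# Assumption: one save attempt per completed interval of
# batch_size * display_step samples in a single pass over the data.
def saves_reached(num_samples, batch_size=512, display_step=100):
    samples_per_save = batch_size * display_step  # 51200 with the defaults
    return num_samples // samples_per_save

for n in (1000, 10000, 51200):
    print(n, saves_reached(n))
```

Under these assumptions, 1000 and 10000 samples never complete an interval, so no model is written, while 51200 samples complete exactly one.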

(2) The "ill-defined precision" warning usually appears at the beginning of training. It may be because there are no TP and/or FP samples, i.e. the model has not predicted any positive samples yet. It does not affect the training result if the training goes well.

Also, to get a high-performance model, we suggest millions of samples for training and at least 10k samples for validation.
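To illustrate why the warning in (2) appears: precision is TP / (TP + FP), so it is undefined when nothing is predicted as positive. The sketch below is plain Python written for illustration, not scikit-learn's implementation; the function name and the `fallback` parameter are assumptions. It mirrors the behavior described in the warning text by returning 0.0 in that case.

```python
# Illustrative sketch only; not scikit-learn's code.
def precision(y_true, y_pred, fallback=0.0):
    # Count true positives and false positives for the positive class (label 1).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fp == 0:
        # No sample was predicted positive: precision is ill-defined.
        return fallback
    return tp / (tp + fp)

# An early-training model that predicts everything as negative:
print(precision([1, 0, 1, 0], [0, 0, 0, 0]))
# A model that predicts some positives:
print(precision([1, 0, 1, 0], [1, 1, 0, 0]))
```

The first call hits the ill-defined case and returns the fallback value; the second has one true positive and one false positive.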

Best, Peng

forrest1988 commented 5 years ago

Hi Peng,

Thank you a lot for your answer, the 51200 value explains everything! I think we may consider the case closed.

Best, Wojciech

bioinfomaticsCSU / deepsignal

Training a new model #10