Model training outputs many files and is still running

vahidAK commented 4 years ago

Hi @PengNi ,

My deepsignal model training script has now been running for about 6 days on a CPU machine. I see several files in the output directory. What are these different files for, and which one will be the final trained model? BTW, the script is still running.

Many thanks, Vahid.

PengNi commented 4 years ago

Hi @vahidAK ,

In our experiments, we use at most 20 million samples for training and 10k samples for validation. By default, we train for at least 5 epochs and at most 10 epochs.

We save model parameters for each epoch. The files named bn_17.sn_360.epoch_{last epoch number}.ckpt* and checkpoint are used as the final model.
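As a rough illustration of picking out the last saved epoch, here is a minimal Python sketch. It is not part of deepsignal: the function names are hypothetical, and the file-name pattern is assumed from the names mentioned in this thread (e.g. bn_17.sn_360.epoch_7.ckpt.index).

```python
import re
from pathlib import Path

# Pattern assumed from the file names in this thread, e.g.
# bn_17.sn_360.epoch_7.ckpt.index (not taken from the deepsignal source).
EPOCH_RE = re.compile(r"^(?P<prefix>.+\.epoch_(?P<epoch>\d+)\.ckpt)\.index$")


def latest_checkpoint_prefix(names):
    """Return the checkpoint prefix with the highest epoch number, or None."""
    best = None
    for name in names:
        match = EPOCH_RE.match(name)
        if match is None:
            continue  # skip "checkpoint", .meta, .data-* and unrelated files
        epoch = int(match.group("epoch"))  # compare numerically, so 10 > 2
        if best is None or epoch > best[0]:
            best = (epoch, match.group("prefix"))
    return None if best is None else best[1]


def latest_checkpoint_in_dir(model_dir):
    """Hypothetical helper: apply the search to a directory on disk."""
    names = [path.name for path in Path(model_dir).iterdir()]
    return latest_checkpoint_prefix(names)
```

The returned prefix (without the .index, .meta, or .data-* suffix) is the form the file names above share. Note that while training is still running, the highest epoch on disk is only the latest one saved so far, not necessarily the final one.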

Cheers, Peng

vahidAK commented 4 years ago

Thanks @PengNi. Do you have a manual or something similar that I can read for more information about those outputs and what they represent?

Thanks

PengNi commented 4 years ago

@vahidAK , the output model files are the standard output of TensorFlow's saver.save(). You can check out our trained model and how to use it in the README. For example, to use our trained model, we need the following files:

model.CpG.R9.4_1D.human_hx1.bn17.sn360/
     bn_17.sn_360.epoch_7.ckpt.data-00000-of-00001
     bn_17.sn_360.epoch_7.ckpt.index
     bn_17.sn_360.epoch_7.ckpt.meta
     checkpoint
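
Before pointing a tool at a checkpoint, it can help to confirm that all of the files in the listing above are present. The following is a minimal sketch of such a check, not something deepsignal provides. The function name is hypothetical, and the set of required files is assumed from the listing above.

```python
def missing_checkpoint_files(names, prefix):
    """List the parts of a checkpoint that are absent from `names`.

    `names` is an iterable of file names in the model directory and
    `prefix` is e.g. "bn_17.sn_360.epoch_7.ckpt". The required parts
    (.data-*, .index, .meta, checkpoint) are assumed from the listing above.
    """
    names = set(names)
    missing = []
    # The data shard name carries a shard suffix such as -00000-of-00001.
    if not any(name.startswith(prefix + ".data-") for name in names):
        missing.append(prefix + ".data-*")
    for suffix in (".index", ".meta"):
        if prefix + suffix not in names:
            missing.append(prefix + suffix)
    if "checkpoint" not in names:
        missing.append("checkpoint")
    return missing
```

An empty result means every file from the listing was found, for example when called with the output of os.listdir() on the model directory.
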

And we use the model as follows:

deepsignal call_mods --input_path fast5s.al/ --model_path model.CpG.R9.4_1D.human_hx1.bn17.sn360/bn_17.sn_360.epoch_7.ckpt --result_file fast5s.al.CpG.call_mods.tsv --reference_path GCF_000146045.2_R64_genomic.fna --corrected_group RawGenomeCorrected_001 --nproc 10 --is_gpu no

Best, Peng

bioinfomaticsCSU / deepsignal

Model training outputs many files and is still running #34