Closed: gowthamvenkatsairam closed this issue 4 years ago
100 questions is small, so you wouldn't want to do more than a few epochs. I would start with 1000 examples (10 epochs) and see how the test performance develops over time.
For such a small set you might also want to reduce the number of steps between checkpoints.
You can change these options in run_task_main.py:

save_checkpoints_steps=1000,
keep_checkpoint_max=5,
keep_checkpoint_every_n_hours=4.0,
For example, this would give you one checkpoint per epoch:
save_checkpoints_steps=100,
keep_checkpoint_max=5,
keep_checkpoint_every_n_hours=4.0,
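As a rough sketch of the arithmetic behind these numbers (assuming, as the example above implies, an effective train batch size of 1, so that 100 steps cover the 100 questions once; the helper function is hypothetical, not part of the code base):

```python
def checkpoint_interval(dataset_size, batch_size=1):
    """Steps per epoch: examples per epoch divided by examples per step.

    Hypothetical helper for illustration only. With batch_size=1, a
    100-question set gives save_checkpoints_steps=100, i.e. one
    checkpoint per epoch as in the example above.
    """
    return max(1, dataset_size // batch_size)

print(checkpoint_interval(100))        # -> 100
print(checkpoint_interval(1000, 128))  # -> 7
```

If your actual train batch size is larger, divide accordingly before setting save_checkpoints_steps.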
What about num_train_examples? I think it's too large.
How can I log the loss for every epoch, instead of only getting the loss after the final step?
Yes, as I said, I would set num_train_examples=1000.
To get the loss and other metrics after roughly each epoch you can set save_checkpoints_steps=100.
Then you can run a separate eval job to compute the loss and metrics for every checkpoint.
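A minimal sketch of how such an eval job might enumerate the saved checkpoints (the model.ckpt-&lt;step&gt;.index filename pattern is standard TensorFlow; the helper itself is hypothetical, and the actual metric computation is left as a comment):

```python
import re

def checkpoint_steps(filenames):
    """Return the sorted global-step numbers found among checkpoint files."""
    steps = []
    for name in filenames:
        # TensorFlow writes one .index file per checkpoint, e.g. model.ckpt-100.index
        m = re.match(r"model\.ckpt-(\d+)\.index$", name)
        if m:
            steps.append(int(m.group(1)))
    return sorted(steps)

# For each step you would then restore model.ckpt-<step> and compute
# the loss/metrics on your dev set (e.g. via an estimator's evaluate call).
print(checkpoint_steps([
    "checkpoint",
    "model.ckpt-100.index",
    "model.ckpt-300.index",
    "model.ckpt-200.index",
    "model.ckpt-200.data-00000-of-00001",
]))  # -> [100, 200, 300]
```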
Will close this issue for the time being, let us know and feel free to reopen if there are any further questions.
In hparam_utils.py, why is num_train_examples=200000 * 128? I have a train.tsv containing 100 questions, so what should I assign to num_train_examples?
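For what it's worth, the default appears to count the total number of examples consumed over training (steps times batch size) rather than the dataset size; under that reading, the 1000 suggested above is simply 10 epochs over 100 questions. A quick check of the arithmetic (the "steps × batch size" interpretation is an assumption, not confirmed from the code):

```python
# Default from hparam_utils.py, read as train steps * batch size (assumption):
default_steps = 200_000      # assumed step count
default_batch_size = 128     # assumed batch size
default_num_train_examples = default_steps * default_batch_size
print(default_num_train_examples)  # -> 25600000

# For a 100-question train.tsv, the 10 epochs suggested in this thread:
dataset_size = 100
epochs = 10
num_train_examples = dataset_size * epochs
print(num_train_examples)  # -> 1000
```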