lairikeqiA closed this issue 4 years ago
Can you share some details, such as the command you are running? Since you train on GPU, I suspect you are using a smaller batch size. How many updates has the model performed (global_step * batch_size)?
In the experiments I ran, the model reached at least 41% accuracy after 5,000 steps (at batch size 512, ~2.5M updates) for all random runs.
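To make the comparison concrete, "updates" here means global_step multiplied by the batch size, i.e. the number of training examples the model has seen. A minimal sketch (the helper name is mine, not from the TAPAS codebase):

```python
# "Updates" as used in this thread: examples seen = steps * batch size.
def examples_seen(global_step, batch_size):
    return global_step * batch_size

# Reference TPU run from this thread: 5,000 steps at batch size 512.
print(examples_seen(5000, 512))   # 2,560,000 (~2.5M)

# GPU run from this thread: 21,000 steps at batch size 4.
print(examples_seen(21000, 4))    # 84,000
```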
This is my command:

```
python3 run_task_main.py \
  --task=WTQ \
  --output_dir=/mnt/cjc/tapas-master/WTQ \
  --model_dir=/mnt/cjc/tapas-master/tapas_wikisql_sqa_masklm_large \
  --init_checkpoint=/mnt/cjc/tapas-master/tapas_wikisql_sqa_masklm_large/model.ckpt \
  --bert_config_file=/mnt/cjc/tapas-master/tapas_wikisql_sqa_masklm_large/bert_config.json \
  --mode=train \
  --use_tpu=False \
  --iterations_per_loop=5 \
  --train_batch_size=4 \
  --max_seq_length=512
```

The model reached 30% accuracy after 21,000 steps (at batch size 4, ~84,000 updates).
I have another question: does it shuffle the WTQ test dataset for the tapas_wtq_wikisql_sqa_masklm_large model?
It should not shuffle the test data, only train.
Batch size 4 is pretty small; I would assume that you will not get the full accuracy with that batch size.
With 84,000 updates you are at about 3.4% of what we usually have (84,000 / ~2.5M), so you might have to wait a bit more.
How is the accuracy developing as a function of the global steps?
The model converges faster in early training than in late training.
@lairikeqiA Indeed, that is expected: the learning rate decays according to tf.train.polynomial_decay, and the AdamOptimizer is used.
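For reference, a polynomial decay schedule (as in tf.train.polynomial_decay without cycling) interpolates from the initial learning rate down to an end rate over a fixed number of steps, which is why progress per step shrinks late in training. A plain-Python sketch of that formula; the learning-rate values are made up for illustration, not TAPAS's actual hyperparameters:

```python
# Polynomial decay schedule (no cycling):
#   lr = (initial - end) * (1 - step/decay_steps)**power + end
def polynomial_decay(initial_lr, global_step, decay_steps,
                     end_lr=0.0001, power=1.0):
    step = min(global_step, decay_steps)  # clamp after decay_steps
    return (initial_lr - end_lr) * (1.0 - step / decay_steps) ** power + end_lr

# Illustrative values: decay from 0.01 to 0.0001 over 100 steps.
print(polynomial_decay(0.01, 0, 100))    # full initial rate at step 0
print(polynomial_decay(0.01, 50, 100))   # roughly halfway down
print(polynomial_decay(0.01, 100, 100))  # end rate from then on
```

With the default power=1.0 the decay is linear, so the learning rate (and with it the convergence speed) drops steadily toward the end rate.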
I have been fine-tuning tapas_wikisql_sqa_masklm_large on the WTQ dataset to get the tapas_wtq_wikisql_sqa_masklm_large model for a few days on GPU, but the dev accuracy is only 30%. What could be causing this?