Hyperparticle / udify

A single model that parses Universal Dependencies across 75 languages. Given a sentence, it jointly predicts part-of-speech tags, morphology tags, lemmas, and dependency trees.
https://arxiv.org/abs/1904.02099
MIT License

Poor/bad scores or metrics when fine-tuning #6

Closed Hyperparticle closed 4 years ago

Hyperparticle commented 4 years ago

If you are seeing poor fine-tuning evaluation UAS/LAS scores, then this additional info might help.

It should take about 10 epochs before all the metrics start showing good scores, and about 80 epochs to become competitive with UDPipe Future. If you are not seeing this, something may be off with your training setup.

One caveat: if you fine-tune on a subset of treebanks instead of all 124 UD v2.3 treebanks, you must modify the configuration file so that the learning rate scheduler matches your number of training steps. Copy the udify_bert_finetune_multilingual.json config and adjust the "warmup_steps" and "start_step" values. A good initial choice is to set both equal to the number of training batches in one epoch (run the training script once to see how many batches each epoch contains).
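
As a rough illustration, suppose your fine-tuning treebank gives about 800 batches per epoch. The relevant fragment of the copied config might then look like the sketch below. Only "warmup_steps" and "start_step" are the values named above; the surrounding "trainer" / "learning_rate_scheduler" nesting follows the usual AllenNLP config layout and the value 800 is purely an example, so check both against your own copy of udify_bert_finetune_multilingual.json:

```json
{
  "trainer": {
    "learning_rate_scheduler": {
      "warmup_steps": 800,
      "start_step": 800
    }
  }
}
```

After editing, launch the training script once to read off the actual number of batches per epoch, then set both values to that number and restart training.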