google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Apache License 2.0

[WIP] Define finetuning tasks in command-line hparams #53

Open mapmeld opened 4 years ago

mapmeld commented 4 years ago

While finetuning an ELECTRA model on XNLI and a movie-review task, I noticed that these tasks must be hardcoded in finetune/task_builder.py and finetune/classification/classification_tasks.py in a less-than-straightforward way.

This is an outline of how I would create a StandardTSV classifier that accepts command-line arguments for a new task following the same format as the other finetuning tasks, with a train.tsv and dev.tsv. If this makes sense to others on the repo, I would expand it to cover other task types.

My proposed format for the parameter is:

{"newmovies": {"type": "classification", "labels":["negative", "neutral", "positive"], "header":true, "text_column":1, "label_column":2}}
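As a minimal sketch of what the StandardTSV classifier would do with this config, the snippet below parses the proposed JSON and reads a TSV split using the per-task column settings. The function name load_tsv_examples and the assertion on labels are my own illustration, not code from the repo:

```python
import csv
import json

# The proposed --task-config value, parsed from its JSON string form.
TASK_CONFIG = json.loads(
    '{"newmovies": {"type": "classification", '
    '"labels": ["negative", "neutral", "positive"], '
    '"header": true, "text_column": 1, "label_column": 2}}'
)

def load_tsv_examples(path, task_name, config=TASK_CONFIG):
    """Yield (text, label) pairs from a train.tsv/dev.tsv file.

    Hypothetical helper: column indices and the header flag come from the
    per-task config rather than being hardcoded per task.
    """
    task = config[task_name]
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        if task["header"]:
            next(reader)  # skip the header row when the config says there is one
        for row in reader:
            text = row[task["text_column"]]
            label = row[task["label_column"]]
            assert label in task["labels"], f"unknown label: {label}"
            yield text, label
```

With this shape, adding a new classification task means supplying a config entry rather than writing a new task class.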

I pass this configuration via a new flag, --task-config, which gets merged into --hparams in the code; in the final version it could make sense to make the task config a property of hparams.
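The merge step could look something like the sketch below. This is an assumption about the final shape, not the actual patch: merge_task_config and the task_names key are names I made up to illustrate attaching the parsed config to hparams:

```python
import json

def merge_task_config(hparams, task_config_json):
    """Return a copy of hparams with the parsed --task-config attached.

    Hypothetical sketch: the parsed config becomes a property of hparams,
    and the config's keys double as the list of task names to build.
    """
    merged = dict(hparams)
    merged["task_config"] = json.loads(task_config_json)
    # task_builder.py could iterate over these names instead of a
    # hardcoded registry of task classes.
    merged["task_names"] = list(merged["task_config"].keys())
    return merged
```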

Sample notebook: https://colab.research.google.com/drive/14nEiOh81z89LyNC6nZyDv7rd0L2J6tII