Hi @francescogianferraripini,
for all experiments with Italian NER we used the Transformers library.
Good documentation can be found here.
The "hardest" part here is preprocessing your data. The Transformers fine-tuning script for token classification (NER, PoS tagging, chunking...) expects a "Token
After the preprocessing part you can just use a json-based configuration file, where you specify all necessary parameters/hyper-parameters, like:
```json
{
    "data_dir": ".data",
    "labels": "labels.txt",
    "model_name_or_path": "dbmdz/bert-base-italian-xxl-cased",
    "output_dir": "bert-base-italian-xxl-cased-model-1",
    "max_seq_length": 128,
    "num_train_epochs": 10,
    "per_device_train_batch_size": 16,
    "save_steps": 703,
    "seed": 1,
    "do_train": true,
    "do_eval": true,
    "do_predict": true,
    "load_best_model_at_end": true,
    "fp16": true,
    "overwrite_output_dir": true
}
```
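The labels.txt file referenced in the configuration simply lists every label in your tag set, one per line. A sketch, assuming a standard PER/LOC/ORG scheme (adjust to your own labels):

```
O
B-PER
I-PER
B-LOC
I-LOC
B-ORG
I-ORG
```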
Then you can run the examples/token-classification/run_ner.py script and pass the json-based configuration file as its first argument.
Fine-tuning will then start, and the fine-tuned model will be stored in the specified output_dir :)
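For example, assuming the configuration above is saved as config.json:

```bash
python examples/token-classification/run_ner.py config.json
```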
I hope this helps!
Thanks a lot! I was more or less on that path but this greatly helps.
Hi. How can I train the XXL Italian model for a downstream NER task?