google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Apache License 2.0
2.33k stars 352 forks source link

sequence tagging tasks fails at metric reporting #133

Closed Joseph-Vineland closed 2 years ago

Joseph-Vineland commented 2 years ago

I am attempting the seqeunce tagging task with custom data. I formatted my data just like they did here: https://www.clips.uantwerpen.be/conll2000/chunking/

Everything runs as expected until the very end when metrics are about to be reported. Then it fails with an error. I dug into things and it appears the 'sentence_tags' array has 0 elements. I can't figure out why. Please help!

 python3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{"model_size": "base", "task_names": ["chunk"]}'

Traceback (most recent call last):
  File "run_finetuning.py", line 323, in <module>
    main()
  File "run_finetuning.py", line 319, in main
    args.model_name, args.data_dir, **hparams))
  File "run_finetuning.py", line 275, in run_finetuning
    results.append(model_runner.evaluate())
  File "run_finetuning.py", line 186, in evaluate
    return {task.name: self.evaluate_task(task) for task in self._tasks}
  File "run_finetuning.py", line 186, in <dictcomp>
    return {task.name: self.evaluate_task(task) for task in self._tasks}
  File "run_finetuning.py", line 200, in evaluate_task
    utils.log(task.name + ": " + scorer.results_str())
  File "/home/joneill/electra/finetune/scorer.py", line 54, in results_str
    for k, v in self.get_results()])
  File "/home/joneill/electra/finetune/scorer.py", line 47, in get_results
    results = self._get_results() if self._updated else self._cached_results
  File "/home/joneill/electra/finetune/tagging/tagging_metrics.py", line 112, in _get_results
    labels, self._inv_label_mapping))
  File "/home/joneill/electra/finetune/tagging/tagging_utils.py", line 38, in get_span_labels
    if sentence_tags[-1] != 'O':
IndexError: list index out of range
Joseph-Vineland commented 2 years ago

I found the answer. I set the 'is_token_level' argument in the init function call to 'True'.

Edit line 253 of tagging_tasks.py.

super(Chunking, self).__init__(config, "chunk", tokenizer, False)

change to:

super(Chunking, self).__init__(config, "chunk", tokenizer, True)