NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

Fine-tuning LayoutLMForSequenceClassification on RVL-CDIP.ipynb class label numbers issue #345

Open tompfs opened 10 months ago

tompfs commented 10 months ago

When I run the notebook "Fine-tuning LayoutLMForSequenceClassification on RVL-CDIP.ipynb" from NielsRogge, the cell that contains "'label': ClassLabel(names=['refuted', 'entailed'])," fails with an error saying the number of classes doesn't match. It looks like a copy-paste error: those two class labels don't fit this task, since RVL-CDIP has 16 document classes. Do you have a correction or advice?
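A minimal sketch of what the corrected label definition could look like. It assumes the notebook's encoded dataset uses the standard RVL-CDIP integer ids 0-15 in the order of the Hugging Face "rvl_cdip" dataset. `RVL_CDIP_LABELS` and `check_labels` are hypothetical names introduced here for illustration, not part of the notebook. The check fails early if the class names, the label ids in the data, and the model's `num_labels` disagree.

```python
# The 16 RVL-CDIP document classes. Assumption: this order matches the
# integer ids (0-15) stored in the notebook's dataset.
RVL_CDIP_LABELS = [
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific report", "scientific publication", "specification",
    "file folder", "news article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
]


def check_labels(label_names, label_ids, num_labels):
    """Hypothetical helper: raise if names, data ids and model head disagree."""
    if len(label_names) != num_labels:
        raise ValueError(
            f"{len(label_names)} class names but num_labels={num_labels}"
        )
    # Every integer label in the data must index into label_names.
    out_of_range = sorted({i for i in label_ids if not 0 <= i < len(label_names)})
    if out_of_range:
        raise ValueError(f"label ids {out_of_range} have no class name")


# In the notebook's Features dict, the copy-pasted line
#   'label': ClassLabel(names=['refuted', 'entailed']),
# would instead use the RVL-CDIP names:
#   'label': ClassLabel(names=RVL_CDIP_LABELS),
check_labels(RVL_CDIP_LABELS, [0, 15, 3], num_labels=16)  # passes

try:
    check_labels(["refuted", "entailed"], [0, 15], num_labels=2)
except ValueError as err:
    print(err)  # prints: label ids [15] have no class name
```

The two-name list reproduces the kind of mismatch described above: any document labeled 2-15 has no matching class.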

https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForSequenceClassification_on_RVL_CDIP.ipynb#scrollTo=IeksmkWwfKjH

NielsRogge commented 10 months ago

Hi,

Thanks for reporting, looks like a bad copy-paste on my side. Will fix!

For instance, you can load a small subset (https://huggingface.co/datasets/jordyvl/rvl_cdip_100_examples_per_class) and then get the class names as follows:

```python
labels = dataset["train"].features["label"].feature.names
```

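The `.feature.names` access applies when the label column is a `datasets.Sequence` wrapping a `ClassLabel`; a plain `ClassLabel` exposes `.names` directly. A minimal sketch, assuming either layout might occur, that reads the names in both cases and builds the `id2label`/`label2id` maps the model config can take. `label_names` and `label_maps` are hypothetical helpers, and `SimpleNamespace` objects stand in for the real feature objects so the snippet runs without downloading anything.

```python
from types import SimpleNamespace


def label_names(label_feature):
    """Hypothetical helper: class names from ClassLabel or Sequence(ClassLabel)."""
    # A Sequence feature keeps its inner feature in `.feature`;
    # a plain ClassLabel has no `.feature`, so fall back to the object itself.
    inner = getattr(label_feature, "feature", label_feature)
    return list(inner.names)


def label_maps(names):
    """Build id2label / label2id dicts from an ordered list of class names."""
    id2label = {i: name for i, name in enumerate(names)}
    label2id = {name: i for i, name in enumerate(names)}
    return id2label, label2id


# Stand-ins for the two feature layouts (not real datasets objects).
plain = SimpleNamespace(names=["letter", "form", "email"])
wrapped = SimpleNamespace(feature=plain)

names = label_names(wrapped)
id2label, label2id = label_maps(names)
print(names, id2label, label2id)

# With a real dataset (assumption: loaded via datasets.load_dataset):
#   labels = label_names(dataset["train"].features["label"])
#   id2label, label2id = label_maps(labels)
#   model = LayoutLMForSequenceClassification.from_pretrained(
#       "microsoft/layoutlm-base-uncased",
#       num_labels=len(labels), id2label=id2label, label2id=label2id)
```

Deriving `num_labels` from the dataset's own label feature, rather than hard-coding it, keeps the classification head and the data in sync.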
tompfs commented 10 months ago

Thanks