Lightning-Universe / lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning
https://lightning-transformers.readthedocs.io
Apache License 2.0

Multiple Choice Dataset Files not Overwritten #224

Closed leonardtang closed 2 years ago

leonardtang commented 2 years ago


Even after overriding the default dataset files for the multiple-choice task (RACE, in particular), the training script still appears to use the original RACE dataset rather than my custom dataset files. As a dummy example, I'm using the JSON file from the docs:

{
    "article": "The man walked into the red house but couldn't see where the light was.",
    "question": "What colour is the house?",
    "options": ["White", "Red", "Blue"],
    "answer": "Red"
}
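
Before pointing a training run at a custom file, it can help to confirm that the file parses and carries the expected fields. The following is a minimal sketch, assuming each file holds a single JSON object with the four keys shown above; the exact layout the library expects is not confirmed in this thread, and `check_example` is a hypothetical helper, not part of lightning-transformers. A malformed file makes `json.loads` raise `json.JSONDecodeError`.

```python
import json
import tempfile
from pathlib import Path

# Keys taken from the sample record above; assumed, not verified against the library.
REQUIRED_KEYS = {"article", "question", "options", "answer"}


def check_example(path):
    """Load one JSON object from `path` and return a list of problems found."""
    record = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = []
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if "answer" in record and record["answer"] not in record.get("options", []):
        problems.append("answer is not one of the options")
    return problems


sample = {
    "article": "The man walked into the red house but couldn't see where the light was.",
    "question": "What colour is the house?",
    "options": ["White", "Red", "Blue"],
    "answer": "Red",
}

# Write the sample to a temporary file and check it.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "RACE_data.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    print(check_example(path))
```

An empty list means no problems were found under these assumptions.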

To Reproduce

Steps to reproduce the behavior:

pl-transformers-train task=nlp/multiple_choice dataset=nlp/multiple_choice/race dataset.cfg.train_file=/data/leonardtang/MAUD/data/RACE_data.json dataset.cfg.validation_file=/data/leonardtang/MAUD/data/RACE_valid.json

Resulting output: Epoch 0: 0%| | 13/5798 [01:01<7:04:19, 4.40s/it, loss=1.39, train_loss=1.350]

As you can see, the progress bar shows 5798 batches, which matches the size of the original RACE dataset, not the 1-example toy dataset I am testing on.
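
As a rough cross-check of that reasoning, the number of batches in one pass over a dataset is the example count divided by the batch size, rounded up. The sketch below assumes the last partial batch is kept; the batch size used in this run is not stated in the thread, and the progress-bar total may also include validation batches, so treat the result as an approximation. `expected_batches` is a hypothetical helper.

```python
import math


def expected_batches(num_examples: int, batch_size: int) -> int:
    """Batches in one pass over the data, keeping the last partial batch."""
    if num_examples < 0 or batch_size <= 0:
        raise ValueError("num_examples must be >= 0 and batch_size must be > 0")
    return math.ceil(num_examples / batch_size)


# A 1-example toy dataset gives a single batch whatever the batch size is,
# so a total in the thousands points to a different dataset being loaded.
for batch_size in (1, 8, 32):
    print(batch_size, expected_batches(1, batch_size))
```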


mathemusician commented 2 years ago

Hi @leonardtang! Good on ya for getting into Harvard (sorry, couldn't resist). To answer your question: you set dataset=nlp/multiple_choice/race, so that's the dataset it's going to use. To use your own dataset, drop that override and pass only the file paths:

pl-transformers-train                                                       \
    task=nlp/multiple_choice                                                \
    dataset.cfg.train_file=/data/leonardtang/MAUD/data/RACE_data.json       \
    dataset.cfg.validation_file=/data/leonardtang/MAUD/data/RACE_valid.json

I checked the docs (where your code came from), and I think they need to be updated. :[
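
The difference between the two commands in this thread is whether a `dataset=...` selection is present alongside the `dataset.cfg.*` file paths. The sketch below assembles the corrected argument list and flags any whole-dataset selection. It does not run the tool, and both helper names are hypothetical, introduced only for illustration.

```python
import shlex


def build_train_command(train_file, validation_file, task="nlp/multiple_choice"):
    """Assemble the overrides from the reply above, with no dataset=... selection."""
    return [
        "pl-transformers-train",
        f"task={task}",
        f"dataset.cfg.train_file={train_file}",
        f"dataset.cfg.validation_file={validation_file}",
    ]


def dataset_group_overrides(args):
    """Return arguments whose key is exactly 'dataset' (a whole-dataset selection)."""
    return [arg for arg in args if arg.split("=", 1)[0] == "dataset"]


cmd = build_train_command(
    "/data/leonardtang/MAUD/data/RACE_data.json",
    "/data/leonardtang/MAUD/data/RACE_valid.json",
)

# The corrected command carries only dataset.cfg.* keys, so nothing is flagged.
print(dataset_group_overrides(cmd))
# Shell-quoted form of the command, ready to paste into a terminal.
print(shlex.join(cmd))
```

Running `dataset_group_overrides` on the arguments from the original report would return `dataset=nlp/multiple_choice/race`, the selection that the reply says caused RACE to be used.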

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.