allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

RoBERTa on SuperGLUE's 'Choice of Plausible Alternatives' task #5001

Open · dirkgr opened 3 years ago

dirkgr commented 3 years ago

COPA is one of the tasks of the SuperGLUE benchmark. The task is to re-trace the steps of Facebook's RoBERTa paper (https://arxiv.org/pdf/1907.11692.pdf) and build an AllenNLP config that reads the COPA data and fine-tunes a model on it. We expect scores in the range of their entry on the SuperGLUE leaderboard.

This can be formulated as a multiple-choice task using the TransformerMC model from the Transformer Toolkit, analogous to the PIQA model. You can start with the experiment config and the dataset-reading step from PIQA and adapt them to your needs.

ekdnam commented 3 years ago

Hi! Can I work on this issue?

dirkgr commented 3 years ago

Absolutely! Do you know how to get started?

ekdnam commented 3 years ago

> Absolutely!

That is great!

> Do you know how to get started?

I would be very glad if you could give some pointers. Currently, I am going through the links given in the issue (the AllenNLP Repository Template and others).

dirkgr commented 3 years ago

Most of the work will be in writing the reader, which makes Instance objects out of the data from disk.

The COPA data looks like this:

Premise: The man broke his toe. What was the CAUSE of this? Alternative 1: He got a hole in his sock. Alternative 2: He dropped a hammer on his foot.

The best way to encode this for RoBERTa is to create two TextFields, one containing the string "The man broke his toe. What was the CAUSE of this? He got a hole in his sock.", and the other one with "The man broke his toe. What was the CAUSE of this? He dropped a hammer on his foot.". Then put those two TextFields into a ListField, and put the ListField into the Instance object (together with some other stuff like the label for the correct answer and some metadata).

To be sure, there are other ways to encode this; with a quick search I could not find how the RoBERTa authors did it.
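For concreteness, here is a minimal sketch of that encoding in AllenNLP. Everything specific in it is an assumption for illustration: the model name, the prompt wording, and the field names ("alternatives", "correct_alternative") need to match whatever model eventually consumes the instances.

    # Sketch only: builds one COPA-style Instance as described above.
    # Field names and model name are illustrative assumptions.
    from allennlp.data import Instance
    from allennlp.data.fields import IndexField, ListField, MetadataField, TextField
    from allennlp.data.token_indexers import PretrainedTransformerIndexer
    from allennlp.data.tokenizers import PretrainedTransformerTokenizer

    model_name = "roberta-base"
    tokenizer = PretrainedTransformerTokenizer(model_name)
    token_indexers = {"tokens": PretrainedTransformerIndexer(model_name)}

    premise = "The man broke his toe. What was the CAUSE of this?"
    alternatives = ["He got a hole in his sock.", "He dropped a hammer on his foot."]
    label = 1  # the second alternative is the correct one

    # One TextField per alternative: premise/question and choice concatenated.
    alternative_fields = ListField(
        [TextField(tokenizer.tokenize(f"{premise} {alt}"), token_indexers) for alt in alternatives]
    )
    instance = Instance(
        {
            "alternatives": alternative_fields,
            # The IndexField points at the correct entry of the ListField.
            "correct_alternative": IndexField(label, alternative_fields),
            "metadata": MetadataField({"premise": premise}),
        }
    )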

The Model will take these instances and run them through a TextFieldEmbedder (likely PretrainedTransformerEmbedder). This is what TransformerQAModel does, and you can maybe steal some code from there. You get a bunch of vectors back, one for every token in the input. The rest is fairly normal PyTorch stuff, i.e., get the vectors for the classification tokens ([CLS], or <s> for RoBERTa), run a dense network to get a single value per text sequence, softmax over the pairs in the instance, and compute a loss. The only AllenNLP-specific thing at this point is that your Model needs to return a dict from the forward() method, and the loss needs to be in the dict under the name "loss".
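In code, that forward pass looks roughly like the sketch below. The class name and registration are made up, and the field names assume the snippet above; this mirrors the general shape of an AllenNLP multiple-choice model under those assumptions, not the exact implementation of any model in the library.

    # Sketch only: a minimal multiple-choice model in the shape described above.
    from typing import Dict, Optional

    import torch
    from allennlp.data import TextFieldTensors, Vocabulary
    from allennlp.models import Model
    from allennlp.modules.text_field_embedders import TextFieldEmbedder


    @Model.register("copa_mc_sketch")  # illustrative name
    class CopaMCSketch(Model):
        def __init__(self, vocab: Vocabulary, embedder: TextFieldEmbedder) -> None:
            super().__init__(vocab)
            self.embedder = embedder
            # One score per (premise, alternative) text sequence.
            self.scorer = torch.nn.Linear(embedder.get_output_dim(), 1)
            self.loss = torch.nn.CrossEntropyLoss()

        def forward(
            self,
            alternatives: TextFieldTensors,
            correct_alternative: Optional[torch.Tensor] = None,
            **kwargs,
        ) -> Dict[str, torch.Tensor]:
            # The ListField adds one wrapping dimension, so the token ids are
            # shaped (batch, num_choices, seq_len); hence num_wrapping_dims=1.
            embedded = self.embedder(alternatives, num_wrapping_dims=1)
            # Vector of the first (<s>/[CLS]) token of every sequence.
            cls_vectors = embedded[:, :, 0, :]
            # (batch, num_choices) logits, softmaxed over the choices.
            logits = self.scorer(cls_vectors).squeeze(-1)
            output = {"logits": logits, "probs": torch.softmax(logits, dim=-1)}
            if correct_alternative is not None:
                # AllenNLP's trainer picks the loss up from this dict entry.
                output["loss"] = self.loss(logits, correct_alternative.view(-1).long())
            return output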

ekdnam commented 3 years ago

Thanks a lot for the detailed response!

I have gone through it and will ping you here in case I run into issues during implementation.

dirkgr commented 3 years ago

I just realized that COPA is a Multiple Choice task, and we already have a multiple-choice model at https://github.com/allenai/allennlp-models/blob/main/allennlp_models/mc/models/transformer_mc.py. So all you have to do is write a correct reader (following the examples at https://github.com/allenai/allennlp-models/tree/main/allennlp_models/mc/dataset_readers), and write a training config that uses the existing model. The existing MC configs are at https://github.com/allenai/allennlp-models/tree/main/training_config/mc. This should save a lot of time!
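If it helps, a reader for this could have roughly the following shape. This is a hedged skeleton, not allennlp-models code: it assumes SuperGLUE's COPA JSONL format (keys premise, choice1, choice2, question, label, idx), and the prompt wording and field names are illustrative; the field names in particular have to match what the existing model's forward() expects, so check them against transformer_mc.py.

    # Sketch only: a COPA DatasetReader skeleton under the assumptions above.
    import json
    from typing import Iterable

    from allennlp.data import DatasetReader, Instance
    from allennlp.data.fields import IndexField, ListField, TextField
    from allennlp.data.token_indexers import PretrainedTransformerIndexer
    from allennlp.data.tokenizers import PretrainedTransformerTokenizer


    @DatasetReader.register("copa")  # illustrative registration name
    class CopaReader(DatasetReader):
        def __init__(self, transformer_model_name: str = "roberta-base", **kwargs) -> None:
            super().__init__(**kwargs)
            self._tokenizer = PretrainedTransformerTokenizer(transformer_model_name)
            self._indexers = {"tokens": PretrainedTransformerIndexer(transformer_model_name)}

        def _read(self, file_path: str) -> Iterable[Instance]:
            with open(file_path) as data_file:
                for line in data_file:
                    example = json.loads(line)
                    # Each example asks for either the "cause" or the "effect".
                    connector = {
                        "cause": "What was the cause of this?",
                        "effect": "What happened as a result?",
                    }[example["question"]]
                    prompt = f"{example['premise']} {connector}"
                    # One TextField per choice, as in the encoding sketched earlier.
                    choices = ListField(
                        [
                            TextField(self._tokenizer.tokenize(f"{prompt} {choice}"), self._indexers)
                            for choice in (example["choice1"], example["choice2"])
                        ]
                    )
                    fields = {"alternatives": choices}
                    if "label" in example:  # the test split ships without labels
                        fields["correct_alternative"] = IndexField(example["label"], choices)
                    yield Instance(fields)

Once something like this is registered, the training config's dataset_reader block would refer to it by its registered name (here, "copa").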

ekdnam commented 3 years ago

Thanks a lot for the tip! Will surely do that!

ekdnam commented 3 years ago

Hi @dirkgr! Is this issue going to be considered for GSoC?

dirkgr commented 3 years ago

We did not get chosen as one of the organizations in Google Summer of Code 2021, so, no. But these issues are still a great way to get started in NLP, and useful for AllenNLP users! We'll think about other ways of engaging the wider AllenNLP community with these models, as they are still very relevant.

dirkgr commented 3 years ago

@ekdnam, are you still interested in this task? Otherwise I would like to make it clear that it is free for others to take.

ekdnam commented 3 years ago

> We did not get chosen as one of the organizations in Google Summer of Code 2021, so, no. But these issues are still a great way to get started in NLP, and useful for AllenNLP users! We'll think about other ways of engaging the wider AllenNLP community with these models, as they are still very relevant.

Oh okay!

> @ekdnam, are you still interested in this task? Otherwise I would like to make it clear that it is free for others to take.

Yes, I am still interested in this task. But my study workload has increased over the last few weeks, so I don't think I will be able to deliver a PR in the near future. Once that work is over I will surely come back to it, but until then I won't be able to. If anyone else wants to give this task a try, no worries!

ghost commented 3 years ago

Hi @dirkgr, I would like to work on this issue if @ekdnam is not planning to take it up. Let me know if I can!

dirkgr commented 3 years ago

Yes please, go for it! Let me know if you need any help!

ghost commented 3 years ago

Hi @dirkgr, once I have added the reader and config file, how do I go about training and testing? As per your previous comment, we don't need to add a separate model, since the existing transformer-mc model handles multiple-choice questions. Where do I get the COPA dataset for training and testing? Is it from https://people.ict.usc.edu/~gordon/downloads/COPA-resources.tgz? Could you briefly explain the overall procedure to follow?

dirkgr commented 3 years ago

All the datasets are available from https://super.gluebenchmark.com/tasks, so the reader should read the data that's downloadable there.

The other component you need is a training config. There are some of those configs at https://github.com/allenai/allennlp-models/tree/main/training_config/mc, so you can probably just adapt what's there.

Once you have those two things, try training with the allennlp train command. For example, to train PIQA, you would run allennlp train training_config/mc/piqa.jsonnet -s path/to/output. This should work out of the box if you have cloned the allennlp-models repo. So in your case, once you have your own training config, it would probably be something like allennlp train training_config/mc/copa.jsonnet -s path/to/output. You might have to play with the hyperparameters a little to get good performance, but maybe you'll be lucky and it'll work right away.

ghost commented 3 years ago

> All the datasets are available from https://super.gluebenchmark.com/tasks, so the reader should read the data that's downloadable there.
>
> The other component you need is a training config. There are some of those configs at https://github.com/allenai/allennlp-models/tree/main/training_config/mc, so you can probably just adapt what's there.
>
> Once you have those two things, try training with the allennlp train command. For example, to train PIQA, you would run allennlp train training_config/mc/piqa.jsonnet -s path/to/output. This should work out of the box if you have cloned the allennlp-models repo. So in your case, once you have your own training config, it would probably be something like allennlp train training_config/mc/copa.jsonnet -s path/to/output. You might have to play with the hyperparameters a little to get good performance, but maybe you'll be lucky and it'll work right away.

Thanks, @dirkgr, for the detailed explanation. I will try it out this weekend.

ghost commented 3 years ago

Hi @dirkgr, I added the reader and config file. I tried training two or three times with different learning rates, but accuracy didn't go beyond 54%. In each case, training stopped around the 5th or 6th epoch when the patience limit was reached. Could you suggest some ways to increase accuracy?

dirkgr commented 3 years ago

54% is almost random. There is certainly something wrong.

The most likely problem is the input data. I recommend you check what kind of data actually arrives at the model's forward() method. If you find something wrong there, we can work from there.

I like to do this in a debugger, but many people don't like debuggers. You can also add this to your trainer config to print the first batch:

    "callbacks": [
      {
        "type": "console_logger",
        "should_log_inputs": true
      }
    ]

dirkgr commented 3 years ago

I updated the description of this task to recommend the new Tango framework.