dwadden / dygiepp

Span-based system for named entity, relation, and event extraction.

Using lstm encoder in place of pretrained_transformer #70

Closed - david-waterworth closed this 3 years ago

david-waterworth commented 3 years ago

I'm wondering if you think it's worth modifying dygie to enable either pretrained embeddings with a passthrough encoder, or learned embeddings with an LSTM encoder (or another of the AllenNLP encoders). It's fairly easy to do; I adapted the approach from crf_tagger (https://github.com/allenai/allennlp-models/blob/main/allennlp_models/tagging/models/crf_tagger.py).

I'm happy to submit a PR if you think this is useful. I simply added an encoder parameter to the model class and, when it's None, used a PassThroughEncoder of the same dimension as the embedder. Then I passed the output of the embedder through the encoder along with the mask.
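Roughly, the change is something like this (a minimal sketch against AllenNLP's `Seq2SeqEncoder` / `PassThroughEncoder` API; the class and parameter names are illustrative, not the actual DyGIE++ code):

```python
from typing import Dict, Optional

import torch
from allennlp.modules import Seq2SeqEncoder, TextFieldEmbedder
from allennlp.modules.seq2seq_encoders import PassThroughEncoder
from allennlp.nn.util import get_text_field_mask


class DyGIEWithEncoder(torch.nn.Module):
    """Illustrative only: an optional contextualizer between the embedder and span extraction."""

    def __init__(self,
                 embedder: TextFieldEmbedder,
                 encoder: Optional[Seq2SeqEncoder] = None) -> None:
        super().__init__()
        self._embedder = embedder
        # With no encoder supplied, fall back to a pass-through of the same dimension
        # as the embedder, so the downstream span-extraction code is unchanged.
        self._encoder = encoder or PassThroughEncoder(input_dim=embedder.get_output_dim())

    def forward(self, text: Dict[str, Dict[str, torch.Tensor]]) -> torch.Tensor:
        mask = get_text_field_mask(text)
        embedded = self._embedder(text)
        # Contextualize (or pass through) before building span representations.
        return self._encoder(embedded, mask)
```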

In my domain there are no pre-trained models available. I have trained one using MLM, but at this stage I actually get better results training my own encoder.

I'm also interested in your thoughts on whether it could make sense to prune based on the scores from the NER task - it seems that you could sum the label probabilities to obtain an entity score? And is there any reason why the NER scorer "hard codes" the score for the null label to 0? It seems simpler to score all classes, including the null label.

Finally, when you update the span embeddings via propagation, do you think the main value comes when the coreferences are complex - for example the "car", "this thing", "it" case from your paper? In the case of simple coreferences (i.e. "car" repeated multiple times), I would assume the embeddings themselves should be enough, since they will adjust with training?

dwadden commented 3 years ago

Thanks for the questions - it's an interesting problem, figuring out what to do for domains where there's no pre-trained MLM. I'll respond over the weekend.

dwadden commented 3 years ago

Responses inline.

> I'm wondering if you think it's worth modifying dygie to enable either pretrained embeddings with a passthrough encoder, or learned embeddings with an LSTM encoder (or another of the AllenNLP encoders). It's fairly easy to do; I adapted the approach from crf_tagger (https://github.com/allenai/allennlp-models/blob/main/allennlp_models/tagging/models/crf_tagger.py).

> I'm happy to submit a PR if you think this is useful. I simply added an encoder parameter to the model class and, when it's None, used a PassThroughEncoder of the same dimension as the embedder. Then I passed the output of the embedder through the encoder along with the mask.

> In my domain there are no pre-trained models available. I have trained one using MLM, but at this stage I actually get better results training my own encoder.

I'm not sure how common this situation is - i.e. there's no domain-specific MLM available, and a general-purpose MLM actually does worse than embeddings learned from scratch. I'd say that if you're going to do it anyhow for your own research, and you're able to make the change in such a way that a user can just modify a line in the relevant training config (the configs are here), I'd be happy to accept a PR. Otherwise, it might not be worth it.
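To make this concrete (purely illustrative - the dygiepp training configs are jsonnet templates, so the actual hook would live there), the config entry would ultimately just construct an AllenNLP Seq2SeqEncoder, e.g.:

```python
# Illustrative only: the kind of object a config-level "encoder" entry would build.
from allennlp.modules.seq2seq_encoders import LstmSeq2SeqEncoder

encoder = LstmSeq2SeqEncoder(
    input_size=300,      # must match the embedder's output dimension
    hidden_size=200,
    num_layers=1,
    bidirectional=True,  # output dimension becomes 2 * hidden_size
)
```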

> I'm also interested in your thoughts on whether it could make sense to prune based on the scores from the NER task - it seems that you could sum the label probabilities to obtain an entity score? And is there any reason why the NER scorer "hard codes" the score for the null label to 0? It seems simpler to score all classes, including the null label.

I think you're proposing to prune relation argument candidates based on the output of the NER model? I tried something like this. In practice it didn't make that much of a difference. As above, I'd accept a PR if you end up trying this.
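For reference, the proposed score is just the summed non-null probability per span (equivalently 1 - P(null)); a hypothetical sketch, not code from this repo:

```python
import torch

def entity_scores(ner_logits: torch.Tensor) -> torch.Tensor:
    """ner_logits: (num_spans, num_labels), with index 0 assumed to be the null label."""
    probs = torch.softmax(ner_logits, dim=-1)
    return probs[:, 1:].sum(dim=-1)  # equivalent to 1 - P(null); rank spans by this for pruning
```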

> Finally, when you update the span embeddings via propagation, do you think the main value comes when the coreferences are complex - for example the "car", "this thing", "it" case from your paper? In the case of simple coreferences (i.e. "car" repeated multiple times), I would assume the embeddings themselves should be enough, since they will adjust with training?

I think so? We did a bit of analysis on this in the paper, but that was a few years ago now.
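For readers following along, the propagation being discussed is roughly a gated update of each span embedding with a coref-score-weighted summary of the other spans. A simplified sketch (it omits the masking and antecedent pruning a real implementation needs, and the names are illustrative):

```python
import torch

def coref_propagate(span_emb: torch.Tensor,      # (num_spans, dim)
                    coref_scores: torch.Tensor,  # (num_spans, num_spans) antecedent scores
                    gate: torch.nn.Linear        # Linear(2 * dim, dim)
                    ) -> torch.Tensor:
    attn = torch.softmax(coref_scores, dim=-1)
    summary = attn @ span_emb                    # coref-weighted summary for each span
    f = torch.sigmoid(gate(torch.cat([span_emb, summary], dim=-1)))
    return f * span_emb + (1 - f) * summary      # gated update of the span embeddings
```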

david-waterworth commented 3 years ago

Thanks, it is easy to do. As you say, it may not be overly useful to others though - I did briefly try DyGIE++ with an LSTM encoder on a small subset of my corpus, and it kind of works, but not very well. I suspect the issue is that trying to train both the span propagation and the underlying representations at the same time is difficult. I've had more luck with a simpler model which doesn't do the propagation, and my current approach is to pre-train the encoder with an entity mention binary classifier.
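A rough sketch of that pre-training idea (hypothetical names, assuming a token-level binary "part of a mention" objective on top of the same embedder and encoder):

```python
import torch
from allennlp.modules import Seq2SeqEncoder, TextFieldEmbedder


class MentionPretrainer(torch.nn.Module):
    """Pre-train the encoder with a binary mention classifier before using it in DyGIE++."""

    def __init__(self, embedder: TextFieldEmbedder, encoder: Seq2SeqEncoder) -> None:
        super().__init__()
        self._embedder = embedder
        self._encoder = encoder
        self._head = torch.nn.Linear(encoder.get_output_dim(), 1)

    def forward(self, text, mention_labels: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        encoded = self._encoder(self._embedder(text), mask)
        logits = self._head(encoded).squeeze(-1)
        # Per-token binary cross-entropy, ignoring padding via the mask.
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            logits, mention_labels.float(), reduction="none")
        mask = mask.float()
        return (loss * mask).sum() / mask.sum()
```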

Anyway, once I progress further I can definitely contribute the encoder change back.

dwadden commented 3 years ago

Sounds good, best of luck!

I think that the span propagation stuff is actually turned off by default; see this config for an example of how to turn it on.

I'll close this for now. Feel free to reopen if more stuff comes up later.