Closed david-waterworth closed 3 years ago
Thanks for the questions; it's an interesting problem what to do for domains where there's no pre-trained MLM. I'll respond over the weekend.
Responses inline.
I'm wondering if you think it's worth modifying dygie to support either pretrained embeddings with a pass-through encoder, or learned embeddings with an LSTM encoder (or another of the AllenNLP encoders). It's fairly easy to do; I adapted the approach from crf_tagger (https://github.com/allenai/allennlp-models/blob/main/allennlp_models/tagging/models/crf_tagger.py).
I'm happy to submit a PR if you think this would be useful. I simply added an encoder parameter to the model class and, when it's None, used a PassThroughEncoder with the same dimension as the embedder. Then I passed the embedder's output through the encoder along with the mask.
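The change described above can be sketched roughly as follows. This is a minimal PyTorch illustration, not the actual DyGIE++ or AllenNLP code; the class and parameter names are made up for the example:

```python
import torch
import torch.nn as nn

class EncoderOrPassthrough(nn.Module):
    """Hypothetical wrapper: if no encoder is configured, act as an identity
    pass-through so the embeddings flow straight to the downstream model."""

    def __init__(self, embed_dim, encoder=None):
        super().__init__()
        self.encoder = encoder  # e.g. a bidirectional nn.LSTM, or None
        self.output_dim = embed_dim if encoder is None else 2 * encoder.hidden_size

    def forward(self, embedded, mask):
        # embedded: (batch, seq_len, embed_dim); mask: (batch, seq_len) bool
        if self.encoder is None:
            return embedded  # pass-through of the same dimension
        # Pack so padded positions don't leak into the LSTM states.
        lengths = mask.sum(dim=1).cpu()
        packed = nn.utils.rnn.pack_padded_sequence(
            embedded, lengths, batch_first=True, enforce_sorted=False)
        out, _ = self.encoder(packed)
        out, _ = nn.utils.rnn.pad_packed_sequence(
            out, batch_first=True, total_length=embedded.size(1))
        return out

# Usage: a learned LSTM encoder vs. a pure pass-through of the embeddings.
x = torch.randn(2, 7, 32)
mask = torch.ones(2, 7, dtype=torch.bool)
lstm = nn.LSTM(32, 16, batch_first=True, bidirectional=True)
enc = EncoderOrPassthrough(32, lstm)
passthrough = EncoderOrPassthrough(32)
```

In AllenNLP itself this would presumably be a `Seq2SeqEncoder` constructor argument wired through the training config, which is what makes the crf_tagger pattern easy to copy.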
In my domain, there are no pre-trained models available. I have trained one using MLM, but at this stage I actually get better results training my own encoder from scratch.
I'm not sure how common this situation is, i.e. no domain-specific MLM is available, and a general MLM actually does worse than pretrained embeddings. I'd say that if you're going to do it anyhow for your own research, and you're able to make the change in such a way that a user can just modify a line in the relevant training config (the configs are here), I'd be happy to accept a PR. Otherwise, it might not be worth it.
Also interested in your thoughts on whether it could make sense to prune based on the scores from the NER task; it seems you could sum the label probabilities to obtain an entity score. And is there a reason the NER scorer hard-codes the score for the null label to 0? It seems simpler to score all classes, including the null label.
I think you're proposing to prune relation argument candidates based on the output of the NER model? I tried something like this. In practice it didn't make that much of a difference. As above, I'd accept a PR if you end up trying this.
Finally, when you update the span embeddings via propagation, do you think the main value of this is when the coreferences are complex, for example the "car", "this thing", "it" case from your paper? For simple coreferences (i.e. "car" repeated multiple times) I would assume the embeddings themselves should be enough, since the embeddings will adjust with training?
I think so? We did a bit of analysis on this in the paper, but that was a few years ago now.
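For readers following along, the propagation being discussed is a gated, attention-weighted update of each span embedding from its coreference candidates. A rough sketch of the mechanism (illustrative only, not the exact DyGIE++ implementation; `coref_propagate` and `gate_layer` are names invented for this example):

```python
import torch
import torch.nn as nn

def coref_propagate(spans, pair_scores, gate_layer):
    """One round of propagation: each span is updated with an
    attention-weighted sum of candidate antecedent spans, then a learned
    gate decides how much of the original embedding to keep."""
    weights = torch.softmax(pair_scores, dim=-1)   # (n, n) attention over spans
    pooled = weights @ spans                       # (n, d) weighted antecedents
    gate = torch.sigmoid(gate_layer(torch.cat([spans, pooled], dim=-1)))
    return gate * spans + (1 - gate) * pooled      # gated interpolation

n, d = 5, 16
spans = torch.randn(n, d)                          # span embeddings
pair_scores = torch.randn(n, n)                    # pairwise coref scores
gate_layer = nn.Linear(2 * d, d)
updated = coref_propagate(spans, pair_scores, gate_layer)
```

Intuitively, the gate lets the model fall back to the original span embedding for easy cases (a repeated "car"), and pull in antecedent information for harder ones ("it", "this thing").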
Thanks, it is easy to do. As you say, it may not be overly useful to others though. I did briefly try DyGIE++ with a small subset of my corpus and an LSTM encoder; it kind of works, but not very well. I suspect the issue is that trying to train both the span propagation and the underlying representations at the same time is difficult. I've had more luck with a simpler model that doesn't do the propagation, and my current approach is to pre-train the encoder with an entity-mention binary classifier.
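The pre-training step mentioned here could look something like the following. This is a guess at the setup, since the commenter doesn't give details; the model class and dimensions are made up, and the objective is a per-token binary "inside an entity mention?" classifier on top of the LSTM encoder:

```python
import torch
import torch.nn as nn

class MentionPretrainer(nn.Module):
    """Hypothetical pre-training model: a binary per-token mention
    classifier on top of the LSTM encoder. After training, the LSTM
    weights would be reused as the encoder in the full model."""

    def __init__(self, embed_dim, hidden):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, embedded):
        out, _ = self.lstm(embedded)               # (batch, seq, 2*hidden)
        return self.head(out).squeeze(-1)          # per-token mention logits

model = MentionPretrainer(embed_dim=64, hidden=32)
logits = model(torch.randn(2, 10, 64))             # (batch=2, seq_len=10)
targets = torch.zeros(2, 10)                       # dummy mention labels
loss = nn.functional.binary_cross_entropy_with_logits(logits, targets)
```

The appeal of an objective like this is that it gives the encoder a mention-aware signal without requiring the full span-propagation machinery to be trained simultaneously.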
Anyway, once I progress further I can definitely contribute back the encoder change.