Closed: closed by gamallo 8 years ago
Any update on this? 💪
The prototype for co-reference identification was implemented several months ago by Marcos Garcia. He committed to integrating the module into Linguakit, but he doesn't have a GitHub account yet.
Module uploaded!
I don't understand how the module works, probably because I don't understand exactly what it does. I find it confusing that there is a "coref" parameter to use the module, but there also seems to be a "-coref" parameter.
I'd do this myself, but I don't know if I'm missing something:
Could you also add a usage example in the Examples part of the README.md?
On a side note, maybe the module lives in the tagger subdirectory for some affinity reason, but I find that confusing too.
Thanks! I've just corrected the README and linguakit files. Also, I modified the en.txt test file in order to show how COREF works.
If you run coref on the test (./linguakit en coref test/en.txt) you will see that NPs contain an extra column with a numerical ID. Ideally, this ID should be the same in the NPs referring to the same discourse entity (Paul = Paul_Wilson (but not Mary_Wilson); Sandra = Sandra_Curtis, etc.).
The -crnec option (experimental) uses the information provided by this kind of clustering to (try to) correct wrong NEC labels.
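To make the ID column more concrete, here is a rough Python sketch of how the coref output could be post-processed into clusters. The exact output layout is not shown in this thread, so the sketch assumes whitespace-separated columns with the mention in the first column and the numerical ID in the last column; the function name `group_by_coref_id`, the tags, and the sample lines are hypothetical and are not real Linguakit output.

```python
from collections import defaultdict


def group_by_coref_id(lines):
    """Group first-column tokens by the numeric ID found in the last column.

    Assumption: columns are whitespace-separated and only NP lines carry
    a trailing numeric ID. Adjust to the real output format as needed.
    """
    clusters = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines and lines whose last column is not a numeric ID.
        if len(fields) < 2 or not fields[-1].isdigit():
            continue
        clusters[int(fields[-1])].append(fields[0])
    return dict(clusters)


# Hypothetical sample lines for illustration only.
sample = [
    "Paul_Wilson NP 1",
    "Mary_Wilson NP 2",
    "walked VBD",
    "Paul NP 1",
]
print(group_by_coref_id(sample))
```

With this made-up sample, Paul and Paul_Wilson end up in the same cluster, while Mary_Wilson gets her own, which is the behaviour described above.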
Yes, it was just a NEC option in the first commit. Then it was moved to a real parameter.
Actually, it could be seen just as a NEC extension, or as a completely new NLP module.
Thanks for the quick response, the explanation and examples. 😀 I'm closing this again, hope that's ok.
A new module for solving co-reference will be integrated by Marcos Garcia