citiususc / Linguakit

Multilingual toolkit for NLP: dependency parser, PoS tagger, NERC, multiword extractor, sentiment analysis, etc.
GNU General Public License v3.0

Co-Reference #1

Closed: gamallo closed this issue 8 years ago

gamallo commented 8 years ago

A new module for co-reference resolution will be integrated by Marcos Garcia.

gentakojima commented 8 years ago

Any update on this? 💪

gamallo commented 8 years ago

The prototype for co-reference identification was implemented several months ago by Marcos Garcia. He committed to integrating the module into Linguakit, but he doesn't have a GitHub account yet.

marcospln commented 8 years ago

Module uploaded!

gentakojima commented 8 years ago

I don't understand how the module works, probably because I don't understand exactly what it does. I find it confusing that there is a "coref" parameter to use the module, but there also seems to be a "-coref" parameter.

I'd do this myself, but I don't know if I'm missing something:

  1. Correct the README.md: where it says "COREF (parameter -coref)" it should say "COREF (parameter coref)".
  2. Parameter "-coref" (file linguakit, line 156) should not be accepted as a valid parameter. Neither should "coref" on that same line, because module identifiers are already handled earlier.

Could you also add a usage example in the Examples part of the README.md?

gentakojima commented 8 years ago

On a side note, maybe the module lives in the tagger subdirectory for some affinity reason, but I find that confusing too.

marcospln commented 8 years ago

Thanks! I've just corrected the README and linguakit files. Also, I modified the en.txt test file in order to show how COREF works.

If you run coref on the test (./linguakit en coref test/en.txt) you will see that NPs contain an extra column with a numerical ID. Ideally, this ID should be the same in the NPs referring to the same discourse entity (Paul = Paul_Wilson (but not Mary_Wilson); Sandra = Sandra_Curtis, etc.).
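To make the extra ID column concrete, here is a minimal Python sketch that groups mentions by that ID. It is only an illustration under stated assumptions: it assumes each output line is whitespace-separated, that NP lines carry the numerical ID as a trailing fourth column, and the sample lines and tags are hypothetical rather than copied from a real run of `./linguakit en coref test/en.txt`.

```python
from collections import defaultdict


def group_mentions(lines):
    """Group NP tokens by the numerical ID found in a trailing column."""
    clusters = defaultdict(list)
    for line in lines:
        cols = line.split()
        # Assumption: only NP lines have a 4th column, and it is an integer ID.
        if len(cols) >= 4 and cols[-1].isdigit():
            clusters[int(cols[-1])].append(cols[0])
    return dict(clusters)


# Hypothetical lines (token, lemma, tag, ID); the tags are placeholders.
sample = [
    "Paul_Wilson paul_wilson NP 1",
    "met meet VBD",
    "Sandra_Curtis sandra_curtis NP 2",
    "Paul paul NP 1",
    "Sandra sandra NP 2",
    "Mary_Wilson mary_wilson NP 3",
]

clusters = group_mentions(sample)
for cluster_id, mentions in sorted(clusters.items()):
    print(cluster_id, mentions)
```

In this made-up sample, Paul and Paul_Wilson share one ID while Mary_Wilson gets her own, which is the behaviour described above.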

The -crnec option (experimental) uses the information provided by this kind of clustering to (try to) correct wrong NEC labels.
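One plausible way to use clustering for label correction is a majority vote inside each cluster. The Python sketch below shows that idea only; it is not taken from the Linguakit source, and the function name, the `PER`/`LOC` labels and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def correct_labels(mentions):
    """Relabel each mention with the most frequent label in its cluster.

    mentions: list of (token, nec_label, cluster_id) tuples.
    Assumption: majority voting, which may differ from what -crnec does.
    """
    labels_by_cluster = defaultdict(list)
    for _, label, cluster_id in mentions:
        labels_by_cluster[cluster_id].append(label)
    majority = {
        cluster_id: Counter(labels).most_common(1)[0][0]
        for cluster_id, labels in labels_by_cluster.items()
    }
    return [(token, majority[cluster_id], cluster_id)
            for token, _, cluster_id in mentions]


# Hypothetical input: one mention in cluster 1 was mislabeled as a location.
mentions = [
    ("Paul_Wilson", "PER", 1),
    ("Paul", "LOC", 1),
    ("Paul", "PER", 1),
    ("Sandra", "PER", 2),
]
corrected = correct_labels(mentions)
print(corrected)
```

Here the mislabeled "Paul" is pulled back to the label its cluster agrees on. A vote like this can also spread an error when most mentions in a cluster are wrong, which fits the "(try to)" caveat above.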

marcospln commented 8 years ago

Yes, it was just a NEC option in the first commit. It was later moved to a real parameter.

Actually, it could be seen just as a NEC extension, or as a completely new NLP module.

gentakojima commented 8 years ago

Thanks for the quick response, the explanation and examples. 😀 I'm closing this again, hope that's ok.