dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Support training Stanford NER model #1000

Closed neumannm closed 7 years ago

neumannm commented 7 years ago

Analogous to the OpenNlpNamedEntityRecognizerTrainer, it would be nice to also have a component for training NER models for Stanford CoreNLP.

The important aspects are:

Btw here is the FAQ to Stanford NER training.

reckart commented 7 years ago

wrt properties file: Normally, we would not have a configuration file; all parameters would be on the component itself. It might internally generate a file or pass the settings on directly to the underlying training code.

wrt training files: Normally, we would extract the data from the CAS and pass it on directly to the training code without writing it to a file first.

However, doing the above likely involves quite a bit of work given the way that the Stanford CRF is implemented. For that reason, you might prefer implementing your training component in such a way that the properties file is passed as a parameter and the training data is written out to a temporary file.
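As a rough illustration of the "write the training data to a temporary file" route, here is a minimal sketch. It assumes the tab-separated, one-token-per-line layout (token, then label, with a blank line after each sentence) that the Stanford NER FAQ uses in its training example. `TempTrainingData` and `LabeledToken` are hypothetical names, not part of DKPro Core or Stanford CoreNLP, and a real component would fill the list from the CAS instead of taking it as an argument.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: not part of DKPro Core or Stanford CoreNLP.
final class TempTrainingData {

    /** Hypothetical holder for one token and its gold NER label. */
    static final class LabeledToken {
        final String text;
        final String label;

        LabeledToken(String text, String label) {
            this.text = text;
            this.label = label;
        }
    }

    /**
     * Writes the sentences to a temporary file, one "token<TAB>label" pair
     * per line, with an empty line after each sentence.
     */
    static Path write(List<List<LabeledToken>> sentences) throws IOException {
        Path file = Files.createTempFile("ner-train", ".tsv");
        // Ask the JVM to remove the file on normal exit.
        file.toFile().deleteOnExit();
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (List<LabeledToken> sentence : sentences) {
                for (LabeledToken token : sentence) {
                    out.write(token.text + "\t" + token.label + "\n");
                }
                out.write("\n"); // sentence boundary
            }
        }
        return file;
    }
}
```

The returned path could then be handed to the trainer through the properties, so the underlying training code never needs to know about the CAS.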

neumannm commented 7 years ago

Thanks @reckart for your comments. Regarding the properties, including all parameters in the component would not be much work, just many lines of code, because the Stanford NER trainer recognizes no fewer than 95 parameters.

Regarding the training data, I will do as you suggested.

reckart commented 7 years ago

@neumannm wow, that's a lot ;) Maybe start by allowing a properties file to be specified; later we could expose commonly changed settings directly as component parameters. It would be nice, though, if the component assumed some defaults (e.g. for English NER) if no properties file is specified at all.
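The "optional properties file with built-in defaults" idea could be sketched roughly as follows. This is only an illustration: `TrainerSettings` is a hypothetical class, and the default keys and values are assumptions borrowed from the example properties file in the Stanford NER FAQ, not a vetted configuration for English NER.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: not part of DKPro Core or Stanford CoreNLP.
final class TrainerSettings {

    /** Assumed defaults, modelled on the example in the Stanford NER FAQ. */
    static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("map", "word=0,answer=1");
        p.setProperty("useClassFeature", "true");
        p.setProperty("useWord", "true");
        p.setProperty("useNGrams", "true");
        p.setProperty("maxNGramLeng", "6");
        return p;
    }

    /**
     * Starts from the defaults, overlays the user's properties file if one
     * is given, then sets the entries that the component itself controls.
     */
    static Properties resolve(Path propertiesFile, Path trainFile, Path modelFile)
            throws IOException {
        Properties settings = new Properties();
        settings.putAll(defaults());
        if (propertiesFile != null) {
            try (InputStream in = Files.newInputStream(propertiesFile)) {
                settings.load(in); // user values override the defaults
            }
        }
        // Always owned by the component, never by the user's file.
        settings.setProperty("trainFile", trainFile.toString());
        settings.setProperty("serializeTo", modelFile.toString());
        return settings;
    }
}
```

Copying the defaults with `putAll` rather than using the `Properties(defaults)` constructor keeps them visible to code that iterates over the entries, not only to `getProperty` lookups.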

reckart commented 7 years ago

Is there more to do on this issue at the moment?

neumannm commented 7 years ago

I don't think so. Thanks for your fixes btw. I hope that some people will use this component and if there are problems with it I think they will submit new issues.

reckart commented 7 years ago

Ok, cool. Then I'll close this issue.