neumannm closed this issue 7 years ago.
wrt properties file: Normally, we would not have a configuration file - all parameters would be on the component itself. The component might internally generate a file or pass the settings directly to the underlying training code.
wrt training files: Normally, we would extract the data from the CAS and pass it directly to the training code without writing it to a file first.
However, doing the above likely involves quite a bit of work given the way that the Stanford CRF is implemented. For that reason, you might prefer implementing your training component in such a way that the properties file is passed as a parameter and the training data is written out to a temporary file.
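A minimal sketch of the suggested approach follows. It assumes the tokens and labels have already been pulled out of the CAS (that UIMA part is omitted) and writes them to a temporary file in the tab-separated, one-token-per-line format described in the Stanford NER FAQ. It then loads the user-supplied properties file and overrides `trainFile`, `serializeTo` and `map` (keys taken from that FAQ). The class name `StanfordNerTrainingInput`, the `Token` record and both methods are hypothetical; the actual call into the Stanford CRF trainer is left out. Records require Java 16 or later.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical helper: writes training sentences to a temporary TSV file
 * (token TAB label) and points a user-supplied Properties object at it.
 */
public final class StanfordNerTrainingInput {

    /** One token with its NER label, e.g. ("Berlin", "LOCATION"). */
    public record Token(String text, String label) {}

    private StanfordNerTrainingInput() {}

    /** Writes one token per line as "text<TAB>label" and a blank line after each sentence. */
    public static Path writeTrainingFile(List<List<Token>> sentences) throws IOException {
        Path file = Files.createTempFile("stanford-ner-train", ".tsv");
        file.toFile().deleteOnExit(); // temporary: removed when the JVM exits
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (List<Token> sentence : sentences) {
                for (Token t : sentence) {
                    out.write(t.text());
                    out.write('\t');
                    out.write(t.label());
                    out.newLine();
                }
                out.newLine(); // blank line marks the sentence boundary
            }
        }
        return file;
    }

    /** Loads the user's properties file and points it at the generated training data. */
    public static Properties prepareProperties(Path propertiesFile, Path trainFile, Path modelFile)
            throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(propertiesFile)) {
            props.load(in);
        }
        // The component, not the user, decides where training data and model live.
        props.setProperty("trainFile", trainFile.toString());
        props.setProperty("serializeTo", modelFile.toString());
        // Column 0 is the word, column 1 the gold label (matches writeTrainingFile).
        props.setProperty("map", "word=0,answer=1");
        return props;
    }
}
```

Overriding `trainFile` and `serializeTo` keeps the user's feature settings intact while ensuring the trainer reads the temporary file that the component itself generated.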
Thanks @reckart for your comments. Regarding the properties, it would not be much work to include all parameters in the component, just a great many lines of code, because the StanfordNER trainer recognizes no fewer than 95 parameters.
Regarding the training data, I will do as you suggested.
@neumannm wow, that's a lot ;) Maybe start by allowing a properties file to be specified, and later we could expose commonly changed settings directly as component parameters. It would be nice, though, if the component assumed some defaults (e.g. for English NER) when no properties file is specified at all.
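One way to provide such defaults is sketched below, assuming the properties file is an optional component parameter that is `null` when unset. The feature settings are modeled on the example properties file in the Stanford NER FAQ and are only illustrative; whether they are good defaults for English NER is an assumption that would need checking. The class `NerTrainerConfig` and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical helper: uses built-in defaults when no properties file is given. */
public final class NerTrainerConfig {

    private NerTrainerConfig() {}

    /** Illustrative feature settings, modeled on the Stanford NER FAQ example. */
    public static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("map", "word=0,answer=1");
        p.setProperty("useClassFeature", "true");
        p.setProperty("useWord", "true");
        p.setProperty("useNGrams", "true");
        p.setProperty("noMidNGrams", "true");
        p.setProperty("maxNGramLeng", "6");
        p.setProperty("usePrev", "true");
        p.setProperty("useNext", "true");
        p.setProperty("useSequences", "true");
        p.setProperty("usePrevSequences", "true");
        p.setProperty("maxLeft", "1");
        p.setProperty("useTypeSeqs", "true");
        p.setProperty("useTypeSeqs2", "true");
        p.setProperty("useTypeySequences", "true");
        p.setProperty("wordShape", "chris2useLC");
        p.setProperty("useDisjunctive", "true");
        return p;
    }

    /** Returns the defaults if propertiesFile is null, otherwise exactly the file's contents. */
    public static Properties load(Path propertiesFile) throws IOException {
        if (propertiesFile == null) {
            return defaults();
        }
        Properties p = new Properties();
        try (InputStream in = Files.newInputStream(propertiesFile)) {
            p.load(in);
        }
        return p;
    }
}
```

The sketch deliberately does not merge the defaults into a user-supplied file: a file that omits a feature then really disables it, rather than silently inheriting the default.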
Is there more to do on this issue at the moment?
I don't think so. Thanks for your fixes, btw. I hope people will use this component; if they run into problems, I expect they will open new issues.
Ok, cool. Then I'll close this issue.
Analogous to the OpenNlpNamedEntityRecognizerTrainer, it would be nice to also have a component for training NER models for Stanford CoreNLP.
The important aspects are:
Btw, here is the FAQ on Stanford NER training.