Request: Training

Salakar commented 8 years ago

Hello!

Have you done any work on training, adding entities and such?

I can help; I just need the base structure in place, as my C++ is a little poor, hah.

I can do the stemmers and such. I was also thinking about how the training instances would look. It should be possible to do entity tagging within the string, something like:

const instance = 'This library by {bhelx}=PERSON is cool';

Or some other syntactic sugar that finds the tagged spans in the string and adds them as entities automatically, rather than having to provide the stemmed entity word positions manually, one by one. Having options for both would be good too.

Thoughts?
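For illustration, the inline tagging idea above could be sketched roughly as follows. This is only a sketch under assumptions: `parseTaggedInstance` is a hypothetical helper (not part of MITIE or of this binding), tokens are split on whitespace only, and the `{text}=LABEL` syntax is taken from the example string. It strips the markup and returns the plain tokens together with token-index ranges for each entity.

```javascript
// Hypothetical helper: parses inline `{text}=LABEL` markup into plain tokens
// plus token-index entity ranges. Not part of MITIE or of this binding.
function parseTaggedInstance(tagged) {
  const tokens = [];
  const entities = [];
  // Matches either a tagged span `{...}=LABEL` or a run of non-whitespace.
  const pattern = /\{([^}]+)\}=([A-Za-z_]+)|\S+/g;
  let match;
  while ((match = pattern.exec(tagged)) !== null) {
    if (match[1] === undefined) {
      // Ordinary word: keep it as a single token.
      tokens.push(match[0]);
      continue;
    }
    // Tagged span: it may contain several whitespace-separated tokens.
    const words = match[1].split(/\s+/).filter(Boolean);
    if (words.length === 0) continue; // ignore empty spans such as `{ }=X`
    entities.push({ start: tokens.length, length: words.length, label: match[2] });
    tokens.push(...words);
  }
  return { tokens, entities };
}

const parsed = parseTaggedInstance('This library by {bhelx}=PERSON is cool');
console.log(parsed.tokens);   // [ 'This', 'library', 'by', 'bhelx', 'is', 'cool' ]
console.log(parsed.entities); // [ { start: 3, length: 1, label: 'PERSON' } ]
```

Because the output is a token list plus (start, length, label) ranges, the same data could also be supplied by hand, which would cover the "options for both" case.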

bhelx commented 8 years ago

Hey @Salakar

I haven't investigated custom training yet, but it is in my TODOs. I assume we'd want to follow the way the underlying C library does it, so I think it makes sense to pass in the token locations the way the C API does. Here is a Python example:

https://github.com/mit-nlp/MITIE/blob/master/examples/python/train_ner.py

Copying that Python API would probably be the most straightforward way. We probably wouldn't want to alter the source text, because the trainer needs to know the token locations, but I'd need to know more about how the parser would work.
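As a rough sketch of what a token-location based instance might look like on the JavaScript side: the class and method names below (`NerTrainingInstance`, `addEntity`, `overlapsAnyEntity`) are hypothetical and only loosely modeled on the linked Python example, where an instance is built from a token list and entities are marked by token range. Nothing here calls into the real library; it only shows the data shape and the range checks.

```javascript
// Hypothetical JS shape for a training instance; the real binding may differ.
class NerTrainingInstance {
  constructor(tokens) {
    this.tokens = tokens.slice(); // copy so the caller's array is not shared
    this.entities = [];
  }

  // True if [start, start + length) intersects an already-added entity.
  overlapsAnyEntity(start, length) {
    return this.entities.some(
      (e) => start < e.start + e.length && e.start < start + length
    );
  }

  // Marks tokens [start, start + length) as an entity with the given label.
  addEntity(start, length, label) {
    const valid =
      Number.isInteger(start) && Number.isInteger(length) &&
      start >= 0 && length >= 1 && start + length <= this.tokens.length;
    if (!valid) {
      throw new RangeError('entity range is outside the token list');
    }
    if (this.overlapsAnyEntity(start, length)) {
      throw new RangeError('entity overlaps an existing entity');
    }
    this.entities.push({ start, length, label });
    return this;
  }
}

// The source text is left untouched: only token positions are recorded.
const sample = new NerTrainingInstance(['This', 'library', 'by', 'bhelx', 'is', 'cool']);
sample.addEntity(3, 1, 'PERSON');
```

Keeping the tokens unmodified and storing positions separately matches the concern above that the trainer needs to know the token locations.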

RahulPol commented 7 years ago

Is this done?

bhelx commented 7 years ago

@RahulPol I'm not working on it. I'm not sure about @Salakar.

bhelx / mitie

Request: Training #2