emorynlp / nlp4j

NLP framework for JVM languages.
http://emorynlp.github.io/nlp4j/
Other

Nlp Training for NER #11

Closed: brew42 closed this issue 7 years ago

brew42 commented 8 years ago

Hi

I am trying to use nlptrain in NER mode. I have been using the attached files but get the error below.

Command: ./bin/nlptrain -c config-train-ner.xml -mode ner -t sample-trn.tsv -d sample-dev.tsv -m sample-dep.xz

Error: java.lang.IllegalArgumentException: No enum constant edu.emory.mathcs.nlp.component.template.util.BILOU.2
    at java.lang.Enum.valueOf(Enum.java:238)
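For context, the message itself comes from Java's Enum.valueOf, which throws IllegalArgumentException whenever the string it receives is not exactly the name of a declared constant. The sketch below is a hypothetical reproduction: BILOU here is a local stand-in for nlp4j's edu.emory.mathcs.nlp.component.template.util.BILOU, and its constant set is assumed rather than taken from nlp4j. It suggests the trainer was handed the string "2" where it expected a tag prefix.

```java
/**
 * Hypothetical reproduction of the error. BILOU is a local stand-in for
 * nlp4j's BILOU enum; its constants (B, I, L, O, U) are an assumption.
 */
public class BilouValueOfDemo {
    enum BILOU { B, I, L, O, U }

    /** Returns null if prefix names a constant, otherwise the exception message. */
    static String tryParse(String prefix) {
        try {
            BILOU.valueOf(prefix);   // delegates to Enum.valueOf(BILOU.class, prefix)
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("B"));  // "B" is a declared constant, so null
        System.out.println(tryParse("2"));  // not a constant, so an exception message
    }
}
```

Running main prints null for "B" and "No enum constant BilouValueOfDemo.BILOU.2" for "2", the same shape as the message in the stack trace above.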

Any ideas?

It does generate output; see the attached sample-dep.xz. Is there any way of previewing it to check the content?

Also, once I manage to generate the output, which file would it replace in my configuration file? Does it replace this one?

edu/emory/mathcs/nlp/lexica/en-named-entity-gazetteers-simplified.xz

Tom

Attachments: config-train-ner.xml.zip, sample-dep.xz.zip

jdchoi77 commented 8 years ago

So sorry for the late response. Could you send me the link to the configuration and the input dataset? I think there is a configuration issue. Thanks.

best,

Jinho

brew42 commented 8 years ago

Thanks Jinho

Configuration, training & dev files zipped & attached.

Tom

sample-dev.tsv.zip

sample-trn.tsv.zip

config-train-ner.xml.zip

brew42 commented 8 years ago

Hi Jinho

Were you able to review my configuration?

Thanks

Tom

jdchoi77 commented 8 years ago

I haven't found time to do it yet (sorry). I'll have some time this Wednesday, so I'll let you know then. Thanks for being patient.

best,

Jinho

javierlores commented 7 years ago

Hey, so I ran into this same problem, and from what I can tell, the cause for me was that the NER trainer expects every token in the training and development .tsv files to be labeled in BILOU notation. If you look at sample-trn.tsv, you can see that around line 63 it stops labeling tokens with O to indicate that they are outside an entity. I don't know whether this was affecting the NER results, but adding the missing O labels got rid of the error for me.
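Following Javier's description, here is a minimal preprocessing sketch that fills a missing or empty NER field with O and flags anything else that is not a BILOU tag. It is not an nlp4j API: the class name BilouFixer, the 0-based NER column index, tab-separated token rows with blank lines between sentences, and the tag format (O, or B-/I-/L-/U- followed by an entity type) are all assumptions to check against your own TSV layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of nlp4j: writes "O" into a missing or empty
 * NER column and reports tags that are not in BILOU form. Assumes tab-separated
 * token rows, blank lines between sentences, and the NER tag in column nerIndex.
 */
public class BilouFixer {
    // Assumed tag format: "O", or B/I/L/U + "-" + entity type, e.g. "U-ORG".
    static final Pattern BILOU_TAG = Pattern.compile("O|[BILU]-\\S+");

    static List<String> fill(List<String> lines, int nerIndex) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {          // sentence boundary: keep as-is
                out.add(line);
                continue;
            }
            // limit -1 keeps trailing empty fields, so "1\tHi\t" yields 3 columns
            List<String> cols = new ArrayList<>(Arrays.asList(line.split("\t", -1)));
            if (cols.size() < nerIndex) {         // more than the NER column is missing
                System.err.printf("line %d: too few columns%n", i + 1);
                out.add(line);
                continue;
            }
            if (cols.size() == nerIndex) {
                cols.add("O");                    // NER column absent: append "O"
            } else if (cols.get(nerIndex).trim().isEmpty()) {
                cols.set(nerIndex, "O");          // NER column present but empty
            } else if (!BILOU_TAG.matcher(cols.get(nerIndex).trim()).matches()) {
                System.err.printf("line %d: not a BILOU tag: %s%n", i + 1, cols.get(nerIndex));
            }
            out.add(String.join("\t", cols));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // usage: java BilouFixer <in.tsv> <out.tsv> <nerColumnIndex>
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), fill(lines, Integer.parseInt(args[2])));
    }
}
```

For example, java BilouFixer sample-trn.tsv sample-trn-fixed.tsv N, where N is the 0-based index of the NER column in your file; the fixed file can then be passed to nlptrain with -t. Malformed tags are only reported, not rewritten, since the right label for them can't be guessed automatically.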

Note: I tested this on different files, but I'm guessing this might be the same problem.

Javier

brew42 commented 7 years ago

Thanks Javier, but it looks like we are looking into spaCy now.

Regards,

Tom


From: Javier Lores notifications@github.com Sent: 20 September 2016 01:11 To: emorynlp/nlp4j Cc: brew42; Author Subject: Re: [emorynlp/nlp4j] Nlp Training for NER (#11)


jdchoi77 commented 7 years ago

I guess the lack of development time at the moment is hurting :( Sorry for not being more prompt.

best,

Jinho

shailesh-NITK commented 7 years ago

Hi, I am trying to reduce the size of the dependency model, and for that I need the training data. Could I get the training and development datasets you used to create the dependency model? My email ID: shaileshtayde10@gmail.com