dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

German OpenNLP chunker model #1454

Closed andreahorbach closed 1 year ago

andreahorbach commented 4 years ago

We trained a German chunker model for OpenNLP that we would like to contribute. The model was trained on the TIGER corpus, using its annotations with a number of systematic modifications to match them to TreeTagger chunks. In a 10-fold cross-validation experiment, as provided by the OpenNLP model trainer, we reach an average F-score of 96%. If such a model would be of interest for DKPro, could you guide us on how to proceed in providing the model?
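For readers unfamiliar with the metric, here is a minimal sketch of how a chunk-level F-score averaged over folds can be computed. It is not the OpenNLP evaluator; the function names, the span representation `(start, end, label)`, and the per-fold averaging are assumptions made for illustration.

```python
def chunk_f1(gold, predicted):
    """F1 over chunk spans; a chunk counts as correct only on an exact match.

    Each chunk is a (start, end, label) tuple (an assumed representation).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(folds):
    """Average the per-fold F1; `folds` is a list of (gold, predicted) pairs."""
    scores = [chunk_f1(gold, predicted) for gold, predicted in folds]
    return sum(scores) / len(scores)
```

With 10-fold cross-validation, `folds` would hold ten such pairs, one per held-out split.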

reckart commented 4 years ago

The way it normally works is that the person/group creating the model puts it up somewhere on the web (their website, GitHub, some language resource repository, etc.). Then we have a set of scripts in DKPro Core which download such models, package them up as a JAR, and then deploy them to our Maven repository.

So the first step would be that you put the model up somewhere. Please make sure that you are legally allowed to share the model.

Then, you or we could extend the OpenNLP model packaging script to include your model:

https://github.com/dkpro/dkpro-core/blob/master/dkpro-core-opennlp-asl/src/scripts/build.xml

Then we'd use the script to build the model JAR and to upload it to our Maven repo.

Finally, the DKPro Core OpenNLP Maven module pom.xml file would be extended to include the new model.
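To make the download-and-package step concrete, here is a rough sketch of what such a script does. It is not the actual Ant script linked above; the function names, the entry path, and the metadata keys are all hypothetical. The only fact relied on is that a JAR is a ZIP archive, conventionally with a `META-INF/MANIFEST.MF` entry.

```python
import io
import urllib.request
import zipfile


def download(url):
    """Fetch the raw model bytes from wherever the authors published them."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def package_model_as_jar(model_bytes, entry_path, metadata):
    """Return the bytes of a JAR holding the model plus a properties file.

    `entry_path` and the metadata keys are illustrative assumptions, not
    the layout DKPro Core actually uses.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as jar:
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\r\n\r\n")
        jar.writestr(entry_path, model_bytes)
        # Store the metadata next to the model as key=value lines.
        properties = "".join(f"{k}={v}\n" for k, v in sorted(metadata.items()))
        jar.writestr(entry_path.rsplit(".", 1)[0] + ".properties", properties)
    return buffer.getvalue()
```

Deploying the resulting JAR to a Maven repository is a separate step and is not covered by this sketch.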

mariebexte commented 4 years ago

Thank you for explaining the process.

We uploaded the model at https://github.com/ltl-ude/opennlpChunkerGerman/blob/master/de-chunker-opennlp.bin

It would be great if you could take care of including it in the build.xml, but if not, we can of course have a look at doing that ourselves.

reckart commented 4 years ago

@aggarwalpiush would you like to try this?

There is some documentation on implementing/extending the model building scripts here.

The model script for OpenNLP models is here.

You couldn't deploy it to the UKP Maven server at the moment though. Either I'd need to do that or we need to give you proper permissions.

aggarwalpiush commented 4 years ago

@reckart Before extending the model script, I tried to run the existing OpenNLP ASL build.xml, but I got a build issue at line 675.

I found that the IXA POS model is not available at the URL provided in the script. I think we also need to fix this issue.
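A quick way to spot this kind of problem before running the full build is to scan the script for URLs and check each one. The following is only a sketch: the helper names are made up, the URL pattern is a simple heuristic, and some servers may reject `HEAD` requests even though the file exists.

```python
import re
import urllib.error
import urllib.request

# Heuristic: a URL runs until whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def find_urls(script_text):
    """Return the distinct URLs in the script, in order of first appearance."""
    return list(dict.fromkeys(URL_PATTERN.findall(script_text)))


def is_reachable(url, timeout=10):
    """Send a HEAD request and treat any error or 4xx/5xx status as unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        return False


def find_dead_urls(script_text, check=is_reachable):
    """List the URLs for which `check` returns False."""
    return [url for url in find_urls(script_text) if not check(url)]
```

Calling `find_dead_urls(open("build.xml").read())` would then list the model URLs that no longer resolve; the `check` parameter lets you swap in a different test, for example one that tolerates redirects.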

reckart commented 4 years ago

@aggarwalpiush good idea :)

aggarwalpiush commented 4 years ago

The build.xml is fixed and PR-1459 has been raised. If it looks good, kindly merge it and build the new model JARs on the UKP Maven server.

Once the JARs are available on the server, I'll add them to the OpenNLP pom.