dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Upgrade to UDPipe 1.1.0 #1039

Closed reckart closed 6 years ago

reckart commented 7 years ago

https://github.com/ufal/udpipe/releases/tag/v1.1.0

Additional tasks deferred to #1141.

reckart commented 7 years ago

@kouylekov-usit is this something you would like to do?

kouylekov-usit commented 7 years ago

Yes, we are. Should I prepare the new build script? I have created a version with the segmenter wrapper to handle the offsets (I will make a pull request for this one). But it will be interesting to see what they mean by:

Cheers Milen

reckart commented 7 years ago

@kouylekov-usit - yes, the build.xml needs updating. I'm not sure whether the models changed, but the binary definitely did. No idea what the format change means.

kouylekov-usit commented 7 years ago

I am on it

oepen commented 7 years ago

i have been in touch with milan straka (the UDPipe developer) on a different project; in that thread, he wrote

UDPipe 1.1 is able to store token offsets in the CoNLL-U [...]

so possibly the ‘magic’ to recover character offsets from comparing tokens to the underlying document may no longer be required when moving to the new version?
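
For readers unfamiliar with the offset recovery mentioned here, the following is a minimal sketch of the general idea, not the actual DKPro Core wrapper code. It assumes every token appears verbatim in the document text and in order; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: recovers character offsets by scanning the
// document text left to right for each token in turn.
public class TokenOffsetAligner {

    /** Returns one {begin, end} pair per token; end is exclusive. */
    public static List<int[]> align(String text, List<String> tokens) {
        List<int[]> spans = new ArrayList<>();
        int cursor = 0; // never search before the end of the previous token
        for (String token : tokens) {
            int begin = text.indexOf(token, cursor);
            if (begin < 0) {
                // Assumption violated: the tokenizer changed the surface form.
                throw new IllegalArgumentException(
                        "Token not found after offset " + cursor + ": " + token);
            }
            int end = begin + token.length();
            spans.add(new int[] { begin, end });
            cursor = end;
        }
        return spans;
    }
}
```

A scan like this fails as soon as the tokenizer normalizes or rewrites a token, which is why offsets reported by the tool itself would be preferable.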

kouylekov-usit commented 7 years ago

I will verify. If it turns out the offsets are not available, we have a working, tested version of the wrapper.

Cheers Milen

reckart commented 7 years ago

@kouylekov-usit I have uploaded a new UDPipe 1.1.0 JAR file to the UKP repo and also deployed the new binary JAR there. The POM has been updated accordingly.

reckart commented 7 years ago

@kouylekov-usit Please check again whether the new 1.1.0 Java API, which is now available to the module, provides methods for accessing the offset information.
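
If the Java API exposes only the raw CoNLL-U MISC column, the offsets could be read from there. The sketch below assumes the offsets are written as a `TokenRange=start:end` entry among the `|`-separated MISC fields, which should be checked against the UDPipe 1.1.0 documentation. The class name is hypothetical and nothing here calls the UDPipe bindings.

```java
// Hypothetical helper: extracts a token range from a CoNLL-U MISC value.
// Assumes the format "TokenRange=start:end" with an exclusive end.
public class TokenRangeParser {

    private static final String KEY = "TokenRange=";

    /** Returns {start, end}, or null if the MISC value carries no range. */
    public static int[] parse(String misc) {
        if (misc == null || misc.equals("_")) {
            return null; // "_" marks an empty CoNLL-U field
        }
        for (String field : misc.split("\\|")) {
            if (field.startsWith(KEY)) {
                String[] parts = field.substring(KEY.length()).split(":");
                if (parts.length == 2) {
                    return new int[] {
                            Integer.parseInt(parts[0]),
                            Integer.parseInt(parts[1]) };
                }
            }
        }
        return null;
    }
}
```

One caveat: if the ranges count Unicode code points rather than UTF-16 code units, they would need converting before being used as UIMA annotation offsets for text outside the Basic Multilingual Plane.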

reckart commented 7 years ago

@kouylekov-usit any news regarding the offsets or any further plans atm?