dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Added support in CONLL-U reader for document and paragraph IDs #1366

manuelciosici closed this 5 years ago

ukp-svc-jenkins commented 5 years ago

Can one of the admins verify this patch?

reckart commented 5 years ago

Jenkins, can you test this please?

manuelciosici commented 5 years ago

Hi @reckart,

Thanks for the comments. You are right about document IDs:

"Note that while document boundaries always occur between sentences, paragraph boundaries may under certain circumstances occur in the middle of a sentence (bulleted list items, verse etc.)"

I will change the reader some time in the next few days and update the pull request.

manuelciosici commented 5 years ago

@reckart I added a warning and a unit test for documents with multiple IDs, and I made the small code changes you suggested.
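
For readers following the thread, here is a minimal sketch of the idea, written in Python for brevity. It is not the DKPro Core reader code. It assumes the standard CoNLL-U sentence-level comments `# newdoc id = ...` and `# newpar id = ...`. The function name `collect_ids`, the logger name, and the choice to only log a warning when more than one document ID is found are illustrative assumptions.

```python
import logging
import re

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("conllu_sketch")

# CoNLL-U sentence-level comments that may carry IDs, for example
# "# newdoc id = doc1" and "# newpar id = doc1-p1". The "id = ..." part is optional.
NEWDOC = re.compile(r"^#\s*newdoc(?:\s+id\s*=\s*(.*\S))?\s*$")
NEWPAR = re.compile(r"^#\s*newpar(?:\s+id\s*=\s*(.*\S))?\s*$")


def collect_ids(lines):
    """Return (document_ids, paragraph_ids) found in CoNLL-U comment lines."""
    doc_ids, par_ids = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        match = NEWDOC.match(line)
        if match:
            if match.group(1):
                doc_ids.append(match.group(1))
            continue
        match = NEWPAR.match(line)
        if match and match.group(1):
            par_ids.append(match.group(1))
    # Assumption: a single input is expected to describe one document,
    # so more than one document ID is reported rather than silently accepted.
    if len(doc_ids) > 1:
        logger.warning("Found %d document IDs in one input: %s", len(doc_ids), doc_ids)
    return doc_ids, par_ids


sample = [
    "# newdoc id = doc1",
    "# newpar id = doc1-p1",
    "# sent_id = 1",
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_",
    "",
    "# newdoc id = doc2",
    "# sent_id = 2",
    "1\tBye\tbye\tINTJ\t_\t_\t0\troot\t_\t_",
]
document_ids, paragraph_ids = collect_ids(sample)
```

This sketch only looks at comment lines, so it does not cover the case from the quoted note where a paragraph boundary falls in the middle of a sentence.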

reckart commented 5 years ago

Jenkins, can you test this please?

reckart commented 5 years ago

Jenkins, can you test this please?

reckart commented 5 years ago

Jenkins, can you test this please?

reckart commented 5 years ago

Jenkins, can you test this please?

ukp-svc-jenkins commented 5 years ago

72% (-0.41%) vs master 73%