dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Update datasets API #1384

Closed reckart closed 5 years ago

reckart commented 5 years ago

Many of the URLs listed in the dataset descriptions now redirect elsewhere (usually from http to https). They need to be updated, since the DatasetFactory cannot follow redirects yet.
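For context on the redirect limitation: Java's `HttpURLConnection` follows redirects within one protocol but does not follow them from http to https, so a downloader has to handle that hop itself. The following is only a minimal sketch of how that could look, not the DatasetFactory code. The class name `RedirectFollower`, its methods, and the limit of five hops are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not part of DKPro Core.
final class RedirectFollower {
    // Assumed upper bound on redirect hops to avoid endless loops.
    private static final int MAX_REDIRECTS = 5;

    // True for the HTTP status codes that carry a Location header to follow.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolves a Location header value (absolute or relative) against the
    // URL that produced it.
    static String resolveLocation(String current, String location) {
        return URI.create(current).resolve(location).toString();
    }

    // Opens the URL, following redirects manually so that protocol changes
    // such as http -> https are handled as well.
    static InputStream open(URL url) throws IOException {
        URL current = url;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            URLConnection raw = current.openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                // Not HTTP(S), e.g. a file: URL; nothing to follow.
                return raw.getInputStream();
            }
            HttpURLConnection conn = (HttpURLConnection) raw;
            conn.setInstanceFollowRedirects(false);
            if (!isRedirect(conn.getResponseCode())) {
                return conn.getInputStream();
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without Location header from " + current);
            }
            current = URI.create(resolveLocation(current.toString(), location)).toURL();
        }
        throw new IOException("Too many redirects starting at " + url);
    }
}
```

Following redirects silently has a downside: the dataset description keeps pointing at a stale URL. Logging the final URL, or failing with a message that names it, would make it easier to update the descriptions.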

The text file for CC-BY 4.0 has changed, but only in its whitespace. It would be good to have an option to ignore whitespace when validating plain text files, so that validation is more resilient against such changes.
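One way such an option could work is to normalize whitespace before computing the checksum, so that line-ending or spacing changes do not alter the result. The sketch below is an assumption about a possible approach, not the existing validation code. The class name `WhitespaceInsensitiveDigest`, the choice of SHA-256, and the decision to collapse whitespace runs rather than strip them are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of DKPro Core.
final class WhitespaceInsensitiveDigest {
    // Collapses every run of ASCII whitespace (spaces, tabs, line breaks)
    // into a single space and removes leading and trailing whitespace.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Returns the lowercase hex SHA-256 digest of the normalized text.
    // SHA-256 is an assumption; any digest algorithm could be used here.
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(normalize(text).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

With this normalization, two copies of a license text that differ only in line endings or trailing spaces yield the same digest, while any change to the words still yields a different one. Checksums already recorded for raw file bytes would not match and would need to be recomputed for files validated this way.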