dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Tika 100000 characters Limit #1286

andrewdaawin closed this issue 6 years ago

andrewdaawin commented 6 years ago

Below is the error I got while using TikaReader (DKPro Core version 1.10.0):

Caused by: org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your document contained more than 100000 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available).

Is it possible to set the character limit via a parameter?

reckart commented 6 years ago

I'm adding a parameter and disabling the default limit: https://github.com/dkpro/dkpro-core/pull/1287
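
For context, the limit in the error comes from Tika's content handler, not from the document itself: Tika's `BodyContentHandler` has a default write limit of 100000 characters, and its `BodyContentHandler(int writeLimit)` constructor accepts `-1` to disable the limit. The sketch below illustrates that mechanism using only the JDK's SAX classes. `WriteLimitSketch`, `LimitedTextHandler`, and `extract` are hypothetical names made up for this illustration; they are not Tika or DKPro Core classes, and this is not the code from the pull request.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of a write-limited SAX content handler.
class WriteLimitSketch {

    /** Collects character data; a limit of -1 means "no limit". */
    static class LimitedTextHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private final int limit;
        private boolean limitReached = false;

        LimitedTextHandler(int limit) {
            this.limit = limit;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (limit < 0 || text.length() + length <= limit) {
                text.append(ch, start, length);
                return;
            }
            // Keep the text up to the limit, then abort parsing.
            text.append(ch, start, limit - text.length());
            limitReached = true;
            throw new SAXException("Document contained more than " + limit + " characters");
        }

        boolean limitReached() {
            return limitReached;
        }

        String text() {
            return text.toString();
        }
    }

    /** Extracts the text of an XML string, truncated at the given limit. */
    static String extract(String xml, int limit) throws Exception {
        LimitedTextHandler handler = new LimitedTextHandler(limit);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Only swallow the exception raised by the limit check itself.
            if (!handler.limitReached()) {
                throw e;
            }
        }
        return handler.text();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc>" + "a".repeat(250) + "</doc>";
        System.out.println(extract(xml, 100).length()); // 100
        System.out.println(extract(xml, -1).length());  // 250
    }
}
```

As in the error message above, the text collected up to the limit stays available after the exception; passing `-1` lets the whole document through, which is the behaviour a reader parameter with the default limit disabled would expose.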