ICIJ / node-tika

Apache Tika bridge for Node.js. Text and metadata extraction, language detection and more.
MIT License

Question: Is there a way to get text without placeholders? #7

Closed sumitchawla closed 9 years ago

sumitchawla commented 9 years ago

I am trying to parse an HTML document so it can be indexed into Elasticsearch. The text comes back with placeholders like [image:] etc. Is there an option to get the text back without these placeholders?

mattcg commented 9 years ago

There's no runtime option to do this yet. The placeholder output is controlled by the RichTextContentHandler.
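
Until such an option exists, one possible workaround is to strip the placeholders from the extracted text before indexing it. The following is only a minimal sketch: `stripPlaceholders` is a hypothetical helper, not part of node-tika, and it assumes the placeholders always have the bracketed form `[image: ...]` seen in the question (`bookmark` is included as a further assumed kind).

```javascript
// Hypothetical helper, not part of node-tika: removes bracketed placeholders
// such as "[image: logo]" or "[image:]" from text that was already extracted.
// The default list of placeholder kinds is an assumption; adjust it as needed.
function stripPlaceholders(text, kinds = ['image', 'bookmark']) {
  // Matches "[<kind>:" followed by anything up to the nearest closing bracket.
  const pattern = new RegExp('\\[(?:' + kinds.join('|') + '):[^\\]]*\\]', 'g');
  return text
    .replace(pattern, '')        // drop the placeholders themselves
    .replace(/[ \t]{2,}/g, ' ')  // collapse the gaps they leave behind
    .trim();
}
```

The text passed to the callback of `tika.text(...)` could be run through this helper before it is sent to Elasticsearch. Note that any genuine document text that happens to look like `[image: ...]` would be removed as well, so this is a stopgap rather than a replacement for a proper extraction option.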