ICIJ / node-tika

Apache Tika bridge for Node.js. Text and metadata extraction, language detection and more.
MIT License

HTML Extraction #15

Open · SciutoAlex opened this issue 8 years ago

SciutoAlex commented 8 years ago

Hi there,

I'd like to use Tika's BoilerpipeContentHandler to extract only the main body text from an HTML page. Can anyone suggest a way to do this? I don't know much Java, so I'm not sure where to even start.

http://stackoverflow.com/questions/23653061/how-to-extract-main-text-from-html-using-tika

Thanks! Alex