Tsyklop closed this issue 4 years ago
Under the Importer section of your config, you can define content types you do not want to have parsed:
<documentParserFactory class="com.norconex.importer.parser.GenericDocumentParserFactory">
  <ignoredContentTypes>.*text/html.*</ignoredContentTypes>
</documentParserFactory>
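To see which content types that expression would cover, here is a small standalone sketch in plain Java. It assumes the value of `ignoredContentTypes` is treated as a regular expression that must match the whole content type string; the class `IgnoredContentTypeCheck` and the method `isIgnored` are made up for illustration and are not part of the Norconex API.

```java
import java.util.regex.Pattern;

public class IgnoredContentTypeCheck {

    // Same expression as in the <ignoredContentTypes> element above.
    static final Pattern IGNORED = Pattern.compile(".*text/html.*");

    // Hypothetical helper (not a Norconex method): returns true when the
    // whole content type string matches the expression.
    static boolean isIgnored(String contentType) {
        return IGNORED.matcher(contentType).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "text/html",
            "text/html; charset=UTF-8",
            "application/xhtml+xml",
            "application/pdf"
        };
        // Print each sample with the result of the check.
        for (String sample : samples) {
            System.out.println(sample + " -> " + isIgnored(sample));
        }
    }
}
```

Under that assumption, the leading and trailing `.*` are what let the expression also cover content types that carry a charset parameter, while XHTML and PDF documents would still be parsed.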
I need to get the full page content (with HTML tags) in my committer. How can I do this?
Right now I am getting just the text, without the HTML tags or other information.
Maybe there is a class that provides what I need?