Closed by angelo337 7 years ago
All Importer handlers support the restrictTo tag, which lets you make sure a handler is applied only to the documents you want. For instance, if you want StripBeforeTransformer to be applied only to text/html documents, you can do it like this:
<transformer class="com.norconex.importer.handler.transformer.impl.StripBeforeTransformer">
  <restrictTo field="document.contentType">text/html</restrictTo>
  <stripBeforeRegex>.*your regex.*</stripBeforeRegex>
</transformer>
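The same restriction should carry over to the "after" strip mentioned in the question. Here is a minimal sketch, assuming StripAfterTransformer lives in the same package and takes a stripAfterRegex tag; neither name is confirmed in this thread, so check them against your Importer version. The regex is a placeholder:

```xml
<!-- Assumed class and tag names; verify against your Importer version. -->
<transformer class="com.norconex.importer.handler.transformer.impl.StripAfterTransformer">
  <!-- Same restriction as above: only documents whose content type matches text/html. -->
  <restrictTo field="document.contentType">text/html</restrictTo>
  <!-- Placeholder pattern; replace with your own. -->
  <stripAfterRegex>.*your regex.*</stripAfterRegex>
</transformer>
```

With both transformers restricted this way, documents of other content types should skip the strip handlers rather than fail in them.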
Thanks a lot for your answer, I am going to try it.
Hi there, I am trying to crawl a website with several file types, and I have to strip content before and after. When I hit a file that is not application/HTML, I get an error. Is it possible to apply the strips to a single type of file only? I already tried splitting the crawl into one crawler for HTML and another for the other types, but the other crawler just gets stuck and does not crawl at all. Thanks a lot, Angelo