Closed by kengher 6 years ago
I can see from your config that you have defined your StripBetweenTransformer under postParseHandlers. That is likely your problem. Once your HTML has been parsed by the Importer, only plain text is left (the markup is gone), so the comment markers are no longer there to match. Move the transformer under preParseHandlers instead. Just be careful not to run this transformer on other files (especially binaries, like PDFs). Consider using restrictTo within the transformer tag to apply it to HTML only.
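For reference, here is a minimal sketch of what the relocated handler might look like. It assumes the Importer 2.x XML layout. The marker comments are taken from the question; the restrictTo field and content-type pattern are assumptions to adapt to your own setup:

```xml
<importer>
  <!-- Pre-parse handlers run on the raw document, while the HTML markup is still present -->
  <preParseHandlers>
    <transformer class="com.norconex.importer.handler.transformer.impl.StripBetweenTransformer"
        inclusive="true">
      <!-- Assumption: limit the transformer to HTML so binaries such as PDFs are left alone -->
      <restrictTo caseSensitive="false" field="document.contentType">text/html</restrictTo>
      <stripBetween>
        <!-- CDATA only protects the comment markers from the XML parser reading this config -->
        <start><![CDATA[<!-- SIDENAV_START -->]]></start>
        <end><![CDATA[<!-- SIDENAV_END -->]]></end>
      </stripBetween>
    </transformer>
  </preParseHandlers>
</importer>
```

Note that the CDATA wrapper belongs in the config file only. The HTML pages themselves should contain just the plain comment markers, not the CDATA text.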
Thanks for the quick reply! Yes, I had completely missed the preParse handler. It is working as intended now after relocating the transformer, and thanks for the restrictTo tip. This issue can be closed.
Hello! In reference to #370, I am trying to strip the MENU section out of my HTML. However, I am having trouble with the example provided in the documentation:
My HTML:
The data in between <!-- SIDENAV_START --> and <!-- SIDENAV_END --> is still passing through. Now, when I type this literal text into my HTML, it works: "Home" and "About" are now ignored. However, in my HTML, <![CDATA[ is now rendered as text in the browser before and after the menu. It is as if all HTML tags have been stripped before the Importer gets the chance to run the regex on the data.
Does the Importer/Transformer have to be set in a strict order before something else happens? Does <![CDATA[<!-- SIDENAV_START -->]]> not work in the latest version?
I am using HTTP Collector v2.8.0 + Elasticsearch Committer v4.1.0. Here is my config: