Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

crawlerDefaults.importer.preParseHandlers ignored #333

Closed aleha84 closed 7 years ago

aleha84 commented 7 years ago

Same as https://github.com/Norconex/collector-http/issues/326, but this time for the importer section. I believe this applies to both preParseHandlers and postParseHandlers, or to the importer section as a whole.

Putting the importer section directly in the crawler solves the problem.

essiembre commented 7 years ago

Can you share your config? This may be working as expected. When you overwrite a config block for a crawler, the whole block (any top-level tag in the crawler section) is overwritten, so a crawler that defines its own importer section replaces the one from crawlerDefaults entirely rather than merging with it. If you want more flexibility in reusing portions of configuration, I recommend you create configuration fragments that you can dynamically include wherever you need them in your config, using the #parse or #include directives, as the "complex-config.xml" example in the HTTP Collector zip demonstrates.
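
A minimal sketch of the fragment approach, for illustration only. The file name `shared-pre-parse-handlers.xml`, the collector and crawler ids, and the placeholder comments are hypothetical and not taken from the reporter's setup; the path given to #parse may need adjusting for your directory layout. The shared handlers live in their own file:

```xml
<!-- shared-pre-parse-handlers.xml (hypothetical fragment file) -->
<!-- Put the pre-parse handlers that several crawlers share here. -->
```

Each crawler then declares its own importer block and pulls the fragment in where it is needed, so nothing depends on the importer section being merged from crawlerDefaults:

```xml
<httpcollector id="example-collector">
  <crawlers>
    <crawler id="crawler-a">
      <importer>
        <preParseHandlers>
          #parse("shared-pre-parse-handlers.xml")
          <!-- handlers specific to crawler-a, if any -->
        </preParseHandlers>
      </importer>
    </crawler>
    <crawler id="crawler-b">
      <importer>
        <preParseHandlers>
          #parse("shared-pre-parse-handlers.xml")
        </preParseHandlers>
      </importer>
    </crawler>
  </crawlers>
</httpcollector>
```

These are Velocity directives: #parse evaluates the included file as a template, so variables and nested directives inside it are resolved, while #include inserts the file content as-is.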

essiembre commented 7 years ago

Closing for lack of feedback. Please re-open if needed.