-
http://www.norconex.com/collectors/importer/configuration does a good job of laying out the possibilities, but it doesn't quite explain what the implicit, default configuration is for the default Ti…
-
Same as https://github.com/Norconex/collector-http/issues/326
But this time for the importer section. I believe this is relevant both for preParseHandlers and postParseHandlers, or for the importer en…
-
So, the Solr and Elasticsearch committers share some common idioms, and in both cases the default is to use the crawled URL as the id for the document. I want to use a generated UUID for the…
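One way to get a generated, crawl-stable id is a name-based UUID computed from the crawled URL. The sketch below is plain Java to illustrate the idea; the class and method names are made up for this example and are not Norconex committer API:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Illustrative sketch: derive a document id from the crawled URL
// instead of using the URL itself. Not Norconex committer code.
public class DocIds {

    // Name-based (version 3) UUID: the same URL always yields the same id,
    // so a re-crawl updates the existing document instead of duplicating it.
    public static String idFor(String url) {
        return UUID.nameUUIDFromBytes(
                url.getBytes(StandardCharsets.UTF_8)).toString();
    }

    public static void main(String[] args) {
        System.out.println(idFor("http://example.com/page"));
    }
}
```

A random UUID (`UUID.randomUUID()`) would also hide the URL, but it changes on every crawl, which creates duplicates in the index on re-crawls; a name-based UUID keeps ids stable.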
-
Hi, I've been using Norconex and I've found it to be a very versatile crawler. I tried to crawl a new site but got an NPE, and I found out the page has a tag with href="" (an empty string). I think this …
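The failure mode described here, an empty href producing an NPE, can be avoided with a simple guard before resolving the link. A hedged sketch of that guard in plain Java (this is not the collector's actual link-extraction code):

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Illustrative guard against empty or blank href values, which would
// otherwise push a null/invalid reference into the crawl queue.
// A sketch only, not the collector's real link extractor.
public class LinkGuard {

    public static Optional<String> resolve(String baseUrl, String href) {
        if (href == null || href.trim().isEmpty()) {
            return Optional.empty(); // skip empty links instead of failing
        }
        try {
            return Optional.of(new URI(baseUrl).resolve(href.trim()).toString());
        } catch (URISyntaxException | IllegalArgumentException e) {
            return Optional.empty(); // malformed URLs are skipped, not fatal
        }
    }
}
```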
-
I'd like to test the "minimum" and "complex" examples with Solr, but I'm not sure what changes to make to minimum-config.xml and complex-config.xml. I'm trying out Solr at the same time, so my repository is colle…
-
Can I get this to apply to the content, and smush it all into a single line?
Thanks
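Smushing content onto a single line amounts to collapsing every run of whitespace (including newlines) into one space. A minimal Java sketch of that transform, not any specific Norconex transformer:

```java
// Illustrative sketch: collapse multi-line document content to one line
// by replacing each whitespace run (spaces, tabs, line breaks) with a
// single space. Not a Norconex handler, just the core transform.
public class SingleLine {

    public static String smush(String content) {
        // \s+ matches one or more whitespace chars of any kind.
        return content.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(smush("line one\n  line two\n\nline three"));
        // prints: line one line two line three
    }
}
```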
-
So, another tool, Scrapy, offers a lot less out of the box, but it does offer a shell you can easily invoke on any URL to explore what selectors etc. may do to it.
A similar feature would be a Na…
-
Hello,
I have tried to use the collector in combination with the AWS CloudSearch Committer.
I still have two problems:
1) Is there any chance to commit the crawled results again to AWS in case…
-
I need a tagger tool which will give me the ability to analyze a specified meta field and write some data to another meta field.
To be more specific, I need to create an extra field which should contain info is d…
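The kind of tagger being asked for, read one metadata field and write a derived value into another, can be sketched in plain Java. The field names and the rule below are invented for illustration; this is not a Norconex ImporterHandler:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative "tagger" sketch: inspect one metadata field and write
// derived data into another. Field names and the rule are made up for
// the example; this is not a Norconex importer handler.
public class DerivedFieldTagger {

    public static void tag(Map<String, String> metadata) {
        String contentType = metadata.getOrDefault("document.contentType", "");
        // Derived field: record whether the document looks like HTML.
        metadata.put("custom.isHtml",
                String.valueOf(contentType.contains("html")));
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("document.contentType", "text/html");
        tag(meta);
        System.out.println(meta.get("custom.isHtml")); // prints: true
    }
}
```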
-
I need an additional field in the resulting object which contains the URL extension, for example ".Html", ".Php", ".aspx", etc.
How can I do this without additional programming, using only configuration options…
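Extracting the extension from a URL is essentially one regular expression applied to the document reference. Since I can't confirm the exact configuration-only route, here is a plain-Java sketch of the logic itself (not a Norconex configuration snippet):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: pull the file extension (".html", ".php", ...)
// out of a document reference so it could be stored in an extra field.
// Plain Java, not Norconex configuration.
public class UrlExtension {

    // Extension = the final ".xyz" of the path, before any ?query or #fragment.
    private static final Pattern EXT =
            Pattern.compile("(\\.[A-Za-z0-9]+)(?:[?#].*)?$");

    public static String extensionOf(String url) {
        Matcher m = EXT.matcher(url);
        return m.find() ? m.group(1).toLowerCase() : "";
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("http://example.com/page.Html"));
        // prints: .html
        System.out.println(extensionOf("http://example.com/index.php?x=1"));
        // prints: .php
    }
}
```

Lower-casing the result normalizes variants like ".Html" and ".HTML" into one value, which keeps the derived field useful for filtering and faceting.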