-
Hello! In reference to [#370](https://github.com/Norconex/collector-http/issues/370), I am trying to eliminate the MENU section of my HTML code; however, I am experiencing issues using the example pr…
-
Using Norconex HTTP Collector + Elasticsearch Committer.
```
[non-job]: 2018-05-22 14:58:15 INFO - Version: Norconex HTTP Collector 2.7.0-SNAPSHOT (Norconex Inc.)
[non-job]: 2018-05-22 14:58:15 INFO…
```
-
Hi
I would like to extract experts' contact information from a site that dynamically generates a list of available experts.
I saved these dynamically created sites into webpages-list containing fo…
-
Hi
I am trying to filter the HTML source by removing all the DIVs that I don't need (for example disclaimers, modals, etc.).
I read the doc at https://www.norconex.com/collectors/importer/latest/apid…
-
I need to extract only certain types of files from a repository, for example .pdf, .ppt, ... I am using this configuration, but it does not work.
```xml
#set($http = "com.norconex.collect…
```
-
Hi!
I've been struggling to use the TextPatternTagger to extract the domain+subdomain (x.y.z -> y.z). I have a field, uri, which essentially is equivalent to "document.reference". I would like to a…
-
I have a question regarding continuous crawling (or scheduling for that matter). I've read your post on a similar topic here: https://github.com/Norconex/collector-http/issues/93. But it doe…
-
Using the TitleGeneratorTagger throws an NPE, probably because the field is empty or doesn't exist. Strings shouldn't be initialized as null but as an empty string, or there should be null check…
-
I'm using the Norconex HTTP collector (v2.8.0) and am having some issues with extracting contents from PDFs.
Here's a gist of the error: https://gist.github.com/mbockenstedt/4f521a44f21221671c64e62…
-
Crawling some URLs with the following configuration (see below) works just fine. But with a few common URLs, the crawler unexpectedly gives the error message (The real url name is intentionally chan…