-
Make sure Solr Committer works with Solr 5 and update Solr dependencies accordingly.
As described in issue https://github.com/Norconex/collector-http/issues/57, it seems SolrJ falls back to sendin…
-
While running HTTP Collector on a non-Norconex site, after several thousand documents had been collected and committed to Solr, I had to interrupt it. I interrupted the run by closing the DOS box.
After that, every new …
-
Hi, I'm trying to gather information about links: the text near the anchor.
I'm using:
norconex-collector-http-2.0.2.zip with openjdk-7
I have this definition:
```
text/htm…
-
Almost all documents crawled by HTTP Collector carry information about their language, but some PDF, DOC, and other files may lack this metadata because their authors did not record it.
In this ca…
-
Hello!
Haven't visited you for a long time :)
I cleaned the workdir and tried to launch the collector from the command line, and got the following exceptions:
```
WARN [ConfigurationUtil] Could not instantiate objec…
-
I just checked the documentation and can't find any link or text that would help me refactor the existing committer. I guess it is also possible to use the Importer (I agree, a dirty hack), but the l…
-
While trying to reproduce the bug described in #69, I noticed that the -a resume command does not seem to work properly.
Here are the steps to reproduce the problem.
1) collector-ht…
-
I'm using TextBetweenTagger to capture the HTML code of crawled pages. The configuration looks like this:
```
^.*
.*$
```
However, this has pu…
-
Hi,
I need to use collector-http to get data from several sites that match some regular expression and store it in a database through a Java application. Is this possible with collector-http, and how …
-
I get the following error during a test crawler run, and although information was collected, it was not committed. Could you help me understand the cause and fix it?
```
[non-job]: 2014-10-14 14:19:06,6…