-
I'm using TextBetweenTagger to capture the HTML code of crawled pages. The configuration looks like this:
```
^.*
.*$
```
However, this has pu…
-
Hello,
I plan to create a copy of the existing importer that adds some specific functionality.
It is possible that I misunderstood the importer's configuration options and am creating a duplicate: …
-
I just checked the documentation and can't find any link or text that would help me refactor the existing committer. I guess it is also possible to use the Importer (a dirty hack, I agree), but the l…
-
Hello!
I haven't visited you for a long time :)
I cleaned the workdir and tried to launch the collector from the command line, and got these exceptions:
```
WARN [ConfigurationUtil] Could not instantiate objec…
-
I have the following config for a crawler:
.xml
```
#set($http = "com.norconex.collector.http")
#set($core = "com.norconex.collector.core")
#set($urlNormalizer = "${http}.url.impl.GenericURL…
-
In the collector-http configuration file I have this line:
text
In Solr, the "text" field is defined as: indexed="true" stored="false"
On the other hand, I need to use the Solr "content" field (indexe…
-
I moved my configuration over from 1.34 to 2.0 and I receive the following error:
ERROR com.norconex.importer.Importer - Unsupported Import Handler: null
Is there additional configuration that I nee…
-
Hi,
I need to use collector-http to get data from several sites that match a regular expression and store it in a database via a Java application. Is this possible with collector-http, and how …
-
Is it possible to retrieve the anchor text and metadata of all crawled links pointing to one crawled document?
The problem I'm facing is setting a readable name on a crawled document, and the only human…
-
Hello!
I am writing my own committer implementation to put collected pages into a MySQL database.
I've taken SolrCommiter as an example; is that the right decision?
So I inherited from AbstractMappedCommitt…