-
Hi,
I'm trying to use the new feature to assign a new value when no match is found, but I can't seem to get it to work.
I've got a general tagger that extracts the part of the page with members of …
-
I'm trying to figure out a way to route a crawled document to a specific Elasticsearch index based on the language of the document. I am using the `com.norconex.importer.handler.tagger.impl.LanguageTa…
-
Hi,
For a given crawler, I extract/tag a field EXP_NAME+COUNTRY that contains both the name and the country of an author (in the format "firstname other-names lastname [CountryCode]").
Thanks to a R…
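
For illustration only, here is a minimal sketch of how a value in the format described above ("firstname other-names lastname [CountryCode]") could be split into a name and a country code. The pattern, the function name `split_name_country`, and the assumption that the country code is two or three letters are all hypothetical; none of this is taken from the original configuration.

```python
import re

# Hypothetical pattern: free-form name followed by a bracketed country code.
# Assumes the code is 2 or 3 ASCII letters, which the post does not state.
NAME_COUNTRY = re.compile(
    r"^\s*(?P<name>.+?)\s*\[(?P<country>[A-Za-z]{2,3})\]\s*$"
)


def split_name_country(value):
    """Return (name, country_code), or None when the value does not match."""
    match = NAME_COUNTRY.match(value)
    if match is None:
        return None
    return match.group("name"), match.group("country")
```

Returning `None` for non-matching values keeps the "no match" case explicit, so the caller can decide whether to leave the field untouched or assign a default.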
-
With the following configuration (crawling depth 0):
```
[...]
```
I get a StackOverflowError:
_java.lang.StackOverflowError
at java.io.UnixFileSystem.getB…
-
I've downloaded the software onto an Ubuntu 14.04 system with this Java version:
java version "1.7.0_111"
OpenJDK Runtime Environment (IcedTea 2.6.7) (7u111-2.6.7-0ubuntu0.14.04.3)
OpenJDK 64-Bit Server VM (…
-
Hi there,
I am having an issue with a transformation I am trying to put in place
after I capture some information from the content field like this:
``` xml
((([U|u](NION|nion)\s[T|t](EMPORAL|empor…
-
I am trying to configure the importer module of Norconex HTTP Collector to strip what's between headers, rightnavs and footers, but it does not seem to be stripping what is between these known tag…
-
Hi,
I'm currently developing my own committer. In order to debug my own code, I wanted to keep the FileSystemCommitter, so that I can compare the output of both committers.
The configuration file lo…
-
DEBUG [CachedInputStream] Deleted cache file: /tmp/CachedInputStream-4437588446019026996-temp
Exception in thread "pool-1-thread-1" INFO [AbstractCrawler] My Crawler Name: Deleting orphan references …
-
I used to enter the URL without specifying the home page, for example:
`
https://en.wikipedia.org/
`
instead of
`
https://en.wikipedia.org/wiki/Main_Page
`
In the first case it doesn't crawl, but in the…