-
How do I open mapdb files?
-
I'm using the latest Norconex HTTP Collector. By default, the importer removes HTML elements and keeps only the body text.
How do I configure it to keep specific HTML elements? For example, I would lik…
-
Exception in thread "pool-1-thread-1" java.lang.NoSuchMethodError: org.apache.pdfbox.pdmodel.PDDocumentInformation.getDictionary()Lorg/apache/pdfbox/cos/COSDictionary;
at org.apache.tika.parser.pd…
-
In most cases, a startURL is not needed when using sitemaps.
Given the following sitemap configuration:
``` xml
http://someurl:30025/sitemap.xml
``…
-
I am having trouble with the Elasticsearch committer. The crawler works fine, but when it tries to send to Elasticsearch it gets a "java.lang.NoSuchFieldError: LUCENE_3_6". I've tried looking around fo…
-
assets: 2015-04-29 09:01:09 FATAL - assets: An error occured that could compromise the stability of the crawler. Stopping excution to avoid further issues...
com.norconex.jef4.JEFException: Cannot per…
-
The example at http://www.norconex.com/collectors/importer/latest/apidocs/com/norconex/importer/handler/tagger/impl/ForceSingleValueTagger.html should show
```
instead of
```
-
There are sites (probably Drupal-based) which contain such links on pages:
```
A Link
```
Please let me know whether collector-http processes such links correctly. For now the log looks like it doesn't fin…
-
Sometimes the Content-Length is not available in the HTTP response headers. Could the parser provide it instead?
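
To illustrate the idea, here is a minimal sketch of a fallback that counts the bytes of the body when the header is missing or unusable. It is plain Java and not part of the Norconex API; the class name `ContentLengthFallback` and the method `resolveLength` are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not a Norconex class: illustrates the fallback only.
class ContentLengthFallback {

    // Returns the declared length if the header holds a valid non-negative
    // number; otherwise reads the body to the end and returns the byte count.
    static long resolveLength(String contentLengthHeader, InputStream body)
            throws IOException {
        if (contentLengthHeader != null) {
            try {
                long declared = Long.parseLong(contentLengthHeader.trim());
                if (declared >= 0) {
                    return declared;
                }
            } catch (NumberFormatException e) {
                // Malformed header value: fall through and count the bytes.
            }
        }
        long total = 0;
        byte[] buffer = new byte[8192];
        int read;
        while ((read = body.read(buffer)) != -1) {
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream body = new ByteArrayInputStream(new byte[1234]);
        // No header available, so the length comes from counting the body.
        System.out.println(resolveLength(null, body));
    }
}
```

Counting consumes the stream, so a real crawler would need to do this on a cached or re-readable copy of the content.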
-
Related to issue #69, this is a small sample of the problem I experience concerning ``. That is, when a crawler is running, I sometimes see documents in `` registered some time ago, that still are th…