-
There are a lot of exceptions of this kind in the log file.
```
com.norconex.importer.parser.DocumentParserException: org.apache.tika.exception.TikaException:
TIKA-198: Illegal IOException from org.ap…
```
-
The current way of using `ScriptTagger` is like this:
```xml
```
Some of my scripts are longer than a few lines, so I thought it would be nice to have them in separate files. I tried …
-
Dear Mr. Paul,
I am getting the following error while committing to MySQL 5.7:
> Caused by: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x91\x89 I...' for column 'content' at row…
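The bytes quoted in the error, `\xF0\x9F\x91\x89`, form a single 4-byte UTF-8 sequence. MySQL's legacy `utf8` character set stores at most 3 bytes per character, while `utf8mb4` accepts 4-byte sequences. Whether that is the cause here is an assumption, since the message above is cut off. The following Python sketch only decodes the quoted bytes and shows one way to detect or drop such characters before a commit; the helper names `needs_utf8mb4` and `strip_supplementary` are hypothetical and not part of any Norconex or MySQL API.

```python
# Bytes exactly as quoted in the SQLException message.
raw = b"\xf0\x9f\x91\x89"
char = raw.decode("utf-8")


def needs_utf8mb4(text: str) -> bool:
    """Return True if any character lies outside the Basic Multilingual Plane.

    Such characters take 4 bytes in UTF-8 and do not fit a 3-byte column charset.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


def strip_supplementary(text: str) -> str:
    """Drop every character that would need 4 bytes in UTF-8 (hypothetical workaround)."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


if __name__ == "__main__":
    print(len(raw), hex(ord(char)))      # byte length and code point of the quoted sequence
    print(needs_utf8mb4("I " + char))    # does this text need a 4-byte-capable charset?
```

Stripping characters loses data, so switching the column and the connection to `utf8mb4` is usually the less destructive route when the schema can be changed.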
-
This project has been very helpful, but I've got a roadblock that I can't seem to get around. I've been able to configure the crawler to authenticate against a site and then begin to crawl. However,…
-
Hi Pascal,
Ref: Norconex/importer issue #87, "Import only certain text from HTML file" (https://github.com/Norconex/importer/issues/87)
Based on your advice on using PhantomJS for fetching dy…
-
Hi,
I am splitting an HTML document using DOMSplitter with an img selector to extract what is in tags. After that I am trying to get some attributes like "alt:" and "src:" from the "content" field (where…
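As a language-neutral illustration of the goal described here (pulling `alt` and `src` out of image tags), this is a minimal Python sketch using only the standard library. It is not the DOMSplitter or any Norconex API; the class `ImgAttrCollector` and the function `extract_img_attrs` are hypothetical names, and the sketch assumes the markup of each image tag is available as a string.

```python
from html.parser import HTMLParser


class ImgAttrCollector(HTMLParser):
    """Collect the alt and src attributes of every <img> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img ... />.
        if tag == "img":
            found = dict(attrs)
            self.images.append({"alt": found.get("alt"), "src": found.get("src")})


def extract_img_attrs(html: str) -> list:
    """Return one {"alt": ..., "src": ...} dict per <img> tag, in document order."""
    collector = ImgAttrCollector()
    collector.feed(html)
    collector.close()
    return collector.images


if __name__ == "__main__":
    sample = '<p><img alt="Logo" src="/logo.png"><img src="/x.gif"/></p>'
    for image in extract_img_attrs(sample):
        print(image)
```

A missing attribute comes back as `None`, which keeps images without an `alt` text distinguishable from images with an empty one.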
-
Why does this crawler configuration always return a "handshake_failure" alert and a javax.net.ssl.SSLHandshakeException?
```
https://sapp2.formalazio.it/sapp/login
Mozilla/5.0 (Windows NT 6.…
```
-
Dear Sirs,
I want to configure my in order to crawl starturls every 30 minutes. I tried using both the tags and the tag, but when the crawler job ends, the connector terminates. I would expect i…
-
Hello,
I am using the Norconex collector 2.8.0 to crawl my web sites. It is a great product, and thank you for making it available as open source.
I want to have just one case-insensitive entry…
-
Hi,
When parsing PDF documents which contain hyperlinks, the links end up in the extracted content.
I'm using http_collector 2.8.1 and a simple PDF document (created from Word) which has the wor…