-
I get the below Apache Solr log error:
ERROR - 2015-09-26 22:18:51.914; [c:gettingstarted s:shard2 r:core_node2 x:gettingstarted_shard2_replica1] org.apache.solr.common.SolrException; org.apache.solr.…
-
I have the following task:
1. Filter the collected page of HTML markup, including menus, etc., keeping only the content. The default importer seems to have such logic, but I need something more advanced, including spaced/linee…
-
I'm trying to build norconex-collector-http stable release 2.2.1 with a simple "mvn clean install -DskipTests".
The build fails because it times out while downloading
```
org.apache.pdfbox:pdfbox:…
```
-
First, let me thank you for this wonderful piece of software!
I am using 2.3.0-SNAPSHOT and would like to avoid duplicate pages like http://example.com and http://example.com/.
So I tried to configur…
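A minimal Java sketch of the idea behind treating http://example.com and http://example.com/ as one page: give a URL with an empty path the root path "/". The class `RootSlashNormalizer` is hypothetical and is not a Norconex API; it only illustrates the normalization rule, using `java.net.URI`.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not part of Norconex): maps "http://example.com"
// to "http://example.com/" so both forms identify the same page.
final class RootSlashNormalizer {
    private RootSlashNormalizer() {}

    static String normalize(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return url; // leave unparseable URLs untouched
        }
        String path = uri.getRawPath();
        // Only hierarchical URLs with a scheme, a host and an empty path need fixing.
        if (uri.isOpaque() || uri.getScheme() == null || uri.getRawAuthority() == null
                || (path != null && !path.isEmpty())) {
            return url;
        }
        // Rebuild from raw components so existing percent-escapes are preserved.
        StringBuilder sb = new StringBuilder()
                .append(uri.getScheme()).append("://")
                .append(uri.getRawAuthority()).append('/');
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }
}
```

Applying `normalize` to both URLs before comparing them (or before using them as crawl keys) yields the same string for both.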
-
Hi,
When Norconex finds a relative anchor URL such as the following snippet in http://www.mpfr.org/:
`download`
it saves the source as:
`mpfr-current/#download _download`
Is there a way to config…
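A minimal Java sketch of one way to handle in-page anchors such as `#download`: resolve the link against the page URL and drop the fragment, so the link points back to the page itself. The class `FragmentStripper` is hypothetical and not a Norconex API; the page URL used in usage is assumed to end in a slash.

```java
import java.net.URI;

// Hypothetical helper (not a Norconex API): resolves an href found on a page
// and removes its "#fragment", so in-page anchors map back to the page itself.
final class FragmentStripper {
    private FragmentStripper() {}

    // Throws IllegalArgumentException if pageUrl or href is not a valid URI.
    static String resolveWithoutFragment(String pageUrl, String href) {
        URI resolved = URI.create(pageUrl).resolve(href.trim());
        String s = resolved.toString();
        // In a valid URI, '#' can only appear as the fragment delimiter.
        int hash = s.indexOf('#');
        return hash >= 0 ? s.substring(0, hash) : s;
    }
}
```

For example, resolving `#download` on `http://www.mpfr.org/mpfr-current/` gives `http://www.mpfr.org/mpfr-current/`.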
-
I used the example (http://www.norconex.com/how-to-crawl-facebook/) to crawl Facebook; however, I get this error. I am using norconex-collector-http 2.0.2 and the start URL is "https://graph.facebook.com/v2…
-
I have modified the minimum example to crawl my website (see the command below), and one of the fields it displays is called "collector.referenced-urls", which contains many links which I a…
-
I have a field called "DC-ED.audience" that contains multiple strings separated by commas (see the example below):
_"DC-ED.audience":["Institutions of Higher Education", "Administrators; Counselo…
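A minimal Java sketch of splitting such delimiter-joined strings into separate, trimmed values. The class `MultiValueSplitter` is hypothetical and not a Norconex API; the delimiter is a parameter because the excerpt mentions commas while its example values contain semicolons.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not a Norconex API): splits each value of a
// multi-valued field (e.g. "DC-ED.audience") on a literal delimiter.
final class MultiValueSplitter {
    private MultiValueSplitter() {}

    static List<String> split(List<String> values, String delimiter) {
        // Quote the delimiter so characters like "." or "|" are matched literally.
        Pattern sep = Pattern.compile(Pattern.quote(delimiter));
        List<String> out = new ArrayList<>();
        for (String value : values) {
            for (String part : sep.split(value)) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    out.add(trimmed); // keep only non-blank pieces
                }
            }
        }
        return out;
    }
}
```

Calling `split(values, ",")` or `split(values, ";")` returns one list entry per piece, in the original order.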
-
_Request created from @comschmid's comment in issue #163._
Allows changing the character case of field names, as `CharacterCaseTagger` does for field values.
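A minimal Java sketch of the requested behavior, applied to a plain map of field names to values. The class `FieldNameCaseChanger` and its `upper` flag are hypothetical, not the Norconex API; merging the values of names that collide after the case change (e.g. "Title" and "title") is an assumed design choice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch (not the Norconex API): changes the case of field
// names, leaving values untouched. Names that collide after the change
// have their values merged in insertion order.
final class FieldNameCaseChanger {
    private FieldNameCaseChanger() {}

    static Map<String, List<String>> changeNameCase(
            Map<String, List<String>> fields, boolean upper) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            // Locale.ROOT avoids locale-specific surprises (e.g. Turkish dotless i).
            String name = upper
                    ? e.getKey().toUpperCase(Locale.ROOT)
                    : e.getKey().toLowerCase(Locale.ROOT);
            out.computeIfAbsent(name, k -> new ArrayList<>()).addAll(e.getValue());
        }
        return out;
    }
}
```

Merging rather than overwriting keeps every value when two fields differ only by case; a real implementation might instead let users choose.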