-
When committing to Elasticsearch (see the config below), the `collector-http.sh` script never terminates even though the crawler run has already ended. I have to manually kill the process using `CTRL+…
niels updated
7 years ago
-
Using the technique in #44, I discovered I didn't need to do anything to extract schema.org metadata, because either Norconex importer or Tika will create metadata for objects within an itemscope.
…
-
A selector for both content and title would be nice to have.
-
Please provide a sample setup that crawls a website and stores the content in a Solr repository. We also have other requirements, such as indexing metadata, skipping certain URLs, parsing only part of a content page and…
-
Hi there,
I am trying to figure out how to use the Boilerpipe jar file, but I have not been able to. Could you please post some basic instructions or share a link?
Thanks a lot.
-
Hi, can you help me? I am trying to run the minimum example and get no errors, but no data appears in the Solr core.
```
:/opt/norconex-col$ ./collector-http.sh -a start -c examples/minimum/minimum-config.xml
…
-
When the header contains a period in a domain name, or has 2 sentences (2 periods, or 1 period and a question mark) followed by newlines, it is not used as the title.
-
Hi, I am looking for a way to create a nested field with the following structure in my Elasticsearch ingested documents:
```
color:{
type:"nested",
properties:{
level:{type:"integer"}…
-
Hi Pascal,
I have a site with multiple identical ```` tags on the same level. The content that I want to parse is in the first one. How can I do that?
Tried with combinations of StripAfterTrans…
sveba updated
7 years ago
-
I'd like to set up a crawler to feed my Solr instances.
This is my setup:
# Configuration
## Solr
https://github.com/Norconex/committer-solr/tree/master/norconex-committer-solr/src/test/java…