-
I am trying to crawl web pages and store them in AWS CloudSearch, and I am facing a problem storing tag values in CloudSearch.
Below are the details of the problem:
I am able to see both title and h3 in the debug log.…
-
Hi,
I am trying to get the TextPatternTagger (suggest something different if this is not a good idea) to extract just the filename from the document path (`document.reference`). Here is my config:
``…
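The poster's config is truncated above, so as a hedged sketch only: since the value lives in a metadata field rather than the document body, one alternative worth considering is the Norconex Importer's ReplaceTagger, which can apply a regex to one field and write the result to another. The class name and XML shape below follow the Importer 2.x documentation as I recall it, and `filename` is a made-up target field; verify both against the version in use.

```xml
<!-- Sketch: copy the filename portion of document.reference into a new
     "filename" field. The regex keeps everything after the last slash. -->
<tagger class="com.norconex.importer.handler.tagger.impl.ReplaceTagger">
  <replace fromField="document.reference" toField="filename" regex="true">
    <fromValue>.*/([^/]+)$</fromValue>
    <toValue>$1</toValue>
  </replace>
</tagger>
```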
-
I am trying to crawl one of my websites using the Norconex collector-http and a committer to submit documents to AWS CloudSearch.
I have made good progress but am facing some issues, described below:
1. I…
-
I am trying to set up a Norconex connector for a site,
and my issue is that the URLs under the div section are not getting crawled.
Attaching the configuration code here:
```xml
#set($http…
-
From @niels, in https://github.com/Norconex/collector-http/issues/200#issuecomment-168659138:
> A couple of weeks ago, I also modified the elasticsearch committer to use ES's REST interface – so I a…
-
When committing to elasticsearch (see the below config), the `collector-http.sh` script never terminates even though the crawler run has already ended. I have to manually kill the process using `CTRL+…
niels updated 7 years ago
-
I'm getting the following error when I run my committer code:
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/uvahea…
dkh7m updated 7 years ago
-
I'm working on a project based on norconex-collector in which I also need to store the list of URLs that were extracted but not parsed.
For example, if I set and a page contains one or more links to another …
-
Hi, I am looking for a way to create a nested field with the following structure in my Elasticsearch-ingested documents:
```
color: {
  type: "nested",
  properties: {
    level: { type: "integer" }…
-
Please provide a sample setup to crawl a website and store the content in a Solr repository. We also have other requirements such as indexing metadata, skipping certain URLs, parsing only part of a content page, and…
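As a starting point only, here is a minimal sketch of a crawl-to-Solr setup, assuming the Norconex HTTP Collector 2.x XML schema and the Norconex Solr Committer. The collector/crawler IDs, the `example.com` start URL, the exclusion regex, and the Solr core URL are all placeholders, and the class names should be double-checked against the versions actually installed.

```xml
<httpcollector id="sample-collector">
  <crawlers>
    <crawler id="sample-crawler">
      <!-- Site to crawl (placeholder URL) -->
      <startURLs stayOnDomain="true">
        <url>https://example.com/</url>
      </startURLs>
      <maxDepth>2</maxDepth>

      <!-- Skip certain URLs via a regex exclusion filter -->
      <referenceFilters>
        <filter class="com.norconex.collector.core.filter.impl.RegexReferenceFilter"
                onMatch="exclude">.*/private/.*</filter>
      </referenceFilters>

      <!-- Send crawled content and metadata to a Solr core -->
      <committer class="com.norconex.committer.solr.SolrCommitter">
        <solrURL>http://localhost:8983/solr/mycore</solrURL>
      </committer>
    </crawler>
  </crawlers>
</httpcollector>
```

Metadata extraction and partial-page parsing would be handled in an `<importer>` section (e.g. taggers/transformers), which is omitted here since those requirements are only partially stated.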