-
Hello, I am wondering if there is a way to retain all the HTML formatting while indexing the body contents of a web page.
-
Hi Pascal.
For some reason I cannot get XML parsing to work (see below), and only some of my HTML tags are working.
What am I doing wrong?
```
``…
-
Hello Pascal,
We have to crawl many web servers, and some of them provide incorrect encoding data, e.g.:
```
> GET / HTTP/1.1
> Host: www.zeichentrickserien.de
> User-Agent: HTTP-Collector
> Ac…
-
Hello,
The Norconex stack is by far the best crawler+importer technology around, many thanks for your excellent work!
Over the past couple days I have been developing a tool to generate a broke…
-
When merging a constant field into a field with multiple instances as below, the output is
```
https://grocery.walmart.com
/ip/Peeled-Baby-Cut-Carrots-2-lbs/10451316?athcpid=10451316&athpgid=s…
-
Input line is:
```
/ip/Organic-Carrots-2-lb-bag/44391103?athcpid=4
```
Expecting "/ip/" to be replaced.
Configuration is this:
```
.*
^\/ip\/.*
…
-
We need to trim down the title that a webpage has but I can't get it to work. The title has pipes ( | ) in it and we want to only keep the words to the left of the first pipe. I've tried the textbetwe…
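To make the intended result concrete, here is a minimal sketch of the transformation in plain Python, outside of any Norconex configuration. The function name `keep_before_first_pipe` and the sample title are made up for illustration; only the regular expression itself is meant to carry over.

```python
import re

# Matches optional leading whitespace, then lazily captures everything
# that is not a pipe, up to the first "|" and the rest of the string.
FIRST_PIPE = re.compile(r"^\s*([^|]*?)\s*\|.*$", flags=re.DOTALL)


def keep_before_first_pipe(title: str) -> str:
    """Return the text to the left of the first pipe, trimmed.

    Titles without a pipe are returned unchanged apart from trimming.
    """
    return FIRST_PIPE.sub(r"\1", title).strip()


if __name__ == "__main__":
    # Hypothetical title, not taken from a real page.
    print(keep_before_first_pipe("Product page | Category | Site name"))
```

The pipe has to be escaped (`\|`) because an unescaped `|` means alternation in a regular expression, which is a common reason for a pattern like this to match nothing or everything.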
-
Hi everyone,
I am new to the Norconex HTTP Collector.
I modify the HTML and execute run.bat,
and I am using Solr to check the result.
It seems that I have to change the "crawler id" e…
-
My parsed content has a lot of CRLFs that I am trying to clean up. Should
```
\r
\r\n
\n
```
be working, or is \r\n not supported?
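For comparison, this is a minimal sketch of the clean-up in plain Python rather than in a Norconex configuration. The function name `collapse_line_breaks` and the choice of a single space as the replacement are assumptions made for the example.

```python
import re

# "\r\n" is listed first so that a CRLF pair is treated as one line break
# rather than as a separate "\r" followed by a separate "\n".
LINE_BREAKS = re.compile(r"(?:\r\n|\r|\n)+")


def collapse_line_breaks(text: str, replacement: str = " ") -> str:
    """Replace every run of CR, LF, or CRLF with a single replacement string."""
    return LINE_BREAKS.sub(replacement, text)


if __name__ == "__main__":
    sample = "first\r\nsecond\rthird\n\nfourth"
    print(collapse_line_breaks(sample))
```

One thing this makes visible: if the three patterns are applied one after another in the order listed above, the lone `\r` replacement runs first and the `\r\n` pattern has nothing left to match. Whether that is what happens here depends on how the configuration applies the replacements and on whether the values are interpreted as regular expressions or as literal text.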
-
This configuration should ideally return "Fresh Raspberries...", but I cannot get anything out.
```
I've also tried: ".productTile__details___3lfva a div" with extract=html (etc....)
```
W…