norconex-importer Search Results

413 results for norconex-importer


Norconex/crawlers #702

Question - Retain html tags while indexing

Hello, I am wondering if there is a way to retain all the HTML formatting while indexing the body content of a web page.

4rsood updated 4 years ago
6
Norconex/crawlers #683

Questions about DOMTagger - XPath & html

Hi Pascal. For some reason I cannot get XML parsing to work (see below), and only some of my html tags are working. What am I doing wrong? ``` ``…

svanschalkwyk updated 4 years ago
6
Norconex/importer #110

Best way to handle incorrect encoding information

Hello Pascal, we have to crawl many web servers and some of them provide incorrect encoding data, e.g.: ``` > GET / HTTP/1.1 > Host: www.zeichentrickserien.de > User-Agent: HTTP-Collector > Ac…

jetnet updated 4 years ago
11
Norconex/crawlers #428

Links not being extracted from PDF documents

Hello, the Norconex stack is by far the best crawler+importer technology around. Many thanks for your excellent work! Over the past couple of days I have been developing a tool to generate a broke…

douglas-andrew-harley updated 4 years ago
3
Norconex/crawlers #685

Merging constant field to field with multiple instances

When merging a constant field into a field with multiple instances, as below, the output is ``` https://grocery.walmart.com /ip/Peeled-Baby-Cut-Carrots-2-lbs/10451316?athcpid=10451316&athpgid=s…

svanschalkwyk updated 4 years ago
2
Norconex/crawlers #686

ReplaceTransformer matching issue?

Input line is: ``` /ip/Organic-Carrots-2-lb-bag/44391103?athcpid=4 ``` Expecting "/ip/" to be replaced. Configuration is this: ``` .* ^\/ip\/.* …

svanschalkwyk updated 4 years ago
1
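
The snippet above is truncated, so the actual cause is not visible here. One common pitfall with this kind of configuration is that a pattern ending in `.*`, such as `^\/ip\/.*`, matches the entire line, so the replacement overwrites everything rather than just the `/ip/` prefix. The sketch below illustrates that difference with Python's `re` module (Norconex itself is Java, but `java.util.regex` treats these simple patterns the same way); the replacement text `/product/` is a hypothetical value chosen only for illustration.

```python
import re

line = "/ip/Organic-Carrots-2-lb-bag/44391103?athcpid=4"
replacement = "/product/"  # hypothetical replacement value

# Anchored pattern ending in ".*": the match covers the whole line,
# so the entire line is replaced by the replacement text.
whole = re.sub(r"^\/ip\/.*", replacement, line)

# Pattern limited to the prefix: only the leading "/ip/" is replaced.
prefix_only = re.sub(r"^\/ip\/", replacement, line)

print(whole)        # the whole line is gone
print(prefix_only)  # the rest of the path is preserved
```

If only the prefix should change, the pattern should match just that prefix (or capture the remainder and reinsert it).
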
Norconex/importer #109

Strip Characters in title

We need to trim down a web page's title, but I can't get it to work. The title has pipes ( | ) in it and we only want to keep the words to the left of the first pipe. I've tried the textbetwe…

bkisselbach updated 4 years ago
3
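
Keeping only the text to the left of the first pipe can be expressed as a single regex replacement: drop the first `|`, any whitespace before it, and everything after it. The sketch below shows that regex with Python's `re` module; the function name `strip_after_first_pipe` and the sample title are hypothetical, and how such a regex is wired into an importer configuration is not shown in the original question.

```python
import re

def strip_after_first_pipe(title: str) -> str:
    # Remove optional whitespace, the first "|", and everything after it.
    # Titles without a pipe are returned unchanged.
    return re.sub(r"\s*\|.*$", "", title, flags=re.DOTALL)

print(strip_after_first_pipe("Home | Acme Corp | Shop"))  # hypothetical title
```

Because the regex engine returns the leftmost match, the `\|` always binds to the first pipe, so later pipes are removed along with the rest of the title.
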
Norconex/crawlers #704

Get/ Set current dateTime in config.xml

Hi everyone, I am a beginner with the Norconex HTTP Collector. I am trying to modify the HTML and execute run.bat, and I am using Solr to check the result. It seems that I have to change the "crawler id" e…

stds1a28 updated 4 years ago
9
Norconex/importer #33

ReduceConsecutivesTransformer behavior

My parsed content has a lot of CRLFs that I am trying to clean up. Should ``` \r \r\n \n ``` be working, or is \r\n not supported?

OkkeKlein updated 4 years ago
3
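
Whether the transformer in this question accepts `\r\n` as an escape is not answered in the snippet. As a point of comparison, the underlying cleanup (treating `\r\n`, `\r`, and `\n` each as one line break and collapsing any run of them into a single newline) can be sketched as one regex replacement. This uses Python's `re` module, and the function name `collapse_line_breaks` is hypothetical, not part of the Norconex API.

```python
import re

def collapse_line_breaks(text: str) -> str:
    # Match one or more line breaks of any style ("\r\n", "\r", "\n")
    # and replace the whole run with a single "\n".
    return re.sub(r"(?:\r\n|\r|\n)+", "\n", text)

print(repr(collapse_line_breaks("a\r\n\r\n\rb\n\nc")))
```

Listing `\r\n` before `\r` in the alternation keeps a CRLF pair counted as one break; with the trailing `+` the collapsed result is the same either way.
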
Norconex/crawlers #684

DOMTagger not selecting CSS as per Chrome Dev Tools

This configuration should ideally return "Fresh Raspberries...", but I cannot get anything out. ``` I've also tried: ".productTile__details___3lfva a div" with extract=html (etc....) ``` W…

svanschalkwyk updated 4 years ago
5
