norconex-importer Search Results

413 results for norconex-importer


Norconex/importer #78

StripBetweenTransformer parsing too literally?

Hello! In reference to [#370](https://github.com/Norconex/collector-http/issues/370), I am trying to eliminate the MENU section of my HTML code, however, I am experiencing issues using the example pr…

kengher updated 6 years ago
2
Norconex/crawlers #489

id is too long, must be no longer than 512 bytes but was

Using Norconex HTTP Collector + Elasticsearch commiter. ``` [non-job]: 2018-05-22 14:58:15 INFO - Version: Norconex HTTP Collector 2.7.0-SNAPSHOT (Norconex Inc.) [non-job]: 2018-05-22 14:58:15 INFO…

aleha84 updated 6 years ago
3
Norconex/crawlers #476

http collectors doesn't crawl in dynamically generates websi…

Hi I would like to extract experts contact information from a site which dynamically generates list of available experts. I saved these dynamically created sites into webpages-list containing fo…

tdrobcsak updated 6 years ago
14
Norconex/crawlers #483

DOMContentFilter creates REJECTED_IMPORT (com.norconex.impo…

Hi I am trying to filter the HTML source removing all those DIVs that i don't need (for example disclaimers, modals ecc). I read the doc at https://www.norconex.com/collectors/importer/latest/apid…

mauromi76 updated 6 years ago
4
Norconex/crawlers #485

Configuration to extract only a certain type of files

I need to extract only a certain type of files from a repository, for example the .pdf, ppt, ... I am using this configuration but it does not work. ```xml #set($http = "com.norconex.collect…

javpdiaz updated 6 years ago
3
Norconex/crawlers #493

How to use TextPatternTagger to extract domain.subdomain int…

Hi! I've been struggling to use the TextPatternTagger to extract the domain+subdomain (x.y.z -> y.z). I have a field, uri, which essentially is equivalent to "document.reference". I would like to a…

kodo651 updated 6 years ago
8
Norconex/crawlers #458

Continuous crawling through a queue

I have a question regarding continuous crawling (or scheduling for that matter). I've read your post regarding the similar topics here: https://github.com/Norconex/collector-http/issues/93. But it doe…

wolverline updated 6 years ago
6
Norconex/importer #74

TitleGeneratorTagger error when field text is empty or field…

When using the TitleGeneratorTagger it gives a NPE, probably because the field is empty or doesn't exist. Strings shouldn't be initialized as null, but as an empty string or there should be null check…

jsteggink updated 6 years ago
2
Norconex/crawlers #467

com.norconex.importer.parser.DocumentParserException: Unable…

I'm using the Norconex HTTP collector (v2.8.0) and am having some issues with extracting contents from PDFs. Here's a gist of the error: https://gist.github.com/mbockenstedt/4f521a44f21221671c64e62…

bockensm updated 6 years ago
2
Norconex/crawlers #470

Crawling from some URLs is not possible

Crawling some urls with the following configuration (see below) works the crawler just fine. But with a few common urls it gives unexpectedly the error message (The real url name is intentionally chan…

evaso updated 6 years ago
2

