norconex-importer Search Results

414 results
for norconex-importer

Norconex/committer-solr #5

Moved to Collector-HTTP: Unable to find valid certification …

This error happens with the seed URL for the site, so no document in the site is processed. What can I do? ``` MC(crawler): 2015-05-05 18:57:27 ERROR - Cannot fetch sitemap: http://valitsus.ee/sitema…

csaezl updated 9 years ago · 1 comment
Norconex/crawlers #71

Java Null Pointer Exception when a link points to a page tha…

Here is an example of the error message the collector generates when it encounters a link that points to a page that no longer exists. ERROR [AbstractCrawler] Norconex Minimum Test Page: Could not pr…

martinfou updated 9 years ago · 2 comments
Norconex/crawlers #69

Restarting HTTP Collector

While I was loading documents into Solr with HTTP Collector, the computer restarted due to an issue. Just to be sure, what is the official advice for continuing the process where HTTP Collector was interrupted? Aft…

csaezl updated 9 years ago · 54 comments
Norconex/importer #1

An Error Parsing MP4 Files

It seems like a library is missing for MP4 parsing: Exception in thread "pool-1-thread-1" INFO [FilesystemCrawler] Projects: Re-processing orphan Files (if any)... java.lang.NoClassDefFoundError: org…

kalhomoud updated 9 years ago · 5 comments
Norconex/crawlers #135

One URL COMMITED several times in a crawler run

After running a crawler with `3` and just one URL, I analysed the log and noticed that several URLs are processed more than once via the events: `DOCUMENT_FETCHED, CREATED_ROBOTS_META, URLS_EXTRAC…

csaezl updated 9 years ago · 13 comments
Norconex/crawlers #89

Restricting ReplaceTransformer

The regex in `.*test.*` is never passed to the importerhandler; only the field value is.

OkkeKlein updated 9 years ago · 1 comment
Norconex/crawlers #55

Text from PDF, DOC, etc files

Since it is not unusual that such types of files don't have title, author, subject, etc., I'm wondering if there is a way of capturing about (say) 100 characters or so from the beginning of the docume…

csaezl updated 9 years ago · 15 comments
Norconex/crawlers #74

Identifying document language by content (LangDetectLanguage…

Almost all documents crawled by HTTP Collector have information about their language, but some PDF, DOC, etc. files may lack this metadata because their authors don't record it. In this ca…

csaezl updated 9 years ago · 4 comments
Norconex/crawlers #56

collector.referrer-link-text field not filled

Hi, I'm trying to gather information about links: the text near the anchor. I'm using norconex-collector-http-2.0.2.zip with openjdk-7. I have this definition: ``` text/htm…

MirtoBusico updated 9 years ago · 11 comments
Norconex/crawlers #66

HttpImporterPipeline fails to run stage HttpMetadataChecksum…

I have a strange behaviour where pages are added for indexing if they are new and deleted if they have been crawled before. The expected behaviour should be to skip indexing if the page is unmodified or index if …

leonardsaers updated 9 years ago · 9 comments
