norconex-committer Search Results

318 results for norconex-committer

Norconex/crawlers #69

Restarting HTTP Collector

While loading documents into Solr with HTTP Collector, the computer restarted due to an issue. Just to be sure, what is the official advice for continuing the process where HTTP Collector was interrupted? Aft…

csaezl updated 9 years ago
54
Norconex/crawlers #135

One URL COMMITED several times in a crawler run

After running a crawler with `3` and just one URL, I have analysed the log and noticed that several URLs are processed several times via the events: `DOCUMENT_FETCHED, CREATED_ROBOTS_META, URLS_EXTRAC…

csaezl updated 9 years ago
13
Norconex/importer #15

Html elements import

I'm using the latest Norconex HTTP Collector. By default the importer removes HTML elements and just keeps the body text. How do I configure it to keep specific HTML elements? For example, I would lik…

yvesnyc updated 9 years ago
8
Norconex/crawlers #98

Documents removed after read time-out.

Is it possible to only remove documents with 404 status code? (and also log the broken link)

OkkeKlein updated 9 years ago
14
Norconex/crawlers #67

Potential concurrency issues AbstractMappedCommitter

Documents from other simultaneously running jobs are added as ICommitOperation to the job's committer when using a committer based on AbstractMappedCommitter. Using a simple committer like this will log ou…

leonardsaers updated 9 years ago
8
Norconex/crawlers #70

Wrongful dependency on Java 8.

From @leonardsaers: Java 8 was required to make the latest snapshot work. See ticket https://github.com/Norconex/collector-http/issues/66#issuecomment-85087299. Java 7 should be supported.

essiembre updated 9 years ago
5
Norconex/crawlers #66

HttpImporterPipeline fails to run stage HttpMetadataChecksum…

I'm seeing strange behaviour where a page is added for indexing if it's new and deleted if it has been crawled before. The expected behaviour is to skip indexing if the page is unmodified or index if …

leonardsaers updated 9 years ago
9
Norconex/crawlers #55

Text from PDF, DOC, etc files

Since it is not unusual for such files to lack a title, author, subject, etc., I'm wondering if there is a way of capturing (say) 100 characters or so from the beginning of the docume…

csaezl updated 9 years ago
15
Norconex/committer-solr #3

Passing arguments to Solr update calls

From @csaezl, originally posted on https://github.com/Norconex/collector-http/issues/74#issuecomment-90225426: > Talking again about /update parameters, is there a way of passing update.chain=langid to So…

essiembre updated 9 years ago
7
Norconex/committer-solr #2

Upgrade solrj library from solrj-4.7.0 to solrj-5.0.0 and t…

martinfou updated 9 years ago
5
