-
Hi,
I'm currently developing my own committer. To debug my code, I wanted to keep the FileSystemCommitter so that I can compare the output of both committers.
The configuration file lo…
-
Hi,
I'm trying to parse XML files (from https://pairbulkdata.uspto.gov/) in which patents are described.
In order to parse/import these files, I use a DOMSplitter to separate the data for each pate…
-
Hi Norconex team,
I'm not sure which package this issue relates to (collector or importer), so I'm posting it here:
Something strange happens with the `"date"` http-meta tag: its value gets multiplied…
-
I've downloaded the software onto an Ubuntu 14.04 system with this Java version:
java version "1.7.0_111"
OpenJDK Runtime Environment (IcedTea 2.6.7) (7u111-2.6.7-0ubuntu0.14.04.3)
OpenJDK 64-Bit Server VM (…
-
I am new to the Norconex HTTP Collector. I made some changes to the configuration files so that I can extract files from a website, but I can't get the grabbed files.
Does Norconex save files by their extracted URLs? an…
-
Hi,
I'm trying to run the example from the filesystem by replacing the FileSystemCommitter with the SolrCommitter.
So the config file looks like this:
`
${workdir}/logs
${workdir}/progress
…
-
I used to enter the start URL without the home page path, for example:
`
https://en.wikipedia.org/
`
instead of:
`
https://en.wikipedia.org/wiki/Main_Page
`
In the first case it doesn't crawl, but in the…
-
Hi all,
When I include only specific file types in my filters, the crawler doesn't work properly. It seems that because I didn't include HTML pages, the crawler is unable to reach these files. Is there a solution for that?
-
I lowered the number of threads to 1 and this seems to work. I'm not sure whether that was the fix or whether a corrupted file was the culprit.
[EDIT] Ran into the issue again; the tmp dir contained a lot of Tika files. Increasing fi…
-
We are using the Norconex HTTP Collector to crawl about 4,600 files and send the extracted information to a Solr server.
The crawling process itself works fine and during this process all batches are…