norconex-importer Search Results

413 results
for norconex-importer


Norconex/crawlers #262

Using extract="outerHtml]" in a domTagger's fields leads to …

I just ran the following simple crawler: ``` ./tests-output/testattribute/progress ./tests-output/testattribute/logs http://avax.news/fact/The_Day_in_Photos_Ju…

liar666 updated 8 years ago
4
Norconex/crawlers #234

infinite loop with many threads

I make it get into an infinite loop, with these rules: ``` xml jpg,gif,png,ico,css,js,gz,bz,tgz http://.*nz/.* http://.*nz ``` This is the log file: ``` xml [niko@dev1 norconex-collector-…

nuliknol updated 8 years ago
15
Norconex/crawlers #252

Minimal Example does not work...

I've: - downloaded latest code from: http://www.norconex.com/collectors/collector-http/download - unzipped the file - gone to the root of the unzipped dir - run: . ./collector-http.sh -a start -c exam…

liar666 updated 8 years ago
2
Norconex/crawlers #244

Unparseable date

I would like to parse the Last-Modified date so it fits the format expected by the Solr TrieDateField class: ``` YYYY-MM-DDThh:mm:ssZ ``` (https://cwiki.apache.org/confluence/display/solr/Working+wi…

V3RITAS updated 8 years ago
7
Norconex/crawlers #253

Is it possible to extract several records from a single page…

Hi, I've used Heritrix for a while, so I understand how to crawl websites. But since I'm not satisfied with Heritrix, I'm currently looking at alternatives. Norconex's API docs are good and the XML c…

liar666 updated 8 years ago
4
Norconex/crawlers #269

Unexpected REJECTED_FILTER

Hello, I have the following configuration (file is attached): [config.txt](https://github.com/Norconex/collector-http/files/335445/config.txt) Now, I run the job, and as a result I see: INFO [CrawlerEventMana…

olgapshen updated 8 years ago
11
Norconex/importer #22

Is it possible to keep html tag in .cntnt ?

Hi Pascal, I'm doing a little project with the Norconex HTTP Collector that fetches news from a big news website when my city appears in the keywords field of the metadata. The fetching works well but the …

fensifan updated 8 years ago
5
Norconex/crawlers #275

Is there a way to get the parent url of fetched url ?

I wonder if there is a way to get the url which contains the fetched url.

doaa-khaled updated 8 years ago
6
Norconex/crawlers #223

HTML parser - commented out encoding meta-tag

Hi Pascal, you won't believe it, but I just found another encoding issue :smile: Source code: ``` html ``` The parser cannot recognize the content correctly (for HTML entities is used UTF-8 an…

jetnet updated 8 years ago
6
Norconex/importer #13

Import only pages with url matched regexp

I want the importer to accept only pages whose URL matches a regexp from the config. I believe the Java class `RegexMetadataFilter` does that. The question is: which metadata field matches the page's URL, and does it exist at …

AntonioAmore updated 8 years ago
9
