norconex-importer Search Results

414 results
for norconex-importer


Norconex/crawlers #227

how to crawl wordpress pages?

Hi, is there any way to crawl WordPress pages? I used the minimal XML file without results. Thanks.

m-gorn updated 8 years ago
3
Norconex/crawlers #212

Tries to follow links with "tel:" schema

#### Given A page linking to a [`tel:` URI](https://tools.ietf.org/html/rfc3966): ``` html Norconex test Phone Number ``` And the following config: ``` xml …

niels updated 8 years ago
2
Norconex/crawlers #209

another charset encoding issue

I just took the same [test page](https://herimedia.com/norconex-test.html) from issue #202, with this config: ``` xml Date,Content-Type …

jetnet updated 8 years ago
8
Norconex/crawlers #215

Re-committed exists urls

I build my collectors, crawlers, and committer programmatically, NOT by using XML configurations. But now existing URLs are committed again when I run my collector a second time. Is there a flag or see…

bruce-genhot updated 8 years ago
6
Norconex/crawlers #217

Is there a tagger for removing html code from a field

I am using DOMTagger to extract something, like below. ``` ``` The result is a piece of HTML code. Can I use a tagger to remo…

bruce-genhot updated 8 years ago
4
Norconex/crawlers #194

Title and content are in messy code

When I fetch http://www.spprec.com/sczw/infodetail/?infoid=5f2c3843-86ce-4f22-a99d-c88e1c838aba&categoryNum=005002005, the returned title and content are garbled.

bruce-genhot updated 8 years ago
18
Norconex/crawlers #202

Encoding correctly detected but not taken into account when …

When I crawl a non-Unicode document (or more precisely: a document in a charset other than my platform default), the crawler correctly detects the document's encoding (by inspecting the "Content-Type"…

niels updated 8 years ago
12
Norconex/crawlers #213

StandardRobotsTxtProvider turns Allow statements into exclus…

I believe that I am seeing improper behavior of the robots.txt parser / filter. #### Given A robots.txt file that disallows access to some parent path but allows access to exceptions within that path…

niels updated 8 years ago
3
Norconex/crawlers #220

Can I specify content encoding for DOMTagger ?

Currently, DOMTagger handles all documents as UTF-8; it would be better if the user could specify the content encoding. By the way, a flag controlling the removal of HTML is also necessary.

bruce-genhot updated 8 years ago
2
Norconex/crawlers #208

sitemapResolverFactory is not instantiating class specified …

Although the config reference suggests that a custom sitemapResolverFactory class can be specified, it looks like the class attribute is ignored and StandardSitemapResolverFactory is always used.

radomirml updated 8 years ago
6
