norconex-importer Search Results

413 results for norconex-importer

Norconex/importer #55

URL building for child elements

https://github.com/Norconex/importer/blob/49e9da592e6e0c373b138a6da1c544bfddc7e657/norconex-importer/src/main/java/com/norconex/importer/handler/splitter/impl/DOMSplitter.java#L172 IMO the URL shou…

sveba updated 7 years ago
3 comments

Norconex/crawlers #407

When using ParseHandlers what are the rules for escaping HTM…

Hi there, In the example you have given here, the < and > signs are escaped: ``` <!--…

inpsolr updated 6 years ago
10 comments

Norconex/importer #59

Unable to import tagValues in AWS Cloudsearch

I am trying to crawl web pages to store in AWS CloudSearch and am having a problem storing tag values in CloudSearch. Below are the details of the problem: I am able to see both title and h3 in the debug log.…

umeshkalia updated 7 years ago
2 comments

Norconex/importer #56

ReplaceTagger and spaces in toField

Hi, I was trying to replace "&nbsp" (bad HTML tag extracted from the page I was crawling) with a " " with ReplaceTagger, I fought with the code not inserting the space. I just found that the fol…

liar666 updated 7 years ago
1 comment

Norconex/importer #23

[DOMSplitter] JSoup issue with norconex-importer 2.5.2

Hi! I get the following exception when I use the DOMSplitter: java.lang.NoSuchMethodError: org.jsoup.nodes.Element.cssSelector()Ljava/lang/String; at com.norconex.importer.handler.splitter.imp…

sylvainroussy updated 7 years ago
6 comments

Norconex/crawlers #405

Question: When crawling a website, how to transform ID with …

Hello all, while crawling a huge website, I sometimes ran into trouble with the ID of my document being too large (in the case of CloudSearch, for example). I wanted to know if it's pos…

dgomesbr updated 6 years ago
3 comments

Norconex/crawlers #418

How to exclude common headers and footers available on all p…

Copied from https://github.com/Norconex/collector-http/issues/412#issuecomment-340241616, by @krishnateja-ravipati : > I have a question regarding extracting content from a document. > > I would …

essiembre updated 6 years ago
5 comments

Norconex/importer #57

DOMTagger and <head><script>

On the page: https://web-ast.dsi.cnrs.fr/l3c/owa/personnel.infos_admin?p_numero_sel=1361736 If I use a crawler with: ``` ``` I get the corr…

liar666 updated 7 years ago
2 comments

Norconex/crawlers #350

How do I filter out SVG and other image files?

I'm very new to Norconex and am trying to configure it to crawl a site and add it to an existing Solr index. I've got a lot of issues, but I'll start with this one. When I run the crawler, it is inclu…

dkh7m updated 7 years ago
7 comments

Norconex/importer #63

Question: External application tagger

Can you please recommend how to use an external application to tag documents? I need to be able to tag documents using their content and metadata (document.reference specifically) for thin…

jmrichardson updated 6 years ago
7 comments
