-
https://github.com/Norconex/importer/blob/49e9da592e6e0c373b138a6da1c544bfddc7e657/norconex-importer/src/main/java/com/norconex/importer/handler/splitter/impl/DOMSplitter.java#L172
IMO the URL shou…
sveba updated
7 years ago
-
Hi there,
In the example you have given here, the < and > signs are escaped:
```
<!--…
-
I am trying to crawl web pages and store them in AWS CloudSearch, but I am having a problem storing tag values in CloudSearch.
Below are the details of the problem:
I am able to see both title and h3 in the debug log.…
-
Hi,
I was trying to replace " " (a bad HTML tag extracted from the page I was crawling) with " " using ReplaceTagger, and I struggled with the code not inserting the space.
I just found that the fol…
-
Hi!
I get the following exception when I use the DOMSplitter:
_java.lang.NoSuchMethodError: org.jsoup.nodes.Element.cssSelector()Ljava/lang/String;
at com.norconex.importer.handler.splitter.imp…
-
Hello all,
While crawling a huge website, I sometimes run into trouble with the id of my document being too large (with CloudSearch, for example).
I wanted to know if it's pos…
-
Copied from https://github.com/Norconex/collector-http/issues/412#issuecomment-340241616, by @krishnateja-ravipati :
> I have a question regarding extracting content from a document.
>
> I would …
-
On the page:
https://web-ast.dsi.cnrs.fr/l3c/owa/personnel.infos_admin?p_numero_sel=1361736
If I use a crawler with:
```
```
I get the corr…
-
I'm very new to Norconex and am trying to configure it to crawl a site and add it to an existing Solr index. I've got a lot of issues, but I'll start with this one. When I run the crawler, it is inclu…
dkh7m updated
7 years ago
-
Can you please recommend how to use an external application to tag documents? I need to be able to tag documents using their content and metadata (document.reference specifically) for thin…