-
I got this error when the start path is a network drive, e.g. \\\\cap-index\c$\WISD\
using branch 2.9.0 snapshot.
FYI, this error does not occur in 2.8.0.
```
FilesystemCrawler: 2019-05-17 11:25:55 ER…
-
Hello Pascal,
I'd like to generate a thumbnail image for every incoming `document.contentFamily = image` using an `ExternalTransformer` script with ImageMagick tools.
But it seems the provided bi…
-
Hi Pascal,
Is it possible to have the Copy Tagger used only if the toField is null, not present, or an empty string? Case in point: I have a title that is null but I have the collector.referrer…
-
Using the DOM splitter, I extract ```` tags as new documents
```xml
```
and then I capture the attributes
```xml
text/x-php
text/plain
…
-
Hi,
I'm trying to crawl the following page:
http://pubs.acs.org/doi/abs/10.1021/acschemneuro.7b00162
This page first redirects to:
http://pubs.acs.org/doi/abs/10.1021/acschemneuro.7b00162?cookie…
-
We crawl many sites using the same configuration templates and configured the `GenericMetadataFetcher` globally for all sites. Some sites do not allow the `HEAD` request, and the crawler stops at the v…
-
Hi,
What is the expected behavior when you encounter a canonical link in a document which points to another domain, and you have stayOnDomain set to true?
I'm seeing that the canonical link is fol…
-
Hi,
How can I suppress the title and dc:title added by the Norconex API from the XML? I just want to include the tags that are derived from Tika.
Tika-derived:
`Competitive Landscape Overview for Fy…
-
Hi,
Currently I have a situation where I want only the "main" content of an HTML document to be parsed, like this:
```xml
text/html
``…