-
A [second milestone release](https://opensource.norconex.com/collectors/http/download) of the HTTP Collector has just been published. One of the most significant features comes from the [Importer](https://opens…
-
I have a collector set up to crawl an intranet site with many PDFs. Many of them produce errors when their embedded fonts are read.
```
WARN - PDTrueTypeFont - Could not read embe…
```
-
My client is using version 2.8.2-SNAPSHOT and found that some URLs were not updated in the search engine.
For example: https://store.acer.com/de-de/nitro-5-gaming-notebook-an517-51-schwarz-13
Chec…
-
Hi!
I looked into your Java examples and found them straightforward, but because there are so few of them, I'm having difficulty figuring out exactly what I need to do in my code.
I'm trying to d…
-
Hello,
We have a crawler running that pulls results into an Azure Search index. A number of items do not appear in the index even though the crawler log file seems to specify t…
-
Hello,
I am crawling a website where some sitemap entries include images, like so:
```
<url>
  <loc>https://example.com/about</loc>
  <lastmod>2021-01-28T16:11:08+01:00</lastmod>
  <changefreq>weekly</changefreq>
  <priority>0.7</priority>
  …
```
-
On a site whose sitemap path is specified in robots.txt, Norconex does not recognize that specification.
The SitemapResolverFactory is configured to respect only sitemap locations declared in robots.txt by setting t…
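For reference when reproducing this, a sitemap "specified in robots.txt" is a `Sitemap:` line holding an absolute URL, and most parsers treat the field name case-insensitively. The standalone Java sketch that follows is a hypothetical helper (`RobotsSitemapParser` and `extractSitemaps` are illustrative names, not part of the Norconex API, and this is not how `SitemapResolverFactory` works internally); it only shows which URLs a crawler would be expected to pick up from such a file.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, NOT part of the Norconex API: collects the URLs
 * declared with "Sitemap:" directives in a robots.txt body.
 */
public class RobotsSitemapParser {

    public static List<String> extractSitemaps(String robotsTxt) {
        List<String> sitemaps = new ArrayList<>();
        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            // Drop comments ("#" to end of line) and surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "field: value" line
            }
            // Compare the field name case-insensitively ("Sitemap:" or "sitemap:").
            String field = line.substring(0, colon).trim();
            if (field.equalsIgnoreCase("sitemap")) {
                // Everything after the first colon is the URL (it has its own colons).
                String url = line.substring(colon + 1).trim();
                if (!url.isEmpty()) {
                    sitemaps.add(url);
                }
            }
        }
        return sitemaps;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\n"
                + "Disallow: /private/\n"
                + "Sitemap: https://example.com/sitemap.xml\n";
        System.out.println(extractSitemaps(robots));
    }
}
```

Running `main` prints the single declared sitemap URL. If a site's robots.txt yields URLs here but the crawler still ignores them, that narrows the problem to the collector's configuration rather than the file itself.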
-
Hi
I noticed that when I start `HttpCollector` and wait until the end, everything works well, but when I try to stop the collectors, I get a lot of exceptions in the logs. The collectors are stopped, but there…
-
After I run the minimal test example, there is no `crawledFiles` directory. The output suggests it may be related to a `javax.net.ssl.SSLHandshakeException: PKIX path building failed:…`
-
When I check OSS Sonatype and Maven Central, I see only 5.2, and I cannot build without masking it in my POM. Masking is hard because I don't know what depends on 5.3, so I don't know where to add the exclusion. W…