-
## Summary
[`HstsResolver`](https://github.com/Norconex/collector-http/blob/89eee785c14d88889042477ac7839c48d09d4155/src/main/java/com/norconex/collector/http/fetch/util/HstsResolver.java#L66) does…
-
Hi,
1) We have a scheduled task that re-runs the Norconex job every week. Each week, several URLs are removed from the sitemap.xml file and several are added. But it is observed tha…
-
I have a field such as the one below:
```
/ip/Bolthouse-Farms-Organics-Premium-Matchstix-Julienne-Carrots-10-oz/44933639
```
With the configuration below, I expect "44933639" to be written back to…
-
I have to crawl an intranet site that provides the last modified timestamps of articles in a meta tag like this: ``
This is easily handled by `DateFormatTagger`. However, there is a problem with ti…
-
Using Basic Authentication on a standalone (non-cloud) Solr 8.8.2 installation on Windows.
The crawl is successful and the error is thrown in the committer. SSL is turned off (for the moment).
…
-
The contents of the sharedStrings.xml file in the target xlsx file for crawling are as follows.
~~~xml
月日ガッピ会社名カイシャメイ金額キンガク支払日シハライビ締日シメビS社シャA社シャB社シャ
~~~
What I ultimately want to obtain is the…
-
I am updating my code to work under OpenJDK 11, as Oracle will soon stop supporting Java 8 and my institution, as a government body may be expected to do, is moving on.
After some adjustments, my tests m…
-
My crawler does a language detection on crawled documents and then assigns data such as "content", "title" and "description" to different fields based on the language detected. I use `ScriptTagger` fo…
-
Hi Pascal,
I am working on a website which includes different domains, such as...
```
// Below are the domains in the start url section
www.rthk.hk
app3.rthk.hk
app4.rthk.hk
programme.rthk.hk
…
```
-
Hi,
We have a site we want to crawl, on which there are a large number of subdirectories with different names that we want to exclude.
With com.norconex.collector.core.filter.impl.RegexReferenceFi…