-
I just tried to upgrade from 2.7.1 to 2.8.0 in my test environment. I didn't touch my configuration (which works in 2.7.1) at all.
It looks like I don't get any successful imports. I get two kinds …
-
We need to filter documents without the header Content-Length **and** with the header Transfer-Encoding set to chunked. This is the importer configuration I came up with:
```
.*ch…
```
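Since the snippet above is cut off, here is a sketch of one way such a rule could be expressed in the Importer, assuming the 2.x `RegexMetadataFilter` class; the handler placement and the `onMatch="exclude"` semantics are my assumptions, not taken from the truncated config:

```
<importer>
  <preParseHandlers>
    <!-- Assumption: reject documents whose Transfer-Encoding header is "chunked" -->
    <filter class="com.norconex.importer.handler.filter.impl.RegexMetadataFilter"
        onMatch="exclude" field="Transfer-Encoding">chunked</filter>
  </preParseHandlers>
</importer>
```

The "missing Content-Length" half of the condition would need a second filter (possibly something like an empty-metadata filter) combined with this one; how the two conditions are AND-ed together in a given Importer version is worth verifying against its documentation.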
-
I think this is a Tika issue; I looked into it and it seems it was resolved before. I wonder if you have ever come across this error. The message I get is:
`WARN [Importer] Could not import https://xx…
-
I am receiving many errors on what look like files that contain an embedded file; in my case, .msg files (Exchange messages) containing attachments. Files (.msg) without attachments appear to be impo…
-
I want to extract the text present inside all the `` tags in the page I am crawling.
I have created a field named "pagecontent" with Collection(Edm.String) type and used the setting below to fetch the te…
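The setting itself is cut off above, but a common approach for this kind of extraction in Importer 2.x is a `TextBetweenTagger` run before parsing (so the raw HTML tags are still present). The `<p>` tag below is a hypothetical stand-in for the elided tag name:

```
<importer>
  <preParseHandlers>
    <!-- Hypothetical example: collect text between <p> and </p> into "pagecontent" -->
    <tagger class="com.norconex.importer.handler.tagger.impl.TextBetweenTagger">
      <textBetween name="pagecontent">
        <start>&lt;p&gt;</start>
        <end>&lt;/p&gt;</end>
      </textBetween>
    </tagger>
  </preParseHandlers>
</importer>
```

Multiple matches should each add a value to the field, which lines up with the Collection(Edm.String) type on the Azure Search side.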
-
[MyMetadataFetcher.zip](https://github.com/Norconex/collector-filesystem/files/1655329/MyMetadataFetcher.zip)
Hi,
Is there a way to fetch meta data from external properties file. We have integrate…
-
I am trying to crawl a sitemap XML file that includes bulk URLs and commit the documents to the Azure service. There will be more than 400 documents getting stored in the committer-queue directory.
Norcon…
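For context on the queue behavior: in the 2.x committers, documents accumulate in the committer-queue directory until a queue threshold triggers a commit, and are then sent in batches. A sketch of the relevant knobs, assuming the Azure Search committer and the common 2.x option names (both names are my assumption here, not taken from the truncated report):

```
<committer class="com.norconex.committer.azuresearch.AzureSearchCommitter">
  <!-- Assumed 2.x options: queueSize = how many docs queue on disk
       before a commit is triggered; commitBatchSize = how many docs
       are sent to the service per request -->
  <queueSize>400</queueSize>
  <commitBatchSize>100</commitBatchSize>
</committer>
```

Seeing 400+ files in the queue directory is expected if `queueSize` is at or above that number; they should drain once a commit is triggered.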
-
Hi,
I keep getting the following error when committing documents to AWS CloudSearch:
```
CloudSearch: 2017-11-20 15:15:40 INFO - Sending 10 documents to AWS CloudSearch for addition/deletion.
…
```
-
I am trying to rename a field, but it doesn't seem to work for me: the original "title" field is still in the data.
I tried both
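Since the two attempts are cut off above, here is a sketch of how a rename is typically done in Importer 2.x with `RenameTagger`; the target field name `doc_title` is hypothetical. One common gotcha is placement: renaming in `preParseHandlers` can appear not to work because the parser re-extracts a `title` field afterwards, so `postParseHandlers` is usually the safer spot:

```
<importer>
  <postParseHandlers>
    <!-- Rename "title" after parsing, so the parser cannot re-add it;
         "doc_title" is a hypothetical target field name -->
    <tagger class="com.norconex.importer.handler.tagger.impl.RenameTagger">
      <rename fromField="title" toField="doc_title" overwrite="true"/>
    </tagger>
  </postParseHandlers>
</importer>
```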
-
Hi Pascal,
Can you help me exclude the header and footer data of a page from being crawled?
Please find below the
[htmlfile _l2tm.txt](https://github.com/Norconex/collector-http/files/1208703/htmlf…
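Without seeing the attached page, a common way to drop header/footer content in Importer 2.x is a `StripBetweenTransformer` applied before parsing, keyed on markers that delimit those regions. The `<header>`/`<footer>` markers below are hypothetical; they would need to match whatever actually wraps those sections in the attached HTML:

```
<importer>
  <preParseHandlers>
    <!-- Hypothetical markers: remove everything between (and including)
         the page's header and footer delimiters before parsing -->
    <transformer class="com.norconex.importer.handler.transformer.impl.StripBetweenTransformer"
        inclusive="true">
      <stripBetween>
        <start>&lt;header&gt;</start>
        <end>&lt;/header&gt;</end>
      </stripBetween>
      <stripBetween>
        <start>&lt;footer&gt;</start>
        <end>&lt;/footer&gt;</end>
      </stripBetween>
    </transformer>
  </preParseHandlers>
</importer>
```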