-
Hey,
I have an issue with read timeouts. For some channels it works perfectly, but for others it doesn't. It only happens from time to time. There is no request in the log file for that time, so the crawler did …
-
_Request created from @comschmid's comment in issue #163._
Allow changing the character case of field names, as `CharacterCaseTagger` already does for field values.
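To make the requested behavior concrete, here is a minimal sketch in plain Java. It does not use the Norconex API: it assumes the document fields are held in a `Map<String, List<String>>`, and `FieldNameCaseSketch` and `changeFieldNameCase` are hypothetical names. One design question the request raises is what happens when two names collapse to the same name after conversion (for example `Title` and `TITLE`); this sketch merges their values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FieldNameCaseSketch {
    // Hypothetical helper: returns a copy of the fields with every field NAME
    // converted to upper or lower case. Values are left untouched. When two
    // names collide after conversion, their values are merged in insertion order.
    static Map<String, List<String>> changeFieldNameCase(
            Map<String, List<String>> fields, boolean toUpper) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            String name = toUpper
                    ? e.getKey().toUpperCase(Locale.ROOT)
                    : e.getKey().toLowerCase(Locale.ROOT);
            result.computeIfAbsent(name, k -> new ArrayList<>()).addAll(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("Title", List.of("Home"));
        fields.put("TITLE", List.of("Welcome"));
        fields.put("Content-Type", List.of("text/html"));
        // Prints: {title=[Home, Welcome], content-type=[text/html]}
        System.out.println(changeFieldNameCase(fields, false));
    }
}
```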
-
When crawling `www.feccoo-extremadura.org`, I get only one document (the one for the domain), but when I navigate there with a browser, it automatically changes to `www.feccoo-extremadura.org/ensenanzaextremadura` an…
-
I encountered a page containing the link `"`. It is obviously a badly formed URL. However, when Norconex encounters this URL, it discards the whole page containing it and throws the following stack …
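For comparison, this is a minimal sketch of how link resolution could tolerate a single bad href instead of failing the whole page. It is not Norconex's link extractor: it assumes links are resolved with plain `java.net.URI`, and `TolerantLinkResolver` and `resolveLinks` are hypothetical names. A lone `"` is not a legal URI character, so `new URI("\"")` throws `URISyntaxException`; the sketch catches that per link and moves on.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class TolerantLinkResolver {
    // Hypothetical helper: resolves each raw href against the page URL.
    // An href that is not a valid URI (such as a lone double quote) is
    // logged and skipped, so the remaining links of the page are kept.
    static List<String> resolveLinks(String pageUrl, List<String> hrefs) {
        List<String> resolved = new ArrayList<>();
        URI base = URI.create(pageUrl);
        for (String href : hrefs) {
            try {
                resolved.add(base.resolve(new URI(href.trim())).toString());
            } catch (URISyntaxException e) {
                System.err.println("Skipping malformed link: " + href);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> links = resolveLinks("http://example.com/dir/page.html",
                List.of("other.html", "\"", "/top.html"));
        // Prints: [http://example.com/dir/other.html, http://example.com/top.html]
        System.out.println(links);
    }
}
```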
-
Crawling a Flickr site, for example `https://www.flickr.com/photos/gobiernoextremadura`, with:
```
https://www\.flickr\.com/photos/gobiernoextremadura/.*
```
I only get 3 documents:
```
…
```
-
This error happens with the seed URL for the site, so no document in the site is processed. What can I do?
```
MC(crawler): 2015-05-05 18:57:27 ERROR - Cannot fetch sitemap: http://valitsus.ee/sitema…
```
-
I tried to crawl a site and got the following error in the log:
```
site: 2015-06-10 21:38:12 DEBUG - ACCEPTED document reference. Reference=http://www.site.com/Projects/c2c/channel/images/'+L140413[1+Math.r…
```
-
I found random behavior in the Committer: running a crawl against the same URL gives different results. Here is a section of the output when the behavior is correct:
```
INFO - AbstractCrawler …
```
-
From @madsbrydegaard, moved from https://github.com/Norconex/collector-http/issues/48#issuecomment-101662531:
I tried implementing the filter option:
```
.\bkeyword\b.
```
However, pages witho…
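Whether `.\bkeyword\b.` behaves as expected depends on whether the filter tests the whole content against the pattern or only searches for it somewhere in the content. I am not certain which of the two the Norconex filter does, so treat that as an open assumption. The sketch below uses only `java.util.regex` (`KeywordRegexDemo`, `wholeMatch` and `partialMatch` are hypothetical names) to show the difference: under a whole-content match, `.\bkeyword\b.` only accepts text consisting of one character, the keyword, and one more character, while `.*\bkeyword\b.*` with `DOTALL` accepts any content containing the keyword.

```java
import java.util.regex.Pattern;

public class KeywordRegexDemo {
    // True only if the WHOLE text matches the pattern (Matcher.matches()).
    // DOTALL lets "." also match line breaks, which multi-line content needs.
    static boolean wholeMatch(String regex, String text) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(text).matches();
    }

    // True if the pattern occurs ANYWHERE in the text (Matcher.find()).
    static boolean partialMatch(String regex, String text) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Some page text with the keyword somewhere.\nMore text.";
        System.out.println(wholeMatch(".\\bkeyword\\b.", text));   // false
        System.out.println(partialMatch(".\\bkeyword\\b.", text)); // true
        System.out.println(wholeMatch(".*\\bkeyword\\b.*", text)); // true
    }
}
```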
-
I only see this in the console, not in the log file. It has occurred since I started using the snapshot.
```
Exception in thread "pool-1-thread-2" java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.(…
```