-
If the option `parseScriptTags` is set to `false`, simplecrawler crawls only the first page and then stops. I noticed this behavior with the content management system [Contao](https://contao.org/de/). It c…
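For reference, a minimal sketch of the setup that triggers this (requires the simplecrawler npm package; `http://example.com` stands in for the affected Contao site):

```javascript
// Minimal reproduction sketch; needs the simplecrawler package installed.
var Crawler = require("simplecrawler");

var crawler = new Crawler("http://example.com"); // placeholder for the Contao site
crawler.parseScriptTags = false; // with this set, only the first page is fetched

crawler.on("fetchcomplete", function (queueItem) {
  console.log("Fetched:", queueItem.url);
});

crawler.start();
```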
-
The current implementation assumes that a URL is legal only if the sitemap URL is a substring of the URL. This doesn't hold for some websites, such as nytimes.com, where the sitemaps are actually on …
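To illustrate the problem, here is a sketch of the substring rule against hypothetical nytimes-style URLs, contrasted with a domain comparison that would accept them:

```javascript
// Substring rule described above: a crawled URL is "legal" only if the
// sitemap URL is a substring of it. URLs below are illustrative, not real.
const sitemapUrl = "https://archive.nytimes.com/sitemaps/"; // sitemap on a subdomain
const pageUrl = "https://www.nytimes.com/section/world";    // page listed in that sitemap

const substringLegal = pageUrl.includes(sitemapUrl);
console.log(substringLegal); // false: the page is wrongly rejected

// Comparing the registrable domain instead accepts it. (Naive two-label
// split for the sketch; a real fix would consult a public-suffix list.)
const parentDomain = (u) => new URL(u).hostname.split(".").slice(-2).join(".");
const domainLegal = parentDomain(pageUrl) === parentDomain(sitemapUrl);
console.log(domainLegal); // true
```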
-
After running a crawler with `3` and just one URL, I analysed the log and noticed that several URLs are processed multiple times via the events: `DOCUMENT_FETCHED, CREATED_ROBOTS_META, URLS_EXTRAC…
-
I'm not sure whether this can be done with the crawler. The requirement itself is simple: I need to process several URLs (say 100 or more), each with its own filter.
If I've understood your crawler's behaviour, the fol…
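One way to sketch the requirement (hypothetical names, not the crawler's real API): pair each seed URL with its own filter function and apply that filter to whatever the crawl of that URL discovers.

```javascript
// Hypothetical sketch: each seed URL carries its own filter.
const jobs = [
  { url: "http://example.com/news", filter: (link) => link.includes("/news/") },
  { url: "http://example.org/blog", filter: (link) => link.endsWith(".html") },
];

// Stand-in for a real crawl; returns the links "discovered" on the page.
function crawl(url) {
  return [
    "http://example.com/news/world",
    "http://example.com/contact",
    "http://example.org/blog/post.html",
  ];
}

const results = jobs.map(({ url, filter }) => ({
  url,
  kept: crawl(url).filter(filter),
}));
console.log(results);
```

Each seed here keeps only the links its own filter admits, which is the per-URL behaviour the request describes.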
-
`ucss -h http://www.host.com -c http://www.host.com/path/to.css` crawls the page and outputs the URLs as expected. However, there are links to subdomains and to YouTube, so this option is not suitable.
`u…
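As a workaround sketch (not a ucss option), the emitted URLs could be post-filtered to a single exact host; `www.host.com` here is the placeholder host from the commands above:

```javascript
// Post-filter crawler output to one exact host, dropping subdomains and
// third-party links such as YouTube. Node's global URL class does the parsing.
const found = [
  "http://www.host.com/page",
  "http://sub.host.com/other",
  "https://www.youtube.com/watch?v=abc",
];

const sameHost = found.filter((u) => new URL(u).host === "www.host.com");
console.log(sameHost); // keeps only http://www.host.com/page
```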
-
-
It seems that every time Varnish is restarted, the sessions are lost.
Customer shopping carts are lost as a result of this.
Is this normal behaviour for Varnish?
Petce updated
9 years ago
-
If we work with several people we need an additional staging site to test commits, and to allow other to see whether the proposed change solves the issue before pushing it to the production site.
ifrik updated
9 years ago
-
Function `_isInternalDecisionMaker` falsely detects that a link is external
```
protected Func<Uri, Uri, bool> _isInternalDecisionMaker = (uriInQuestion, rootUri) => uriInQuestion.Authority == rootUri.Authority;
```…
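The failure mode is easy to reproduce outside the library: `Uri.Authority` is host plus port, so any host difference (for example `www.` vs the bare domain, or an explicit port) makes the link look external. A JavaScript sketch of the same comparison, using `URL.host` (also host + port):

```javascript
// Mirror of the Authority comparison above, using URL.host.
const isInternal = (uriInQuestion, rootUri) =>
  new URL(uriInQuestion).host === new URL(rootUri).host;

console.log(isInternal("http://example.com/a", "http://www.example.com/"));      // false: flagged external
console.log(isInternal("http://www.example.com:8080/a", "http://www.example.com/")); // false: port differs
console.log(isInternal("http://www.example.com/a", "http://www.example.com/"));  // true
```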
-
Crawler: Add support for storing crawled docs into MongoDB.
The collection data format should look something like this, based on what we crawl and what the UI needs:
- title
- text
- articleDate…
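A minimal sketch of one such document, using only the fields named above (the list is truncated, so this is not exhaustive, and the field values are placeholders):

```javascript
// Sketch of the crawled-document shape; fields taken from the list above.
const doc = {
  title: "Example article title",              // placeholder
  text: "Full extracted article text…",        // placeholder
  articleDate: new Date("2016-01-01T00:00:00Z"), // placeholder date
};

// With the official Node MongoDB driver this would be stored roughly via:
//   db.collection("articles").insertOne(doc);
console.log(Object.keys(doc));
```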