-
I'm trying to resolve an error I get during an initial test crawl, and I'm also seeing some strange behavior. First, here are the relevant parts of my config file:
https://www.myredact…
-
I wanted to crawl pages on linked subdomains.
But 'stayOnDomain' doesn't treat them as the same domain.
Case
- Root URL : http://www.some.com
- Child URL : http://hellosub.some.com/page.html
…
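One way to express the desired behavior is a host comparison that counts any subdomain of the root's base domain as "on domain". The sketch below is not the crawler's own `stayOnDomain` code. The helper name `same_site` and the rule of stripping a leading `www.` from the root host are assumptions, and it does not consult the Public Suffix List, so it only suits simple hosts like the ones in this example.

```python
from urllib.parse import urlsplit


def same_site(root_url: str, candidate_url: str) -> bool:
    """Hypothetical helper: True if candidate's host is the root's base domain or a subdomain of it."""
    # urllib already lowercases .hostname, so no extra case folding is needed.
    root = urlsplit(root_url).hostname or ""
    host = urlsplit(candidate_url).hostname or ""
    # Assumption: treat "www.some.com" as the base domain "some.com".
    if root.startswith("www."):
        root = root[4:]
    # Require a dot boundary so "notsome.com" does not match "some.com".
    return host == root or host.endswith("." + root)


print(same_site("http://www.some.com", "http://hellosub.some.com/page.html"))
```

The dot-boundary check is the important design choice: a plain `endswith("some.com")` would wrongly accept unrelated hosts such as `notsome.com`.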
-
Properties such as threadId are only relevant to dynamic analysis.
Should they be first-class properties in this format, or left for users to put in catch-all property bags?
Also, I think one coul…
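The two options can be contrasted with a pair of record shapes. This is a hypothetical sketch: the class names, the `message` field and the `threadId` key are illustrative assumptions, not part of any existing schema. A first-class field is typed and discoverable but is always present (usually empty) for static tools; a property bag keeps the core schema small at the cost of type checking.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class FirstClassRecord:
    """Option 1 (hypothetical): thread id is a typed, optional field."""
    message: str
    thread_id: Optional[int] = None  # left as None by static analyzers


@dataclass
class BagRecord:
    """Option 2 (hypothetical): tool-specific data goes in a catch-all bag."""
    message: str
    properties: Dict[str, Any] = field(default_factory=dict)


static_result = FirstClassRecord("possible null dereference")
dynamic_result = BagRecord("data race detected", {"threadId": 7})
```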
-
Hi!
Thanks for a very useful module. Unfortunately, I'm getting an exception when parsing a URL whose robots.txt download redirects to an invalid URL. The source url is `http://99ra…
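A defensive pattern is to validate a robots.txt redirect target before following it, so that a malformed `Location` value yields "no usable robots.txt" instead of an uncaught exception. This is a minimal sketch using only Python's `urllib.parse`; the function name `resolve_robots_redirect` is hypothetical, and returning `None` (leaving the caller to decide, e.g. to treat robots.txt as unavailable) is an assumed policy, not the module's actual behavior.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def resolve_robots_redirect(robots_url: str, location: str) -> Optional[str]:
    """Hypothetical helper: absolute http(s) redirect target, or None if unusable."""
    try:
        # Relative Location values are resolved against the robots.txt URL;
        # malformed input such as "http://[::1" raises ValueError here.
        target = urljoin(robots_url, location)
        parts = urlsplit(target)
        # Reading .port validates the port and raises ValueError if it is bad.
        _ = parts.port
    except ValueError:
        return None
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    return target
```

The caller can then skip the fetch when the result is `None`, rather than letting the parser throw.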
-
I am trying to set up a Norconex connector for a site,
and my issue is that the URLs under the div portion are not getting crawled.
Here is my configuration:
```xml
#set($http…
```
-
Hi,
I'm a big fan of both Ansible and all kinds of VPN/proxy software, so I was thrilled to find such an awesome, well-documented project as Streisand.
I am thinking about contributing a …
-
While running the crawler in https://github.com/sibiryakov/frontera-google, I get the following error:
```
2016-10-04 23:38:19 [scrapy] ERROR: Spider error processing (referer: None)
Traceback (most rece…
```
-
The current mock implementation of the regtest Insight API requires authentication, while our production API does not.
The behavior should be the same.
-
The Research Methods and Analytics subdomain is missing from the engineering section.
-
Hey there,
I have multiple issues on Windows.
First, a non-Windows-specific issue: `git` is required, as some packages are linked to GitHub repos.
Then, `a11ym` behaves strangely on Windows. When I t…