-
Hi all,
We've had to unpublish a lot of our feeds as we've been experiencing an issue which I believe is related to this dashboard. Firstly, it doesn't appear to publish a reasonable UA so this is …
-
```
Add a crawler to parse the HTML source and create an HTML skeleton, then build UI
modules based on the skeleton.
```
Original issue reported on code.google.com by `John.Jian.Fang@gmail.com` on 16 Feb 201…
-
```
At the moment we do not use any form of automated testing in our code.
The basic structure for testing the web application is already there in
TurboGears, we should expand that.
But I think we …
-
The [CORS fix](https://github.com/jsanahuja/InstagramFeed/issues/55) creates a new problem for me: `Cross-Origin Read Blocking (CORB) blocked cross-origin response https://www.instagram.com/p/CL7rMUKD…
-
**Bug Type:** Reliability
**Test Method:** Automation
**Description/Summary**
Login is sometimes blocked by a warning message during the execution of automated tests in production.
This behavior…
-
A common issue is that it is not clear if a problem with a site is due to gaps in the crawl, or replay-time rewriting limitations. It should be possible to use proxy playback mode to evaluate the craw…
-
I need to make an archive that requires a login (I wish I could use Pywb but the OneLogin service has issues with it) and need to save a whole bunch of links, so pressing the start button over and ove…
-
ToDo:
- Automate Crawl List:
- Implement Data Structure to Map Twitter handles to Domains
- Implement method to look for already crawled authors and update only new tweets / crawl new autho…
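
The two data-structure items above could be sketched roughly as follows. This is only an illustration: the `CrawlList` class, its field names, and the tweet shape (a dict with an integer `id`) are all hypothetical, and it assumes tweet IDs increase over time so that "newer" can be decided by comparing IDs.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlList:
    """Hypothetical container: maps Twitter handles to domains and tracks crawl progress."""

    handle_to_domain: dict = field(default_factory=dict)
    # handle -> highest tweet ID already stored (assumes IDs grow over time)
    last_tweet_id: dict = field(default_factory=dict)

    def add_author(self, handle, domain):
        # Handles are case-insensitive, so normalise before storing.
        self.handle_to_domain[handle.lower()] = domain

    def is_crawled(self, handle):
        # An author counts as crawled once at least one tweet has been recorded.
        return handle.lower() in self.last_tweet_id

    def new_tweets(self, handle, tweets):
        """Return only tweets newer than the last recorded one, and record progress."""
        key = handle.lower()
        seen = self.last_tweet_id.get(key, 0)
        fresh = [tweet for tweet in tweets if tweet["id"] > seen]
        if fresh:
            self.last_tweet_id[key] = max(tweet["id"] for tweet in fresh)
        return fresh
```

Calling `new_tweets` a second time with the same batch returns an empty list, which is what lets a repeated crawl skip authors that have nothing new.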
-
## Is your feature request related to a problem?
From a Slack conversation with a user: it would be nice to have a way for users to place identified bots
https://posthogusers.slack.com/archives/C01GLBKH…
-
For example, Amazon always returns 405 for HEAD requests.
We should retry with a GET request after any suspicious error code (especially 403 and 405) to get more reliable results.
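
A minimal sketch of that fallback, using only the Python standard library. The names `check_link`, `fetch_status`, and `SUSPICIOUS_STATUSES` are hypothetical and not taken from the project; only 403 and 405 are named in this issue, so the set may need extending.

```python
import urllib.error
import urllib.request

# Status codes after which a HEAD result is not trusted.
# Only 403 and 405 are named in the issue; extend as needed.
SUSPICIOUS_STATUSES = frozenset({403, 405})


def fetch_status(method, url, timeout=10.0):
    """Send one request with urllib and return its HTTP status code."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is still the answer we want.
        return error.code


def check_link(url, fetch=fetch_status):
    """Try HEAD first; if the status is suspicious, retry once with GET."""
    status = fetch("HEAD", url)
    if status in SUSPICIOUS_STATUSES:
        status = fetch("GET", url)
    return status
```

The `fetch` parameter exists only so the retry logic can be exercised without network access. Connection errors other than HTTP error responses are not handled here and would propagate to the caller.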