-
I'm trying to see if I can integrate `savepagenow` into my election night scraping system. The idea would be to save online results files into the Wayback Machine when my system detects the results ha…
-
### Summary
With `pack` >= `0.35.0`, `pack build` fails if the Docker daemon configuration in `/etc/docker/daemon.json` sets `default-address-pools`.
---
### Reproduc…
-
### Describe the bug
PS C:\Projects\2-Aurora-Base\src\Tests\ScenarioTests\Powershell\Scripts> **$AllTags='abc.123=123 cdx.432=345 abc.345=123 cdx.4321=345'**
PS C:\Projects\2-Aurora-Base\src\Tests\S…
-
When playing back web archives using ReplayWeb.page, we can get very high quality playback, and I think this is down to:
- how you index POST (and other non-GET???) requests.
- you fuzzy matching …
-
Pretty much request 5 in #883.
It might be more effective to also try Google Cache if the Wayback Machine fails.
-
The Wayback API had a `matchType` option, for example:
https://web.archive.org/cdx/search/cdx?url=https://twitter.com/jack/statuses&matchType=prefix
Which returns:
```
com,twitter)/jack/statuses/"/antarni…
-
By Laura Waldoch, CUL:
We’ve noticed a data error under the Theme “Cambridge Network”, the following record:
Accelrys
Archived date: 2012-11-21 http://www.accelrys.com/
Clickin…
-
Hi,
I am currently working on a machine learning project.
I decided to use the newspaper3k library to get articles by date.
I use cnn.com, nytimes.com, and fox.com to get articles.
However, they usual…
-
**Background**
The current implementation lacks text formatting features that other chemical formats (CDX/CDXML) implement.
Mostly it's related to paragraph-formatting features: indents, and lef…
-
Hi,
Is there a way to do it, e.g. download only the first or the last snapshot of the day?
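
For illustration, one way to narrow a list of snapshots to the first and last capture of each day is to group them by date after fetching the list. This is only a sketch, not the tool's own API: it assumes the snapshots are available as 14-digit Wayback-style timestamps (`YYYYMMDDhhmmss`), and the function name `first_and_last_per_day` and the sample data are hypothetical.

```python
def first_and_last_per_day(timestamps):
    """Map each day (YYYYMMDD) to a (first, last) pair of snapshot timestamps.

    Assumes fixed-width 14-digit timestamps, so sorting the strings
    also sorts them chronologically.
    """
    per_day = {}
    for ts in sorted(timestamps):
        day = ts[:8]
        # Keep the earliest timestamp seen for the day; overwrite the latest.
        first, _ = per_day.get(day, (ts, ts))
        per_day[day] = (first, ts)
    return per_day


# Hypothetical sample data, deliberately out of order.
snapshots = [
    "20200102120000",
    "20200101235959",
    "20200101000500",
    "20200102080000",
    "20200101120000",
]
selected = first_and_last_per_day(snapshots)
for day, (first, last) in selected.items():
    print(day, first, last)
```

Only the selected timestamps would then need to be downloaded, which keeps the number of requests to at most two per day.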