complementizer / wcep-mds-dataset

MIT License

direct link to dataset #4

shahbazsyed closed this issue 4 years ago

shahbazsyed commented 4 years ago

Crawling the web archive links fails with read timeouts, even after multiple retries:

Article `download()` failed with HTTPSConnectionPool(host='web.archive.org', port=443): Read timed out. (read timeout=7) on URL https://web.archive.org/web/20190910020532/https://www.cbc.ca/news/world/gas-explosion-suspected-after-blast-at-belgium-sports-centre-kills-1-1.3736509
Article `download()` failed with HTTPSConnectionPool(host='web.archive.org', port=443): Read timed out. (read timeout=7) on URL https://web.archive.org/web/20190910020459/https://www.cbc.ca/news/canada/toronto/stabbing-scarborough-1.3735718
Article `download()` failed with HTTPSConnectionPool(host='web.archive.org', port=443): Read timed out. (read timeout=7) on URL https://web.archive.org/web/20190910020833/https://www.dailymail.co.uk/wires/reuters/article-3764774/Huge-blast-Somali-capital-clouds-smoke-seen.html
Article `download()` failed with HTTPSConnectionPool(host='web.archive.org', port=443): Read timed out. (read timeout=7) on URL https://web.archive.org/web/20190910020949/https://www.foxnews.com/us/5-people-killed-in-midair-plane-crash-in-alaska-officials-say

It would be great to get a direct link to the complete dataset.

Thanks!
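For anyone who still wants to crawl the links, one possible workaround is to retry each request with a longer timeout and exponential backoff, since the logs above show a 7-second read timeout. The snippet below is only a minimal sketch and is not the repository's crawling script: `fetch_with_retries`, its parameters, and the default values are hypothetical names chosen for illustration, and it uses only the Python standard library.

```python
import socket
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, fetch=None, retries=4, timeout=30.0,
                       base_delay=2.0, sleep=time.sleep):
    """Fetch `url`, retrying on timeouts and connection errors.

    Hypothetical helper, not part of the dataset scripts. `fetch` and
    `sleep` are injectable so the retry logic can be tried offline.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")

    if fetch is None:
        # Default fetcher: plain urllib with a longer read timeout
        # than the 7 seconds shown in the logs.
        def fetch(u, t):
            with urllib.request.urlopen(u, timeout=t) as resp:
                return resp.read()

    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url, timeout)
        except (socket.timeout, TimeoutError,
                urllib.error.URLError, ConnectionError) as err:
            last_error = err
            # Wait 2s, 4s, 8s, ... between attempts, but not after the last one.
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    # Every attempt failed: re-raise the most recent error.
    raise last_error
```

Calling `fetch_with_retries(url)` on one of the failing archive URLs returns the raw page bytes if any attempt succeeds. Whether this helps depends on how heavily web.archive.org is throttling at the time, so a direct download link is still the more reliable option.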

xnliang98 commented 4 years ago

I ran into the same problem: crawling the web archive links fails with read timeouts, even after multiple retries. Could you provide a direct link to the complete dataset? Thanks!

MiaoYYu commented 4 years ago

Did you solve this issue? I'm stuck at the same point too.

TysonYu commented 4 years ago

Same issue here. Can you provide a link so that we can download the dataset directly?

chrishokamp commented 4 years ago

Hi, if you check https://github.com/chrishokamp/dynamic-transformer-ensembles you will find some links to download a version of the dataset.

complementizer commented 4 years ago

There is a link in our README now.