Shifts-Project / shifts

This repository contains data readers and examples for the three tracks of the Shifts Dataset and the Shifts Challenge.
https://shifts.ai
Apache License 2.0

Broken link: weather data #37

Closed IbtihalFerwana closed 1 year ago

IbtihalFerwana commented 2 years ago

I receive a 404 response.

luna57-lr commented 2 years ago

I got the same problem.

ebo commented 2 years ago

Are you referring to canonical-trn-dev-data.tar, which used to be downloaded from https://storage.yandexcloud.net/yandex-research/shifts/weather/canonical-trn-dev-data.tar ? It is still missing.

Does anyone have an old copy they can share so we can host it somewhere?

VatsalRaina commented 1 year ago

Hi! I have checked the download link and it appears to work fine for me.

Also, the link has both train and dev-in - so what is supposedly missing?

ebo commented 1 year ago

Unless something has changed since I last gave it a poke in November, canonical-trn-dev-data.tar is no longer on the https://storage.yandexcloud.net site. If you run the tutorial in the weather folder, you will see a comment that the data is located at "https://storage.yandexcloud.net/yandex-research/shifts/weather/canonical-trn-dev-data.tar". So I was not able to run the tutorial examples. Other parts of shifts work, but I was hoping to specifically verify the examples before writing a bunch of front-end code to scrape and prepare EPA and other weather data for Shifts. If you have a copy of canonical-trn-dev-data.tar somewhere, please work with me to get it so I can poke at this a bit.

PS: Sorry for the delay in responding. I am traveling for the holidays.

VatsalRaina commented 1 year ago

Hi @ebo, I understand. The original link was used to provide the data while the challenge was running. After the challenge finished, all the canonical data was made available together at: https://storage.yandexcloud.net/yandex-research/shifts/weather/canonical-partitioned-dataset.tar This includes the files train, dev-in and dev-out. Hence, you can just select these 3 files to verify the examples in the tutorial. The tutorial itself has been updated to indicate this. Hope that helps!
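
For anyone who wants to pull only those three files out of the archive, here is a minimal Python sketch. It is not taken from the tutorial: `extract_matching` is a hypothetical helper, and the keywords `train`, `dev_in` and `dev_out` are assumptions about how the members are named inside the tar. Listing the archive first with `tarfile.open(path).getnames()` is the safest way to confirm them.

```python
import tarfile
from pathlib import Path


def extract_matching(archive_path, dest_dir, keywords=("train", "dev_in", "dev_out")):
    """Extract only regular files whose base name contains one of `keywords`.

    The default keywords are guesses at the member names, not confirmed ones.
    Returns the list of member names that were extracted.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            base_name = Path(member.name).name
            if member.isfile() and any(key in base_name for key in keywords):
                # Parent directories inside the archive are recreated under dest.
                tar.extract(member, path=dest)
                extracted.append(member.name)
    return extracted
```

Calling `extract_matching("canonical-partitioned-dataset.tar", "data")` would then leave only the matching files under `data/`, keeping whatever directory layout the archive uses.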

ebo commented 1 year ago

Woo Hoo! It is downloading. Thank you for fixing this.

ebo commented 1 year ago

As a note, I took a quick moment to check it out. The names of the directory and dataset have changed. I was able to replace "data/" with "canonical-partitioned-dataset/shiftscanonical" in the tutorial for train.csv, dev_in.csv, and dev_out.csv. Other than that, I was able to get it to work up to the retention curves, where it died with an sklearn issue (probably a problem on my end with a clean install on a new computer). I'll post a separate issue if anything needs to be fixed in shifts. I look forward to testing this out further.
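
Since the directory and file names changed once already, one way to avoid hard-coding the prefix is to search the extracted folder for the three CSVs. This is only a sketch: `find_split_files` is a hypothetical helper, and it assumes each file name ends in `train.csv`, `dev_in.csv` or `dev_out.csv`.

```python
from pathlib import Path


def find_split_files(root, splits=("train", "dev_in", "dev_out")):
    """Map each split name to the first CSV under `root` whose name ends with it.

    Splits with no matching file are simply left out of the result.
    """
    found = {}
    for split in splits:
        matches = sorted(Path(root).rglob(f"*{split}.csv"))
        if matches:
            found[split] = matches[0]
    return found
```

The returned paths can be passed wherever the tutorial reads a CSV, so a renamed parent directory does not require editing every path by hand.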

ebo commented 1 year ago

Also, you might consider adding "--no-clobber" to the wget command so that the file is only downloaded once, even if the cell gets rerun via a high-level Jupyter run/restart. If you would rather I submit pull requests instead, I can do that as well.
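
If the download were done from Python rather than wget, the same download-once behaviour could be sketched as below. `download_once` is a hypothetical helper, not something in the shifts repository. Unlike `--no-clobber`, which skips any existing file, this version writes to a temporary `.part` name first, so an interrupted download is not mistaken for a finished one.

```python
import urllib.request
from pathlib import Path


def download_once(url, dest):
    """Download `url` to `dest` unless `dest` already exists.

    Returns True if a download happened and False if it was skipped.
    """
    dest = Path(dest)
    if dest.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    partial = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, partial)
    # Rename only after the transfer completed without raising.
    partial.replace(dest)
    return True
```

Rerunning a notebook cell that calls `download_once(url, "canonical-partitioned-dataset.tar")` with the URL given above would then return immediately once the archive is in place.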