Closed o-laurent closed 1 year ago
It seems that box.com datasets cannot be downloaded directly without auth tokens. I've sent an email to confirm. Going for a partial solution with a manual download :disappointed: @clementlrd It could be worth including the original corruption functions.
I'll look into an alternative hosting solution. In the meantime, I could set up the manual solution you mentioned along with the dataset. That seems a good start.
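For illustration, the manual fallback could look roughly like the sketch below. It only checks that the archive is already on disk and otherwise tells the user where to get it. The function name `check_manual_download` and the archive name `Tiny-ImageNet-C.tar` are assumptions made for this example, not names taken from the repository or from box.com.

```python
from pathlib import Path

# Hypothetical archive name; the actual file name on box.com may differ.
ARCHIVE_NAME = "Tiny-ImageNet-C.tar"
# Download page mentioned in this thread.
MANUAL_URL = "https://berkeley.app.box.com/s/6zt1qzwm34hgdzcvi45svsb10zspop8a"


def check_manual_download(root: str) -> Path:
    """Return the archive path, or explain how to fetch it by hand."""
    archive = Path(root) / ARCHIVE_NAME
    if not archive.is_file():
        raise FileNotFoundError(
            f"{archive} not found. Automated download is not available; "
            f"please download the dataset manually from {MANUAL_URL} "
            f"and place it in {root}."
        )
    return archive
```

The dataset class would call this before extraction, so a missing file fails early with instructions rather than with an opaque error later on.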
Using the original functions seems inefficient to me, and their random processes can introduce variance into the results. Let me know what you think about it.
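To make the variance concern concrete, here is a toy sketch. The `gaussian_noise` function is a stand-in written for this example, not one of the original corruption functions. Two runs with different random states give different corrupted data, while a fixed seed makes the corruption reproducible.

```python
import random


def gaussian_noise(pixels, sigma, rng):
    """Toy corruption: add Gaussian noise and clip to [0, 255]."""
    return [min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


pixels = [10.0, 128.0, 250.0]

# Different seeds stand in for two independent runs of the corruption.
run_a = gaussian_noise(pixels, 20.0, random.Random(1))
run_b = gaussian_noise(pixels, 20.0, random.Random(2))
print("runs differ:", run_a != run_b)

# Same seed: both calls draw the same noise, so the outputs are identical.
fixed_a = gaussian_noise(pixels, 20.0, random.Random(0))
fixed_b = gaussian_noise(pixels, 20.0, random.Random(0))
print("seeded runs match:", fixed_a == fixed_b)
```

Seeding would fix reproducibility on one machine, but a pre-generated dataset keeps everyone evaluating on exactly the same images.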
I don't know if we are allowed to host the dataset, for instance, on Zenodo. We could ask Hendrycks directly.
Thanks, great point about the variance, as the corruptions are random! Concerning the inefficiency, I considered adding some "download_and_build" method to generate the dataset only once, but your second remark is a significant concern.
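The "generate only once" idea could be sketched as follows. This is only an illustration under assumptions: `build_once`, the `.build_complete` marker file, and the `build_fn` callback are hypothetical names for this example, not existing code.

```python
from pathlib import Path
from typing import Callable

# Hypothetical marker file recording that the build already completed.
MARKER = ".build_complete"


def build_once(root: str, build_fn: Callable[[Path], None]) -> bool:
    """Run build_fn(root) only if no earlier build finished there.

    Returns True if the dataset was built now, False if it was reused.
    """
    root_path = Path(root)
    marker = root_path / MARKER
    if marker.exists():
        return False
    root_path.mkdir(parents=True, exist_ok=True)
    build_fn(root_path)  # e.g. apply the corruptions and save the images
    marker.touch()  # written last, so an interrupted build is retried
    return True
```

This removes the repeated cost, but each user would still generate their own copy, so the variance issue between machines remains.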
The dataset seems ~Apache 2.0 licensed (even though I wonder if you can license a modification of ImageNet that is already licensed). The license allows redistribution as long as we keep the license from Hendrycks in his name. So it seems possible to host it on Zenodo, stating that it isn't ours.
I got confirmation that automated download is not possible with Berkeley connect. I've created another mirror here: https://zenodo.org/record/8206060. Adding the dataset tomorrow.
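With a mirror in place, an automated download could be sketched like this. The helper names `sha256sum` and `fetch` are made up for this example, and the direct file URL and expected checksum are left as parameters since neither is given in this thread.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(url: str, dest: Path, expected_sha256: str) -> Path:
    """Download url to dest unless it already exists, then verify it."""
    if not dest.is_file():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
    if sha256sum(dest) != expected_sha256:
        raise RuntimeError(f"Checksum mismatch for {dest}")
    return dest
```

Skipping the download when the file is already present also keeps the manual route working: a user who fetched the archive by hand only goes through the checksum verification.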
Thank you for the great work, I will test it tomorrow!
I think all that's missing are the imports in the two __init__.py files
and this comment about the name: https://github.com/ENSTA-U2IS/torch-uncertainty/commit/0b9cdc2e4f4f388118de6b5eb15af29ba4786a39#r123895845
I can't test the dataset with the limited internet access I have right now. You can close the issue after that, and I'll reopen it if I have any problems.
TinyImageNet-C can be downloaded here: https://berkeley.app.box.com/s/6zt1qzwm34hgdzcvi45svsb10zspop8a