ENSTA-U2IS-AI / torch-uncertainty

Open-source framework for uncertainty and deep learning models in PyTorch :seedling:
https://torch-uncertainty.github.io
Apache License 2.0

:sparkles: Add support for Tiny-ImageNet-C #36

o-laurent closed this issue 1 year ago

o-laurent commented 1 year ago

TinyImageNet-C can be downloaded here: https://berkeley.app.box.com/s/6zt1qzwm34hgdzcvi45svsb10zspop8a

o-laurent commented 1 year ago

It seems that box.com datasets cannot be downloaded directly without auth tokens. I've sent an email to confirm. For now, I'm going for a partial solution with a manual download :disappointed: @clementlrd It could be worth including the original corruption functions.
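For illustration, the manual-download fallback mentioned above could look roughly like the sketch below. It only checks that the user has already extracted the archive and fails with instructions otherwise. The helper name `check_manual_download` and the folder name `Tiny-ImageNet-C` are assumptions made for this example, not the library's actual API.

```python
from pathlib import Path

# Link taken from the first comment of this issue.
MANUAL_URL = "https://berkeley.app.box.com/s/6zt1qzwm34hgdzcvi45svsb10zspop8a"


def check_manual_download(root, folder="Tiny-ImageNet-C"):
    """Return the dataset directory, or raise with download instructions.

    `folder` is a hypothetical name for the extracted archive.
    """
    data_dir = Path(root) / folder
    # Treat a missing or empty directory as "not downloaded yet".
    if not data_dir.is_dir() or not any(data_dir.iterdir()):
        raise FileNotFoundError(
            f"{data_dir} not found or empty. Download the archive manually "
            f"from {MANUAL_URL} and extract it into {Path(root)}."
        )
    return data_dir
```

A dataset class could call such a helper in its constructor, so that users get an actionable error message instead of a failed automated download.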

clementlrd commented 1 year ago

I'll look into an alternative hosting solution. In the meantime, I could set up the manual solution you mentioned along with the dataset. That seems like a good start.

Using the original functions seems inefficient to me, and their random processes could introduce variance into the results. Let me know what you think.

o-laurent commented 1 year ago

I don't know whether we are allowed to host the dataset ourselves, for instance on Zenodo. We could ask Hendrycks directly.

Thanks, great point about the variance, since the corruptions are random! Concerning the inefficiency, I considered adding a "download_and_build" method to generate the dataset only once, but your second remark is a significant concern.
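To make the trade-off concrete, here is a minimal sketch of the "generate only once" idea, assuming a toy Gaussian-noise corruption on a flat list of pixel values. The names `gaussian_noise` and `build_once`, the noise scale, and the JSON cache are all hypothetical and are not the original corruption functions. A fixed seed makes repeated builds reproducible in the same environment, but results could still differ from the published dataset, which is the concern raised above.

```python
import json
import random
from pathlib import Path


def gaussian_noise(pixels, severity, rng):
    """Toy corruption: add Gaussian noise to 0-255 pixel values.

    This is an illustrative stand-in, not the original corruption code.
    """
    sigma = 8.0 * severity  # arbitrary scale chosen for the example
    return [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in pixels]


def build_once(pixels, cache_file, severity=1, seed=0):
    """Corrupt `pixels` on the first call only, then reuse the cached copy."""
    cache_file = Path(cache_file)
    if cache_file.exists():
        # Already built: skip the random process entirely.
        return json.loads(cache_file.read_text())
    rng = random.Random(seed)  # seeded generator, so rebuilds match
    corrupted = gaussian_noise(pixels, severity, rng)
    cache_file.write_text(json.dumps(corrupted))
    return corrupted
```

Downloading a precomputed copy sidesteps both issues at once: nothing is regenerated locally, and everyone evaluates on exactly the same images.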

o-laurent commented 1 year ago

The dataset seems to be ~Apache 2.0 licensed (though I wonder whether you can license a modification of ImageNet, which is itself already licensed). The license allows redistribution as long as we keep Hendrycks' license, in his name. So it seems possible to host it on Zenodo, stating that it isn't ours.

o-laurent commented 1 year ago

I got confirmation that automated download is not possible with Berkeley connect. I've created another mirror here: https://zenodo.org/record/8206060. I'll add the dataset tomorrow.

o-laurent commented 1 year ago

@clementlrd pushed in 0b9cdc2. Close the issue if it works.

clementlrd commented 1 year ago

Thank you for the great work, I will test it tomorrow!

clementlrd commented 1 year ago

I think all that's missing are the imports in the two `__init__.py` files and this comment about the name: https://github.com/ENSTA-U2IS/torch-uncertainty/commit/0b9cdc2e4f4f388118de6b5eb15af29ba4786a39#r123895845

I can't test the dataset with the limited internet access I have right now. You can close the issue after that, and I'll reopen it if I run into any problems.