JuliaML / MLDatasets.jl

Utility package for accessing common Machine Learning datasets in Julia
https://juliaml.github.io/MLDatasets.jl/stable
MIT License

[Discussion] Moving to HuggingFace for some databases #242

Status: Open. Dsantra92 opened this issue 2 months ago.

Dsantra92 commented 2 months ago

Some of the (graph) datasets we are trying to support have one or more of the following problems:

  1. They are hosted on university servers or other untrusted sources that cannot provide adequate download speeds throughout the globe.
  2. They aren't hosted anywhere and come with a license.
  3. They are stored in Python-specific formats.

HuggingFace now has a good set of community-maintained graph datasets. If we run into any of the issues above with a dataset, we can add that dataset to HF, pull it from HF, and process it as required. I believe this will greatly reduce the code needed to integrate and test new datasets. I am not sure about the planned support for https://github.com/FluxML/HuggingFaceApi.jl, but this seems like a better idea than relying on links that can fail without warning.

cc: @CarloLucibello

CarloLucibello commented 1 month ago

It would be nice to have HF as the official storage. Maybe we can replace the current download links with HF links, without having to resort to HF's API?

Dsantra92 commented 1 month ago

That sounds nice. I will see if we can bypass calling HF's API.
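A minimal sketch of the "plain download link instead of the API" idea, assuming the dataset lives in a public Hugging Face dataset repository. Files in such repositories can be fetched over ordinary HTTPS from URLs of the form `https://huggingface.co/datasets/<repo_id>/resolve/<revision>/<filename>`, so an existing download link could in principle be swapped for one of these. The helper names `hf_dataset_url` and `fetch_hf_file`, and the repository and file names below, are hypothetical and not part of the MLDatasets.jl API.

```julia
using Downloads  # Julia standard library: Downloads.download(url, output)

# Hypothetical helper: builds a direct file URL for a file in a public
# Hugging Face *dataset* repository, following the
# https://huggingface.co/datasets/<repo_id>/resolve/<revision>/<filename>
# pattern. No Hugging Face API client is involved; it is a plain HTTPS URL.
function hf_dataset_url(repo_id::AbstractString, filename::AbstractString;
                        revision::AbstractString = "main")
    return "https://huggingface.co/datasets/$repo_id/resolve/$revision/$filename"
end

# Hypothetical helper: downloads that file into `dir` with the standard-library
# downloader, the same way any other download link would be fetched.
# Returns the local path of the downloaded file.
function fetch_hf_file(repo_id::AbstractString, filename::AbstractString;
                       revision::AbstractString = "main", dir::AbstractString = mktempdir())
    url = hf_dataset_url(repo_id, filename; revision = revision)
    dest = joinpath(dir, basename(filename))
    return Downloads.download(url, dest)
end
```

For example, `hf_dataset_url("some-org/some-graph-dataset", "data/train.json")` yields `https://huggingface.co/datasets/some-org/some-graph-dataset/resolve/main/data/train.json`. Passing a commit hash as `revision` instead of `main` would pin the exact file version, which guards against the upstream file changing silently. Gated or private repositories would additionally need an access token, which this sketch does not handle.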