farach / huggingfaceR

Hugging Face state-of-the-art models in R
Other

hf_load_dataset() not working for local csv #32

Closed: samterfa closed this issue 1 year ago

samterfa commented 1 year ago

According to the documentation, this call should work: hf_load_dataset("csv", data_files = "iris.csv"). However, it fails at this line of datasets.R: dataset_base <- reticulate::py$load_dataset(dataset). Moving that line further down, to where dataset_base is actually used, would fix the issue, but I wasn't sure whether its placement matters for something else or whether a broader refactor would make more sense.
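Here is a minimal sketch of the reordering, assuming the wrapper's job is simply to forward data_files to Python's datasets.load_dataset(). The helper build_load_args() and the wrapper load_dataset_sketch() are hypothetical names, not huggingfaceR's actual code. The idea is that load_dataset() is called exactly once, after the full argument list is assembled, so a local CSV never triggers a bare load_dataset("csv") call without its files.

```r
# Hypothetical helper (not part of huggingfaceR): assemble the arguments
# for datasets.load_dataset() before any Python call is made.
build_load_args <- function(dataset, data_files = NULL, ...) {
  args <- list(dataset, ...)
  if (!is.null(data_files)) {
    # Local loaders such as "csv" need data_files on the very first call.
    args$data_files <- data_files
  }
  args
}

# Hypothetical wrapper: a single load_dataset() call with the complete
# argument list. Requires reticulate and the Python `datasets` package.
load_dataset_sketch <- function(dataset, data_files = NULL, ...) {
  datasets <- reticulate::import("datasets")
  do.call(datasets$load_dataset, build_load_args(dataset, data_files, ...))
}
```

With this shape, load_dataset_sketch("csv", data_files = "iris.csv") and a plain Hub name go through the same single call; only the argument list differs.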

jpcompartir commented 1 year ago

Perhaps we could split the function into two:

  1. Local datasets (though would it be easier to just use the existing read_ functions?)
  2. Datasets loaded directly from the Hub
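A rough sketch of that split, assuming the local branch can bypass Python entirely and the Hub branch delegates to datasets.load_dataset(). Both function names are hypothetical and not part of huggingfaceR's API; base utils::read.csv() stands in for whichever read_ function the package would actually use (readr::read_csv() would work just as well).

```r
# Hypothetical split, not huggingfaceR's API.

# 1. Local file: plain R, no Python round trip.
load_local_dataset <- function(path) {
  stopifnot(file.exists(path))
  utils::read.csv(path, stringsAsFactors = FALSE)
}

# 2. Dataset from the Hub: delegate to Python's datasets.load_dataset(),
# which caches downloaded files locally by default.
# Requires reticulate and the Python `datasets` package.
load_hub_dataset <- function(name, ...) {
  datasets <- reticulate::import("datasets")
  datasets$load_dataset(name, ...)
}
```

Keeping the local branch in R avoids converting a data frame to a Python Dataset and back when the user only wanted to read a file.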

I think the current load_dataset function should already cache any downloaded datasets?

I would add that hf_load_dataset() was built too closely around the 'emotions' dataset, while many datasets have a different structure. Should we try to cater for every dataset stored on the Hub?