Open batmaxx opened 6 months ago
So basically you want to use only _maybe_download_dataset(**dataset_args, refresh_days=refresh_days)? There is no need to call the "load" method if you can just call "_maybe_download_dataset", or am I missing something? :)
I mean, we already have the function you need, which is "_maybe_download_dataset". The only thing the load method adds is the lowercase conversion.
Yes, using _maybe_download_dataset directly works for sure. However, the leading "_" signals internal use only. That's why I thought making it explicit with an extra argument would make it easier for users to either download and load the dataset or only download it.
For my use case I'm fine using _maybe_download_dataset directly. If this change is not deemed important, I can close this issue 👍
Feature Suggestion
Description
The load() function (in load.py) downloads the data (zip + unzip) and then directly loads it into a pandas dataframe. Would it be possible to add an argument to only download the data (and thus return nothing), skipping the pandas dataframe creation altogether? That would allow people who are only interested in downloading the data, or who use something other than pandas, to also benefit from the nice functionality implemented in _maybe_download_dataset (URL, caching, filename...).
Code
An easy non-breaking change could be to add an argument create_pandas_dataframe=True:
Example
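A minimal sketch of what the suggested argument might look like. The real signatures of load() and _maybe_download_dataset are not shown in this issue, so everything below other than the create_pandas_dataframe and refresh_days names is an assumption: the helper here is a hypothetical offline stand-in that writes a tiny CSV instead of downloading and unzipping, and the target parameter is made up for illustration.

```python
import tempfile
from pathlib import Path


def _maybe_download_dataset(target: Path, refresh_days: int = 30) -> Path:
    """Hypothetical stand-in for the library's internal helper.

    The real helper handles the URL, caching and unzipping; this one
    only writes a small CSV so the sketch runs without network access.
    """
    if not target.exists():
        target.write_text("Name,Value\nA,1\nB,2\n")
    return target


def load(target: Path, refresh_days: int = 30, create_pandas_dataframe: bool = True):
    """Download the dataset and, optionally, load it into a dataframe."""
    path = _maybe_download_dataset(target, refresh_days=refresh_days)

    if not create_pandas_dataframe:
        # Download-only mode: nothing is returned and pandas is never imported.
        return None

    # Imported lazily so that download-only callers do not need pandas.
    import pandas as pd

    df = pd.read_csv(path)
    # The existing load() is described as lowercasing; assumed here to mean column names.
    df.columns = [c.lower() for c in df.columns]
    return df


# Download only: the file ends up on disk and no dataframe is created.
workdir = Path(tempfile.mkdtemp())
result = load(workdir / "example.csv", create_pandas_dataframe=False)
```

Because the default stays True, existing callers of load() would keep getting a dataframe, which is what makes the change non-breaking.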
Happy to make a PR if necessary. Thanks!