SalesforceAIResearch / uni2ts

Unified Training of Universal Time Series Forecasting Transformers
Apache License 2.0

Bug when downloading cloudops datasets #77

Open liu-jc opened 4 months ago

liu-jc commented 4 months ago

Describe the bug
With the current version of the datasets library, the cloudops datasets cannot be downloaded.

To Reproduce

from datasets import load_dataset
dataset = load_dataset('Salesforce/cloudops_tsf', 'azure_vm_traces_2017')

Expected behavior
The dataset should download successfully.

Error message or code output

Traceback (most recent call last):
  File "/export/home/uni2ts/venv-new-hf/lib/python3.11/site-packages/datasets/builder.py", line 1973, in _prepare_split_single
    for _, table in generator:
  File "/root/.cache/huggingface/modules/datasets_modules/datasets/Salesforce--cloudops_tsf/c256e0ff4b38ace660f9c190f7ea36b6f11580926404e453a4b059ab54ae6b24/cloudops_tsf.py", line 251, in _generate_tables
    table = pq.read_table(filepath)
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/uni2ts/venv-new-hf/lib/python3.11/site-packages/datasets/streaming.py", line 75, in wrapper
    return function(*args, download_config=download_config, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/uni2ts/venv-new-hf/lib/python3.11/site-packages/datasets/download/streaming_download_manager.py", line 812, in xpyarrow_parquet_read_table
    return pq.read_table(xopen(filepath_or_buffer, mode="rb", download_config=download_config), **kwargs)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/uni2ts/venv-new-hf/lib/python3.11/site-packages/datasets/download/streaming_download_manager.py", line 507, in xopen
    return open(main_hop, mode, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IsADirectoryError: [Errno 21] Is a directory: '/root/.cache/huggingface/datasets/downloads/extracted/44a68b01b5facec6049e9d866260fdea631f258f755b50cff7b40c2f31f65ec1'
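
The last frame of the traceback shows `xopen` falling through to the built-in `open`, which is given a path under `downloads/extracted/` that is a directory rather than a parquet file. The following is a minimal standard-library sketch of that failure mode only, not of the `datasets` internals. The temporary directory is a stand-in for the extracted cache path, and `try_open_binary` is a hypothetical helper introduced for illustration. On Linux the exception raised is `IsADirectoryError`; other platforms may report a different `OSError` subclass.

```python
import tempfile


def try_open_binary(path):
    """Open `path` in binary read mode.

    Returns the name of the OSError subclass that was raised,
    or None if the path opened successfully.
    """
    try:
        with open(path, "rb"):
            return None
    except OSError as exc:
        return type(exc).__name__


# A temporary directory stands in for the extracted archive directory
# that appears in the traceback (assumption for illustration).
with tempfile.TemporaryDirectory() as extracted_dir:
    result = try_open_binary(extracted_dir)
    print(result)
```

This suggests the loading script receives the extraction directory itself where it expects the path of a file inside it, which would be consistent with the behaviour changing between `datasets` versions.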

Current workaround
Pin the following dependency versions:

datasets==2.12.0
fsspec==2023.5.0
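
To confirm that an environment actually matches these pins before retrying the download, a small check along the following lines may help. It is a sketch using only `importlib.metadata` from the standard library. `PINNED` and `check_pins` are hypothetical names introduced here, and the version strings are copied from the workaround above.

```python
from importlib import metadata

# Versions copied from the workaround in this issue.
PINNED = {"datasets": "2.12.0", "fsspec": "2023.5.0"}


def check_pins(pins):
    """Return {package: (installed_or_None, wanted)} for each package
    whose installed version differs from the pinned one."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


if __name__ == "__main__":
    for name, (installed, wanted) in check_pins(PINNED).items():
        print(f"{name}: installed {installed}, workaround expects {wanted}")
```

An empty result means both packages are at the pinned versions. Any entries printed indicate packages that would need to be reinstalled at the versions listed above.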

Proposed solution
It would probably be better to change the dataset format directly on Hugging Face, so that the current dependency requirements can stay as they are.

cc: @chenghaoliu89 @gorold