fsspec / adlfs

fsspec-compatible Azure Datalake and Azure Blob Storage access
BSD 3-Clause "New" or "Revised" License

`IsADirectoryError` exception when getting a directory #284

Closed: gabrielmbmb closed this issue 2 years ago

gabrielmbmb commented 3 years ago

What happened: `IsADirectoryError` is raised when downloading a directory from Azure Blob Storage to the local file system using the AzureBlobFileSystem.get method with recursive=True.

What you expected to happen: The remote directory is downloaded without raising an error.

Minimal Complete Verifiable Example:

import os

from adlfs import AzureBlobFileSystem

fs = AzureBlobFileSystem(
    account_name=os.getenv("ABS_ACCOUNT_NAME"),
    account_key=os.getenv("ABS_ACCESS_KEY")
)

# Recursively download the remote directory into a local directory.
fs.get(
    "mlops-scope/dataset/082515e9-44eb-47e0-97f5-1fdaa762953b/raw/test.parquet/",
    "/tmp/test.parquet/",
    recursive=True
)

Anything else we need to know?:

This is the traceback I got:

Traceback (most recent call last):
  File "/home/gabriel.martin/ScopeClassifier/test.py", line 13, in <module>
    fs.get(
  File "/home/gabriel.martin/ScopeClassifier/.venv/lib/python3.9/site-packages/fsspec/asyn.py", line 88, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/gabriel.martin/ScopeClassifier/.venv/lib/python3.9/site-packages/fsspec/asyn.py", line 69, in sync
    raise result[0]
  File "/home/gabriel.martin/ScopeClassifier/.venv/lib/python3.9/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/gabriel.martin/ScopeClassifier/.venv/lib/python3.9/site-packages/fsspec/asyn.py", line 436, in _get
    return await _run_coros_in_chunks(
  File "/home/gabriel.martin/ScopeClassifier/.venv/lib/python3.9/site-packages/fsspec/asyn.py", line 210, in _run_coros_in_chunks
    results.append(await coro)
  File "/usr/lib/python3.9/asyncio/tasks.py", line 614, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "/home/gabriel.martin/ScopeClassifier/.venv/lib/python3.9/site-packages/adlfs/spec.py", line 1629, in _get_file
    with open(lpath, "wb") as my_blob:
IsADirectoryError: [Errno 21] Is a directory: '/tmp/test.parquet'
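
The last frame of the traceback, `open(lpath, "wb")` on a path that is a directory, can be reproduced without any Azure account. The sketch below is only an illustration of that one call: the helper name `open_error_name` and the temporary directory layout are made up for this example and are not part of adlfs or fsspec.

```python
import os
import tempfile


def open_error_name(lpath):
    """Try to open lpath for binary writing.

    Returns the name of the OSError subclass that was raised, or None on success.
    (Hypothetical helper, used only for this illustration.)
    """
    try:
        with open(lpath, "wb"):
            pass
    except OSError as exc:
        return type(exc).__name__
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the local target directory "/tmp/test.parquet" from the report.
    local_dir = os.path.join(tmp, "test.parquet")
    os.makedirs(local_dir)
    print(open_error_name(local_dir))
```

On a POSIX system such as the one in the traceback, this should report `IsADirectoryError`; other platforms may raise a different `OSError` subclass for the same call.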

Environment:

gabrielmbmb commented 3 years ago

It seems that fsspec.AbstractFileSystem.get passes the local directory itself (the directory where the contents of the remote directory should be saved) as the lpath argument to adlfs.AzureBlobFileSystem.get_file.

In my example, adlfs.AzureBlobFileSystem.get_file receives "/tmp/test.parquet" as the lpath argument and raises IsADirectoryError when it tries to open that path for writing.
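
One possible guard, sketched here as an assumption rather than as what adlfs actually does, is to skip any lpath that is already a local directory before opening it. The function `write_blob_locally` is a hypothetical stand-in for a per-file download step; it writes bytes that are handed to it instead of fetching them from Azure.

```python
import os
import tempfile


def write_blob_locally(lpath, data):
    """Hypothetical stand-in for a per-file download step.

    Returns True if a file was written, False if lpath was skipped.
    """
    # Guard: if lpath is an existing directory, opening it for writing
    # would fail, so skip it instead.
    if os.path.isdir(lpath):
        return False
    # Make sure the parent directory exists before writing the file.
    os.makedirs(os.path.dirname(lpath) or ".", exist_ok=True)
    with open(lpath, "wb") as my_blob:
        my_blob.write(data)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target_dir = os.path.join(tmp, "test.parquet")
    os.makedirs(target_dir)
    # The directory entry is skipped; the file inside it is written.
    skipped = write_blob_locally(target_dir, b"")
    written = write_blob_locally(os.path.join(target_dir, "part-0.parquet"), b"data")
    print(skipped, written)
```

The guard only avoids the crash for directory entries. Whether the directory path should be passed to the per-file step at all is a separate question for the calling code.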

gabrielmbmb commented 3 years ago

Unrelated to this issue, but the topic of this repository should be hacktoberfest, not hactoberfest.