PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Failure in pull step – directories cause errors in pull with ADLS2 storage #13100

Open attekei opened 1 year ago

attekei commented 1 year ago

I get this error when running a worker locally:

  File "/site-packages/prefect_azure/deployments/steps.py", line 189, in pull_from_azure_blob_storage
    with open(target, "wb") as f:
IsADirectoryError: [Errno 21] Is a directory: '/private/var/folders/rz/fywxxdh16cn2yw0450110kgr0000gq/T/tmpxz98pl1aprefect'

This is probably because we use Azure Data Lake Storage Gen2 (ADLS2), where folders exist as blobs too. With standard Azure Blob Storage this probably isn't an issue, since folders there are only virtual (i.e. they don't exist as blobs).
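A minimal local sketch of the suspected failure mode, with no Azure involved. `naive_write` and `demo_failure` are hypothetical names and not the actual `prefect_azure` code. The sketch assumes the pull step joins each blob's relative name onto the target directory and opens the result for writing; a folder blob whose relative name is empty then resolves to the directory itself:

```python
import tempfile
from pathlib import Path


def naive_write(target_dir: Path, relative_name: str, data: bytes) -> None:
    """Hypothetical stand-in for a pull loop that treats every blob as a file."""
    target = target_dir / relative_name
    target.parent.mkdir(parents=True, exist_ok=True)
    # Fails when `target` is an existing directory rather than a file path.
    with open(target, "wb") as f:
        f.write(data)


def demo_failure() -> str:
    """Return the name of the exception raised for a folder-like entry."""
    with tempfile.TemporaryDirectory() as tmp:
        try:
            # Assumption: the folder blob's name relative to the pulled
            # folder is empty, so the target is the directory itself.
            naive_write(Path(tmp), "", b"")
        except OSError as exc:
            return type(exc).__name__
    return "no error"


if __name__ == "__main__":
    print(demo_failure())
```

On Linux and macOS the `open` call raises `IsADirectoryError`, matching the traceback above; other platforms may report a different `OSError` subclass.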

Not sure how best to detect which blobs are folders.
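One possible approach, sketched under assumptions that should be verified: as far as I know, accounts with hierarchical namespace enabled expose directories through the Blob API as zero-length blobs carrying the metadata entry `hdi_isfolder=true`. `BlobEntry`, `is_directory_blob` and `skip_directory_blobs` are hypothetical names. `BlobEntry` only mimics the `name`, `size` and `metadata` attributes of the SDK's listing results, and metadata is only present if the listing requested it (for example `include=["metadata"]` on `list_blobs`).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass
class BlobEntry:
    """Hypothetical stand-in for an entry returned by a blob listing."""
    name: str
    size: int = 0
    metadata: Optional[Dict[str, str]] = None


def is_directory_blob(blob) -> bool:
    """Guess whether a listed blob is an ADLS2 directory marker.

    Assumption: directory markers carry `hdi_isfolder=true` in their metadata.
    """
    metadata = getattr(blob, "metadata", None) or {}
    for key, value in metadata.items():
        if key.lower() == "hdi_isfolder" and str(value).lower() == "true":
            return True
    return False


def skip_directory_blobs(blobs: Iterable) -> Iterator:
    """Yield only the blobs that should be written to disk as files."""
    for blob in blobs:
        if not is_directory_blob(blob):
            yield blob


if __name__ == "__main__":
    listing = [
        BlobEntry("flows", 0, {"hdi_isfolder": "true"}),
        BlobEntry("flows/etl.py", 120, {}),
    ]
    for blob in skip_directory_blobs(listing):
        print(blob.name)
```

Checking the metadata rather than only `size == 0` keeps legitimately empty files in the download.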

(We can easily migrate to standard Azure Blob Storage, so this is not a blocker for us. It would still be worth fixing or, alternatively, documenting that ADLS2 is not supported for the time being.)

desertaxle commented 1 year ago

Thanks for the issue @attekei! We'll see if we can add support for ADLS2 to the Azure Blob Storage steps. If we can't, we can explore creating steps specifically for ADLS2.