Closed PeterFogh closed 10 months ago
CC @daavoo in case it is related to https://github.com/fsspec/adlfs/pull/383
@PeterFogh, what is PATH in the snippet?
@daavoo - I have tried both direct paths like "container/blob" and protocol paths like "az://container/blob", but both fail the same way.
Hi @PeterFogh, I am unable to reproduce. Does the dir have any particular structure? Do you call rm with multiple args?
Apart from the existing recursive test (https://github.com/fsspec/adlfs/blob/main/adlfs/tests/test_spec.py#L724), which passes locally and on CI, I have tried a quick script locally (trying different ways of removing subdirs) on a fresh venv and an actual bucket:
@daavoo - I still get the error, but I can see that we differ in fsspec version: mine is fsspec 2022.11.0 py310haa95532_0, which does not match the adlfs version 2023.1.0.
After I forced the fsspec version with conda install fsspec=2023.1.0, the code can delete the folder recursively without the error :)
Right now, I'm solving a new conda environment to see if the versions are compatible without any version specifications.
name: py310_readings_4
channels:
- defaults
- conda-forge
dependencies:
- distributed
- dask
- adlfs
- ipykernel
- matplotlib
- python=3.10
- pyarrow
- pandas
It still solves to mismatching adlfs and fsspec versions:
$ conda list | grep -E "adlfs|fsspec|dask"
adlfs 2023.1.0 pyhd8ed1ab_0 conda-forge
dask 2022.7.0 py310haa95532_0
dask-core 2022.7.0 py310haa95532_0
fsspec 2022.11.0 py310haa95532_0
I think the problem might be in the defaults channel of conda.
Changing your file to use only the conda-forge channel works for me:
$ conda list | grep -E "adlfs|fsspec|dask"
adlfs 2023.1.0 pyhd8ed1ab_0 conda-forge
dask 2023.1.0 pyhd8ed1ab_0 conda-forge
dask-core 2023.1.0 pyhd8ed1ab_0 conda-forge
fsspec 2023.1.0 pyhd8ed1ab_0 conda-forge
Hello. I am having the exact same issue. What is the proper mitigation procedure please?
Hi @igorng, how did you install adlfs? Via conda?
@daavoo adlfs is installed as a transitive dependency of a package, which is installed with pip.
Could you share pip list? Or just check the fsspec version?
As in my comment, it works for me with the latest adlfs and fsspec versions; there might be some package in the dependency list that is causing you to install an older version of fsspec.
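A quick way to check the installed versions from Python, as a minimal sketch. It assumes (based only on the reports in this thread) that adlfs and fsspec are meant to come from the same calendar release; the helper names `installed_versions` and `same_release` are made up for illustration and are not part of either library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("adlfs", "fsspec")):
    """Return {package: version string, or None if the package is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def same_release(versions):
    """True if every package is installed and shares the same YEAR.MONTH prefix.

    Assumption: comparing the first two version components is a reasonable
    proxy for "matching" calendar-versioned releases.
    """
    found = [v for v in versions.values() if v is not None]
    prefixes = {tuple(v.split(".")[:2]) for v in found}
    return len(found) == len(versions) and len(prefixes) == 1


if __name__ == "__main__":
    versions = installed_versions()
    print(versions)
    print("same release:", same_release(versions))
```

This only reports what is installed in the current environment; it does not say which channel or index a package came from, so `conda list` or `pip list` is still needed for that.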
For some reason, when I do not put a constraint on the version to use, or when I set it to the latest 2023.1.0, fsspec seems to be taken from conda-forge while adlfs comes from PyPI, and it does not work:
$ conda list | grep -E 'adlfs|fsspec'
adlfs 2023.1.0 pypi_0 pypi
...
fsspec 2023.1.0 pyhd8ed1ab_0 conda-forge
When I set version to 2022.11.0, both are from pypi, and it works.
$ conda list | grep -E "adlfs|fsspec"
adlfs 2022.11.0 pypi_0 pypi
fsspec 2022.11.0 pypi_0 pypi
So I downgraded to 2022.11.0.
Edit: I don't have time to investigate why fsspec 2023.1.0 is coming from conda-forge (complex env here), so since 2022.11.0 works, fine by me ;)
Thank you!
I have the same problem with the latest adlfs and fsspec. In the Azure container the data is maybe 4 folders deep, with parquet files in the last folder. But it does not always throw an error; it depends on the folder structure.
For example, the first time I run this code it only succeeds in deleting 2 out of 3 folders with similar depths.
The next time I run this code (to try to delete the remaining folder) I get this error.
fs = LakeHouse.Install_Base.fs
folders = fs.glob(CONTAINER_INSTALL_BASE_HERCULES + "/*")
print("folders before delete")
print(folders)
fs.rm(folders, recursive=True)
folders = fs.glob(CONTAINER_INSTALL_BASE_HERCULES + "/*")
print("folders left after delete")
print(folders)
folders before delete
['install-base-from-hercules-standard/vehicle_type=536']
Traceback (most recent call last):
File "C:\temp\Apps\Anaconda64\lib\site-packages\adlfs\spec.py", line 1252, in _rm
await self._rm_files(container_name, files)
File "C:\temp\Apps\Anaconda64\lib\site-packages\adlfs\spec.py", line 1281, in _rm_files
raise ex
File "C:\temp\Apps\Anaconda64\lib\site-packages\azure\core\tracing\decorator_async.py", line 79, in wrapper_use_tracer
return await func(*args, **kwargs)
File "C:\temp\Apps\Anaconda64\lib\site-packages\azure\storage\blob\aio\_container_client_async.py", line 972, in delete_blob
await blob.delete_blob( # type: ignore
File "C:\temp\Apps\Anaconda64\lib\site-packages\azure\core\tracing\decorator_async.py", line 79, in wrapper_use_tracer
return await func(*args, **kwargs)
File "C:\temp\Apps\Anaconda64\lib\site-packages\azure\storage\blob\aio\_blob_client_async.py", line 600, in delete_blob
process_storage_error(error)
File "C:\temp\Apps\Anaconda64\lib\site-packages\azure\storage\blob\_shared\response_handlers.py", line 185, in process_storage_error
exec("raise error from None") # pylint: disable=exec-used # nosec
File "<string>", line 1, in <module>
File "C:\temp\Apps\Anaconda64\lib\site-packages\azure\storage\blob\aio\_blob_client_async.py", line 598, in delete_blob
await self._client.blob.delete(**options)
File "C:\temp\Apps\Anaconda64\lib\site-packages\azure\core\tracing\decorator_async.py", line 79, in wrapper_use_tracer
return await func(*args, **kwargs)
File "C:\temp\Apps\Anaconda64\lib\site-packages\azure\storage\blob\_generated\aio\operations\_blob_operations.py", line 685, in delete
map_error(status_code=response.status_code, response=response, error_map=error_map)
File "C:\temp\Apps\Anaconda64\lib\site-packages\azure\core\exceptions.py", line 110, in map_error
raise error
ResourceExistsError: This operation is not permitted on a non-empty directory.
RequestId:ece2ab15-001e-0002-532b-3747c0000000
Time:2023-02-02T17:23:17.7579894Z
ErrorCode:DirectoryIsNotEmpty
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>DirectoryIsNotEmpty</Code><Message>This operation is not permitted on a non-empty directory.
RequestId:ece2ab15-001e-0002-532b-3747c0000000
Time:2023-02-02T17:23:17.7579894Z</Message></Error>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\nosterlu\AppData\Local\Temp\ipykernel_27744\3000828452.py", line 5, in <cell line: 5>
fs.rm(folders, recursive=True)
File "C:\temp\Apps\Anaconda64\lib\site-packages\fsspec\asyn.py", line 114, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "C:\temp\Apps\Anaconda64\lib\site-packages\fsspec\asyn.py", line 99, in sync
raise return_result
File "C:\temp\Apps\Anaconda64\lib\site-packages\fsspec\asyn.py", line 54, in _runner
result[0] = await coro
File "C:\temp\Apps\Anaconda64\lib\site-packages\adlfs\spec.py", line 1258, in _rm
raise RuntimeError("Failed to remove %s for %s", path, e)
RuntimeError: ('Failed to remove %s for %s', ['install-base-from-hercules-standard/vehicle_type=534', 'install-base-from-hercules-standard/vehicle_type=536', 'install-base-from-hercules-standard/vehicle_type=536/', 'install-base-from-hercules-standard/vehicle_type=536/model_year=2024', 'install-base-from-hercules-standard/vehicle_type=539', 'install-base-from-hercules-standard/vehicle_type=539/', 'install-base-from-hercules-standard/vehicle_type=539/vehicle_type=536', 'install-base-from-hercules-standard/vehicle_type=539/vehicle_type=539'], ResourceExistsError('This operation is not permitted on a non-empty directory.\nRequestId:ece2ab15-001e-0002-532b-3747c0000000\nTime:2023-02-02T17:23:17.7579894Z\nErrorCode:DirectoryIsNotEmpty'))
This is my current setup with pip list. (I have only installed via pip)
APScheduler 3.8.1
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
args 0.1.0
arrow 1.2.3
art 5.3
arviz 0.11.2
asn1crypto 1.5.1
astroid 2.12.10
astropy 5.1
asttokens 2.0.5
async-timeout 4.0.1
atomicwrites 1.4.1
atpublic 3.1.1
attrs 22.1.0
Authlib 1.2.0
Automat 20.2.0
autopep8 1.6.0
azure-batch 13.0.0
azure-common 1.1.27
azure-core 1.26.2
azure-datalake-store 0.0.52
azure-functions 1.12.0
azure-identity 1.12.0
azure-keyvault 4.2.0
azure-keyvault-certificates 4.6.0
azure-keyvault-keys 4.7.0
azure-keyvault-secrets 4.6.0
azure-nspkg 3.0.2
azure-storage 0.36.0
azure-storage-blob 12.14.1
azure-storage-file-datalake 12.9.1
Babel 2.10.3
backcall 0.2.0
backports.functools-lru-cache 1.6.4
backports.tempfile 1.0
backports.weakref 1.0.post1
bcrypt 4.0.0
beautifulsoup4 4.11.1
binaryornot 0.4.4
bitarray 2.5.1
bkcharts 0.2
black 22.8.0
bleach 5.0.1
bokeh 2.4.3
boto3 1.24.28
botocore 1.27.28
Bottleneck 1.3.5
brotlipy 0.7.0
cachetools 4.2.4
catboost 1.0.3
certifi 2022.9.24
cffi 1.15.1
cftime 1.5.1.1
chardet 5.0.0
charset-normalizer 2.1.1
click 8.1.3
clint 0.5.1
cloudpickle 2.2.0
clyent 1.2.2
cmdstanpy 0.9.68
colorama 0.4.5
colorcet 3.0.0
commonmark 0.9.1
comtypes 1.1.10
conda 22.9.0
conda-build 3.22.0
conda-content-trust 0.1.3
conda-pack 0.6.0
conda-package-handling 1.8.1
conda-repo-cli 1.0.5
conda-token 0.3.0
conda-verify 3.4.2
constantly 15.1.0
convertdate 2.3.2
cookiecutter 2.1.1
coverage 6.4.4
cramjam 2.5.0
cryptography 36.0.2
cssselect 1.1.0
cycler 0.11.0
Cython 0.29.30
cytoolz 0.11.0
daal4py 2021.5.0
dask 2023.1.0
dataprep 0.4.5
datashader 0.14.1
datashape 0.5.4
dateutils 0.6.12
debugpy 1.6.3
decorator 5.1.1
defusedxml 0.7.1
Deprecated 1.2.13
diff-match-patch 20200713
dill 0.3.5.1
distributed 2023.1.0
docstring-parser 0.13
docutils 0.19
duckdb 0.6.1
duckdb-engine 0.6.4
entrypoints 0.4
ephem 4.1.2
et-xmlfile 1.1.0
executing 0.8.3
fastjsonschema 2.16.2
fastparquet 0.8.3
filelock 3.6.0
fire 0.4.0
flake8 4.0.1
Flask 2.2.2
Flask-Cors 3.0.10
Flask-Login 0.6.2
Flask-OAuth 0.12
Flask-SQLAlchemy 2.5.1
fonttools 4.25.0
frozenlist 1.2.0
fsspec 2023.1.0
future 0.18.2
gensim 4.1.2
geographiclib 1.52
geopy 2.2.0
glob2 0.7
google-api-core 2.2.2
google-api-python-client 1.12.8
google-auth 2.6.0
google-auth-httplib2 0.1.0
google-cloud-core 2.2.2
google-cloud-storage 1.43.0
google-crc32c 1.3.0
google-resumable-media 2.1.0
googleapis-common-protos 1.58.0
googlemaps 4.6.0
graphviz 0.19.1
greenlet 1.1.1
grpcio 1.42.0
h11 0.13.0
h5py 3.7.0
HeapDict 1.0.1
hijri-converter 2.2.2
holidays 0.13
holoviews 1.15.0
html5lib 1.1
httplib2 0.20.2
hvplot 0.8.0
hyperlink 21.0.0
ibis 3.2.0
ibis-framework 3.2.0
idna 3.4
imagecodecs 2021.8.26
imageio 2.19.3
imagesize 1.4.1
importlib-metadata 4.12.0
incremental 21.3.0
inflection 0.5.1
iniconfig 1.1.1
intake 0.6.5
intervaltree 3.1.0
ipykernel 6.16.0
ipython 7.34.0
ipython-genutils 0.2.0
ipywidgets 7.6.5
isodate 0.6.0
isort 5.10.1
itemadapter 0.3.0
itemloaders 1.0.4
itsdangerous 2.1.2
jaraco.classes 3.2.3
jdcal 1.4.1
jedi 0.18.1
jellyfish 0.9.0
Jinja2 3.0.3
jinja2-time 0.2.0
jmespath 0.10.0
joblib 1.1.0
json5 0.9.6
jsonpath-ng 1.5.3
jsonschema 4.16.0
jupyter 1.0.0
jupyter_client 7.3.5
jupyter-console 6.4.3
jupyter-core 4.11.1
jupyter-server 1.18.1
jupyterlab 3.4.4
jupyterlab-pygments 0.2.2
jupyterlab-server 2.10.3
jupyterlab-widgets 1.0.0
keyring 23.9.3
kfp 1.8.10
kfp-pipeline-spec 0.1.13
kfp-server-api 1.7.1
kiwisolver 1.4.2
korean-lunar-calendar 0.2.1
kubernetes 18.20.0
lazy-object-proxy 1.7.1
libarchive-c 2.9
libmambapy 0.25.0
lightgbm 3.3.3
line-profiler 3.5.1
llvmlite 0.39.1
locket 1.0.0
LunarCalendar 0.0.9
lxml 4.9.1
lz4 3.1.3
mamba 0.25.0
Markdown 3.3.4
MarkupSafe 2.1.1
matplotlib 3.5.2
matplotlib-inline 0.1.6
mccabe 0.6.1
menuinst 1.4.18
Metaphone 0.6
mistune 2.0.4
mkl-fft 1.3.1
mkl-random 1.2.2
mkl-service 2.4.0
mock 4.0.3
more-itertools 8.14.0
mpmath 1.2.1
msal 1.16.0
msal-extensions 0.3.0
msgpack 1.0.4
msrest 0.7.1
msrestazure 0.6.4
multidict 5.2.0
multipledispatch 0.6.0
munkres 1.1.4
mypy-extensions 0.4.3
navigator-updater 0.2.1
nbclassic 0.3.5
nbclient 0.6.8
nbconvert 7.0.0
nbformat 5.6.1
nest-asyncio 1.5.5
netCDF4 1.5.7
networkx 2.8.4
nltk 3.7
nodejs 0.1.1
nose 1.3.7
notebook 6.4.12
npm 0.1.1
numba 0.56.2
numexpr 2.8.3
numpy 1.22.4
numpydoc 1.4.0
O365 2.0.16
oauth2 1.9.0.post1
oauthlib 3.1.1
olefile 0.46
opencensus 0.11.0
opencensus-context 0.1.3
opencensus-ext-azure 1.1.7
openpyxl 3.0.10
optional-django 0.1.0
oscrypto 1.2.1
outcome 1.1.0
p3270 0.1.3
packaging 21.3
pandas 1.4.2
pandocfilters 1.5.0
panel 0.13.1
param 1.12.0
paramiko 2.11.0
parsel 1.6.0
parso 0.8.3
parsy 2.0
partd 1.3.0
pathlib 1.0.1
pathspec 0.10.1
patsy 0.5.2
pep8 1.7.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 8.4.0
pip 22.3.1
pkginfo 1.8.2
platformdirs 2.5.2
plotly 5.9.0
pluggy 1.0.0
ply 3.11
polars 0.14.25
portalocker 1.7.1
poyo 0.5.0
prometheus-client 0.14.1
prompt-toolkit 3.0.31
prophet 1.0.1
Protego 0.1.16
protobuf 3.20.3
psutil 5.9.2
ptyprocess 0.7.0
pure-eval 0.2.2
py 1.11.0
pyan3 1.1.1
pyarrow 8.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pybind11 2.10.0
pycallgraph2 1.1.3
pycodestyle 2.8.0
pycosat 0.6.3
pycparser 2.21
pycryptodome 3.11.0
pycryptodomex 3.11.0
pyct 0.4.8
pycurl 7.45.1
pydantic 1.10.2
PyDispatcher 2.0.5
pydocstyle 6.1.1
pydot 1.4.2
pyerfa 2.0.0
pyflakes 2.4.0
PyGithub 1.55
Pygments 2.13.0
PyHamcrest 2.0.2
PyJWT 2.4.0
pylint 2.15.3
pyls-spyder 0.4.0
pymannkendall 1.4.2
PyMeeus 0.5.11
PyNaCl 1.5.0
pyodbc 4.0.34
pyOpenSSL 21.0.0
pyparsing 3.0.9
PyQt5 5.15.7
PyQt5-Qt5 5.15.2
PyQt5-sip 12.11.0
PyQtChart 5.12
PyQtWebEngine 5.15.6
PyQtWebEngine-Qt5 5.15.2
pyreadline 2.1
pyrsistent 0.18.1
PySocks 1.7.1
pystan 2.19.1.1
pytest 7.1.2
python-crfsuite 0.9.8
python-dateutil 2.8.2
python-dotenv 0.19.2
python-lsp-black 1.2.1
python-lsp-jsonrpc 1.0.0
python-lsp-server 1.5.0
python-slugify 6.1.2
python-snappy 0.6.0
python-stdnum 1.17
pytoolconfig 1.2.2
pytz 2022.2.1
pytz-deprecation-shim 0.1.0.post0
pyviz-comms 2.0.2
PyWavelets 1.3.0
pywin32 305
pywin32-ctypes 0.2.0
pywinpty 2.0.2
PyYAML 6.0
pyzmq 24.0.1
QDarkStyle 3.0.3
qstylizer 0.2.2
QtAwesome 1.1.1
qtconsole 5.3.2
QtPy 2.2.0
queuelib 1.5.0
rapidfuzz 2.13.2
regex 2021.11.10
requests 2.28.1
requests-file 1.5.1
requests-oauthlib 1.3.0
requests-toolbelt 0.9.1
rich 12.5.1
rope 1.3.0
rsa 4.8
Rtree 1.0.0
ruamel-yaml-conda 0.15.100
s3transfer 0.6.0
scikit-image 0.19.2
scikit-learn 1.1.1
scikit-learn-intelex 2021.20220215.102710
scipy 1.9.3
Scrapy 2.6.2
seaborn 0.11.2
selenium 4.1.3
Send2Trash 1.8.0
service-identity 18.1.0
setuptools 59.8.0
setuptools-git 1.2
sip 4.19.13
six 1.16.0
smart-open 5.2.1
sniffio 1.2.0
snowballstemmer 2.2.0
snowflake 0.0.3
snowflake-connector-python 2.9.0
sortedcollections 2.1.0
sortedcontainers 2.4.0
soupsieve 2.3.2.post1
Sphinx 5.2.2
sphinxcontrib-applehelp 1.0.2
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 2.0.0
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.5
spyder 5.3.3
spyder-kernels 2.3.3
SQLAlchemy 1.3.24
sqlglot 6.2.6
stack-data 0.2.0
statsmodels 0.13.2
stringcase 1.2.0
strip-hints 0.1.10
style 1.1.0
sympy 1.10.1
tables 3.6.1
tabulate 0.8.10
TBB 0.2
tblib 1.7.0
tenacity 8.0.1
teradatasql 17.10.0.7
teradatasqlalchemy 17.0.0.3
termcolor 1.1.0
terminado 0.13.1
testpath 0.6.0
text-unidecode 1.3
textdistance 4.5.0
threadpoolctl 2.2.0
three-merge 0.1.1
thrift 0.16.0
tifffile 2021.7.2
tinycss 0.4
tinycss2 1.1.1
tldextract 3.2.0
toml 0.10.2
tomli 2.0.1
tomlkit 0.11.5
toolz 0.12.0
tornado 6.2
tqdm 4.64.0
traitlets 5.4.0
trio 0.20.0
trio-websocket 0.9.2
Twisted 22.2.0
twisted-iocpsupport 1.0.2
typed-ast 1.4.3
typer 0.4.0
typing_extensions 4.3.0
tzdata 2021.5
tzlocal 2.1
ujson 5.5.0
Unidecode 1.2.0
update 0.0.1
uritemplate 3.0.1
urllib3 1.26.12
varname 0.8.3
w3lib 1.21.0
waitress 2.1.2
watchdog 2.1.9
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 1.2.3
Werkzeug 2.2.2
whatthepatch 1.0.2
wheel 0.37.1
widgetsnbextension 3.5.2
win-inet-pton 1.1.0
win-unicode-console 0.5
wincertstore 0.2
wordcloud 1.8.2.2
wrapt 1.14.1
wsproto 1.1.0
xarray 0.20.1
xgboost 1.7.1
xlrd 2.0.1
XlsxWriter 3.0.3
xlwings 0.24.9
xmltodict 0.12.0
yapf 0.32.0
yarl 1.8.1
zict 2.2.0
zipp 3.8.1
zope.interface 5.4.0
Thanks for the details, @nosterlu! I am going to try to reproduce it.
Bumping as I'm having the same issue.
When calling fs.rm(path, recursive=True), where path is something like container_name/folder with 4 files, I get:
File "/home/cjalmeida/work/myproject/.venv/lib/python3.11/site-packages/adlfs/spec.py", line 1259, in _rm
await self._rm_files(container_name, files)
File "/home/cjalmeida/work/myproject/.venv/lib/python3.11/site-packages/adlfs/spec.py", line 1288, in _rm_files
raise ex
File "/home/cjalmeida/work/myproject/.venv/lib/python3.11/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cjalmeida/work/myproject/.venv/lib/python3.11/site-packages/azure/storage/blob/aio/_container_client_async.py", line 1035, in delete_blob
await blob.delete_blob( # type: ignore
File "/home/cjalmeida/work/myproject/.venv/lib/python3.11/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cjalmeida/work/myproject/.venv/lib/python3.11/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 618, in delete_blob
process_storage_error(error)
File "/home/cjalmeida/work/myproject/.venv/lib/python3.11/site-packages/azure/storage/blob/_shared/response_handlers.py", line 189, in process_storage_error
exec("raise error from None") # pylint: disable=exec-used # nosec
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 1, in <module>
File "/home/cjalmeida/work/myproject/.venv/lib/python3.11/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 616, in delete_blob
await self._client.blob.delete(**options)
File "/home/cjalmeida/work/myproject/.venv/lib/python3.11/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cjalmeida/work/myproject/.venv/lib/python3.11/site-packages/azure/storage/blob/_generated/aio/operations/_blob_operations.py", line 691, in delete
map_error(status_code=response.status_code, response=response, error_map=error_map)
File "/home/cjalmeida/work/myproject/.venv/lib/python3.11/site-packages/azure/core/exceptions.py", line 164, in map_error
raise error
azure.core.exceptions.ResourceExistsError: This operation is not permitted on a non-empty directory.
RequestId:f3042353-701e-0069-7a68-dc7b41000000
Time:2023-09-01T00:08:11.0327332Z
ErrorCode:DirectoryIsNotEmpty
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>DirectoryIsNotEmpty</Code><Message>This operation is not permitted on a non-empty directory.
RequestId:f3042353-701e-0069-7a68-dc7b41000000
Time:2023-09-01T00:08:11.0327332Z</Message></Error>
Versions:
fsspec==2023.6.0
adlfs==2023.8.0
Python==3.10
Installed from pip. Also, it only happens sometimes, usually when I try to remove shortly (e.g. < 5 min) after creating the files.
For the record, I can't reproduce the bug on 2022.11.2.
I still have the same issue too. I've been working around it by removing all files manually for the time being 😄
I'm running into the same problem with version 2023.10.0 (adlfs/fsspec). Conda environment with Python 3.11.5 and azure-storage-blob version 12.18.3.
If I repeatedly run the command, it eventually works. It seems to delete some of the nested folders/files each time I run it, but errors before it completes all of them.
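Since repeated runs reportedly finish the job, one stopgap is to wrap the call in a retry loop. This is only a sketch of that workaround, not a fix: `rm_with_retries` and `FlakyFS` are hypothetical names, and catching `RuntimeError` assumes the failure surfaces the way it does in the first traceback above (adlfs wrapping the Azure error in a `RuntimeError`), which may differ between versions.

```python
import time


def rm_with_retries(fs, path, attempts=5, delay=0.0):
    """Call fs.rm(path, recursive=True) until it succeeds or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            fs.rm(path, recursive=True)
            return
        except RuntimeError as err:  # assumption: the wrapper seen in the traceback
            last_error = err
            time.sleep(delay)
    raise last_error


class FlakyFS:
    """Hypothetical stand-in filesystem: fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def rm(self, path, recursive=False):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("DirectoryIsNotEmpty")


demo_fs = FlakyFS(failures=2)
rm_with_retries(demo_fs, "container/folder")
print(demo_fs.calls)
```

With a real filesystem object the call would be `rm_with_retries(fs, "container/folder")`. If every attempt fails, the last error is re-raised unchanged.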
I think this problem might only affect storage accounts with hierarchical namespace enabled (ADLS gen2). I can reproduce it with basically any recursive delete on a hierarchical namespace account but not on a flat namespace account.
Is everyone else who is having issues also using hierarchical namespace accounts?
This would also explain why it can't be reproduced in any integration test, because azurite does not support hierarchical namespace. The Azure error also says "non-empty directory", but flat namespace accounts don't have directories, so it would be strange to receive that error on a flat namespace account.
I'm pretty sure https://github.com/fsspec/adlfs/pull/383 is the cause. With the commit before it, the delete works correctly, but with this commit it fails.
I have not properly understood what this PR does, but from its description I think it makes sense why this is happening:
Group files by container_name and use asyncio.gather to remove the groups.
When using the azure blob client, hierarchical namespace directories look a lot like blobs. If we asynchronously delete all the relevant blobs, it's highly likely that we attempt to delete the directory marker blob before we've finished deleting all of its contents.
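The ordering problem described above can be illustrated without Azure. The following is a sketch under stated assumptions, not the adlfs implementation: `deletion_batches` and `FakeHierarchicalStore` are hypothetical names, and the fake store only mimics the one behaviour reported in this thread (refusing to delete a directory that still has children). Grouping paths by depth and deleting the deepest group first guarantees that children are gone before their parent is touched, while entries within one group could still be deleted concurrently.

```python
from collections import defaultdict


def deletion_batches(paths):
    """Group paths by depth, deepest first, so children are deleted before parents."""
    by_depth = defaultdict(list)
    # Listings may contain both "dir" and "dir/"; treat them as one entry.
    for path in {p.rstrip("/") for p in paths}:
        by_depth[path.count("/")].append(path)
    return [sorted(by_depth[depth]) for depth in sorted(by_depth, reverse=True)]


class FakeHierarchicalStore:
    """Hypothetical in-memory stand-in that refuses to delete non-empty directories."""

    def __init__(self, paths):
        self.paths = set(paths)

    def delete(self, path):
        if any(other.startswith(path + "/") for other in self.paths):
            raise OSError(f"DirectoryIsNotEmpty: {path}")
        self.paths.discard(path)


listing = ["c/dir", "c/dir/", "c/dir/sub", "c/dir/sub/part.parquet"]
store = FakeHierarchicalStore(p.rstrip("/") for p in listing)
for batch in deletion_batches(listing):
    for path in batch:  # paths in one batch share a depth, so none contains another
        store.delete(path)
print(store.paths)
```

Deleting "c/dir" first on a fresh `FakeHierarchicalStore` raises the `OSError`, which mirrors the DirectoryIsNotEmpty failure in the tracebacks.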
Using hierarchical here!
I think it should be quite straightforward to fix. I'll give it a try
Probably not the neatest solution but I think it works. https://github.com/Tom-Newton/adlfs/pull/1
pip install https://github.com/Tom-Newton/adlfs/archive/aec77b00c1fa7fb5bfbbec88e1c9fac45f133e97.zip
I think ideally we would change the way it does listing so that files contains file details, not just path strings. That would provide a better option for distinguishing files from directories.
Nice! I played around a little also when trying to understand how your code worked!
This could maybe make it a little bit clearer?
async def _identify_directory_markers(self, files):
    """
    Identify the files and directory markers from the given list of files.
    A directory marker is identified if another file starts with the marker's name followed by '/'.
    """
    files = sorted(set(files))  # Remove duplicates and sort
    directory_markers = []
    blobs = []
    for i, file in enumerate(files):
        if i + 1 < len(files) and files[i + 1].startswith(file + "/"):
            # If the next file starts with the current file's name followed by '/',
            # consider it a directory marker.
            directory_markers.append(file)
        else:
            # Otherwise, it's a regular file/blob.
            blobs.append(file)
    return blobs, directory_markers
For example, for testing:
if __name__ == "__main__":

    def _identify_directory_markers_test(files):
        """
        Identify the files and directory markers from the given list of files.
        A directory marker is identified if another file starts with the marker's name
        followed by '/'.
        """
        files = sorted(set(files))  # Remove duplicates and sort
        directory_markers = []
        blobs = []
        for i, file in enumerate(files):
            if i + 1 < len(files) and files[i + 1].startswith(file + "/"):
                # If the next file starts with the current file's name followed by '/',
                # consider it a directory marker.
                directory_markers.append(file)
            else:
                # Otherwise, it's a regular file/blob.
                blobs.append(file)
        return blobs, directory_markers

    files = [
        "ptp_parquets/transports_unmapped.parquet",
        "ptp_parquets/transports.parquet",
        "ptp_parquets/road_transports.parquet",
        "ptp_parquets/ptp_parquets/transports_unmapped.parquet",
        "ptp_parquets/ptp_parquets/transports.parquet",
        "ptp_parquets/ptp_parquets/road_transports.parquet",
        "ptp_parquets/ptp_parquets/prod_prc.parquet",
        "ptp_parquets/ptp_parquets/packed_yesterday.parquet",
        "ptp_parquets/ptp_parquets/packed_transports.parquet",
        "ptp_parquets/ptp_parquets/orders.parquet",
        "ptp_parquets/ptp_parquets/lines.parquet",
        "ptp_parquets/ptp_parquets/flight_transports.parquet",
        "ptp_parquets/ptp_parquets/dist_freight.parquet",
        "ptp_parquets/ptp_parquets/container_transports.parquet",
        "ptp_parquets/ptp_parquets",  # folder
        "ptp_parquets/ptp_parquets",  # folder
        "ptp_parquets/prod_prc.parquet",
        "ptp_parquets/packed_yesterday.parquet",
        "ptp_parquets/packed_transports.parquet",
        "ptp_parquets/orders.parquet",
        "ptp_parquets/lines.parquet",
        "ptp_parquets/flight_transports.parquet",
        "ptp_parquets/dist_freight.parquet",
        "ptp_parquets/container_transports",  # file with no file ending!
        "ptp_parquets",  # folder
        "ptp_parquets",  # folder
    ]
    blobs, directory_markers = _identify_directory_markers_test(files)
    print("FILES")
    for blob in blobs:
        print(blob)
    print("\nDIRs")
    for d in directory_markers:
        print(d)
If all files are directories, like this:
files = ["ptp_parquets", "orders"]
they will all end up as blobs...
but I guess it will still work, since there are no dangling files within them. Maybe there is a better way to identify files vs folders in an Azure storage account, though... 😅
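One edge case in the adjacent-entry check is worth noting: in Python's default string ordering "." sorts before "/", so a sibling such as "a.txt" lands between "a" and "a/b", and "a" would then be classified as a blob. A possible variation, sketched here as an assumption rather than as the adlfs fix, looks up the first entry that is not smaller than `path + "/"` with `bisect_left` instead of only looking at the next entry. The name `split_blobs_and_markers` is made up for illustration, and the function is still purely path-based: it does not ask Azure whether an entry is a directory.

```python
from bisect import bisect_left


def split_blobs_and_markers(paths):
    """Split paths into (blobs, directory_markers) using only the path strings.

    A path counts as a directory marker if some other path starts with it
    followed by "/". Empty directories therefore still end up as blobs.
    """
    files = sorted({p.rstrip("/") for p in paths})
    blobs, markers = [], []
    for path in files:
        prefix = path + "/"
        # In a sorted list, all entries starting with `prefix` are contiguous
        # and begin at the first entry that is >= `prefix`.
        i = bisect_left(files, prefix)
        if i < len(files) and files[i].startswith(prefix):
            markers.append(path)
        else:
            blobs.append(path)
    return blobs, markers


blobs, markers = split_blobs_and_markers(["a", "a.txt", "a/b", "a/"])
print(blobs)
print(markers)
```

Here "a" is reported as a directory marker even though "a.txt" sorts directly after it, and "a" and "a/" are treated as the same entry.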
@daavoo do you have any thoughts on this? https://github.com/fsspec/adlfs/pull/383 seems to remove the _isfile calls, but something along those lines is required to support hierarchical namespace storage accounts (ADLS gen2).
How would we feel about just reverting https://github.com/fsspec/adlfs/pull/383? I know it provides a big performance advantage for flat namespace accounts but it also breaks hierarchical namespace (ADLS gen2) accounts.
I think something like what @nosterlu and I described could work but there are probably edge cases to consider.
Hi, my old conda environment:
runs the following without any error
But after updating to these package versions:
it raises this error:
RuntimeError: ('Failed to remove %s for %s', [PATHS], ResourceExistsError('This operation is not permitted on a non-empty directory.\nRequestId:ID\nTime:2023-01-20T09:33:26.1693752Z\nErrorCode:DirectoryIsNotEmpty'))
Basically, recursive=True does not work as intended.