delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

OSError : Unable to walk dir: IO error for operation on folder/_delta_log #1952

Open Matthieusalor opened 9 months ago

Matthieusalor commented 9 months ago

Environment

Delta-rs version: python 0.14.0

Environment:


Bug

What happened:

When trying to create a new Delta table with the latest Python version, I systematically get this error:

OSError : Generic LocalFilesystem Unable to walk dir: IO error for operation on folder/_delta_log: Success (os error 0)

At this stage, the table folder has been created and the data is there; only the _delta_log folder is missing.

I tried to create a table with the latest Rust version (0.16.5) and could not reproduce the error.

The Python code used to work on version 0.13.0.

How to reproduce it:

import pandas as pd
import deltalake

df = pd.DataFrame({"A": [1, 2]})
deltalake.write_deltalake("test", df)

ion-elgreco commented 9 months ago

Does it work if you do write_deltalake(engine="rust")?

Matthieusalor commented 9 months ago

No, it fails in either write_deltalake_rust or write_deltalake_pyarrow, depending on the engine parameter, with the same error in both cases.

ion-elgreco commented 9 months ago

I cannot reproduce this issue in WSL.

rtyler commented 8 months ago

I am also unable to reproduce this on a Linux/amd64 machine:

❯ python
Python 3.11.4 (main, Jun 28 2023, 19:51:46) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import deltalake
>>>
>>> df = pd.DataFrame({"A": [1, 2]})
>>> deltalake.write_deltalake("test", df)
>>>
❯ tree test
test
├── 0-8620cf44-3335-4675-8b64-ef286ce7e677-0.parquet
└── _delta_log
    └── 00000000000000000000.json

2 directories, 2 files

@Matthieusalor can you share more details about your filesystem? I am wondering if there's some nuance about the specific filesystem or architecture of your environment that could be causing this issue?
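For anyone else hitting this, one way to answer the filesystem question on Linux is to look up the mount that contains the table path. The following is a minimal sketch, not part of delta-rs: the helper names `parse_mounts` and `filesystem_type` are hypothetical, and it assumes a Linux host where `/proc/mounts` is readable.

```python
import os


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Fields are: device, mount point, filesystem type, options, ...
            # /proc/mounts escapes spaces in paths as \040.
            mount_point = fields[1].replace("\\040", " ")
            entries.append((mount_point, fields[2]))
    return entries


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path."""
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for mount_point, fs_type in parse_mounts(mounts_text):
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same point shadow an earlier one.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    # "test" is the table path from the reproduction above.
    with open("/proc/mounts") as f:
        print(filesystem_type("test", f.read()))
```

A result such as `cifs` or `nfs` would indicate a network mount rather than a local disk, which is useful detail to include in a report.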

J2OG commented 5 months ago

@rtyler I'm facing the exact same issue on Machine Learning Studio.

strawhl commented 3 months ago

I also have the same issue from Azure Machine Learning

empowerNate commented 3 months ago

It's an issue with the shared network drive in Azure ML compute instances, which is mounted using CIFS. If you write to the local HDD of the machine (~/localfiles) or to /mnt, it works. Unfortunately, localfiles has only a few tens of GB of space, and /mnt is temporary and gets deleted every time an instance shuts down.
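Since writing to local disk works, one possible workaround is to create the table in a local scratch directory and then copy the finished files onto the share. This is only a sketch under assumptions: `write_locally_then_copy` and `copy_tree_contents` are hypothetical helpers, not deltalake APIs, and the approach has not been verified on a CIFS mount.

```python
import os
import shutil
import tempfile


def copy_tree_contents(src, dst):
    """Copy every file under src into dst, creating directories as needed."""
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        out_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(out_dir, exist_ok=True)
        for name in files:
            # copyfile copies file contents only, without permissions or timestamps.
            shutil.copyfile(os.path.join(root, name), os.path.join(out_dir, name))


def write_locally_then_copy(write_fn, target_dir):
    """Run write_fn(local_dir) on local disk, then copy the result to target_dir."""
    with tempfile.TemporaryDirectory() as scratch:
        local_dir = os.path.join(scratch, "table")
        write_fn(local_dir)
        copy_tree_contents(local_dir, target_dir)
    return target_dir


# Hypothetical usage with the reproduction from this issue:
# write_locally_then_copy(lambda p: deltalake.write_deltalake(p, df), "test")
```

The copy is not atomic, so this only makes sense for creating a new table from a single writer while nothing is reading it. It does not replace the commit protocol for appends or concurrent writes.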

MoonKBRR commented 2 months ago

I also have the same issue from Azure Machine Learning

Did you solve your issue? I have the same problem with a mounted volume in an Azure file share.

masc-it commented 1 month ago

Hey, I am having a similar issue (the target storage is a SAMBA mount) when calling write_deltalake:

OSError: Generic LocalFileSystem error: Unable to copy file from /Volumes/datasets/.../_delta_log/_commit_6851eb42-d982-49a0-9468-b3d92657948c.json.tmp to /Volumes/datasets/.../_delta_log/00000000000000000000.json: Operation not supported (os error 45)

moehmeni commented 1 month ago

Same error. @masc-it, @MoonKBRR, did you find a solution for this?