Describe the bug
When using daft.read_hudi on Windows with an S3 URI, a FileNotFoundError is thrown. This is likely because the file paths are built with os.path.join, which inserts backslashes ("\") between segments on Windows and so produces an invalid S3 URI.
FileNotFoundError: File: s3://bucket/test\year=2024/month=05/day=17/24ebd153-6cf9-425f-88e9-91ca243bf973-0_2-41-1195_20240523052231633.parquet not found
FileNotFoundError: File: s3://bucket/test\.hoodie\hoodie.properties not found
Expected behavior
The daft.read_hudi function should be able to read from the specified S3 URI without throwing a FileNotFoundError.
Environment
OS: Windows 11
Daft version: 0.2.24
Python version: 3.10
Additional context
This issue does not occur on Linux or macOS, where os.path.join separates segments with forward slashes ("/"). A potential fix is to always join S3 URI segments with forward slashes, regardless of the operating system.
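The suspected cause can be reproduced on any operating system, because ntpath is the module that backs os.path on Windows and posixpath is the one that backs it on Linux and macOS. The sketch below is only an illustration under that assumption, not Daft's actual code; join_uri is a hypothetical helper name introduced here.

```python
import ntpath      # implementation behind os.path on Windows
import posixpath   # implementation behind os.path on Linux and macOS

base = "s3://bucket/test"  # table URI taken from the error messages above

# What os.path.join yields on Windows: backslashes end up inside the S3 key.
windows_joined = ntpath.join(base, ".hoodie", "hoodie.properties")

# What os.path.join yields on Linux/macOS: the URI stays valid.
posix_joined = posixpath.join(base, ".hoodie", "hoodie.properties")


def join_uri(base_uri: str, *parts: str) -> str:
    """Hypothetical helper: join URI segments with "/" on every OS."""
    segments = [base_uri.rstrip("/")]
    # Drop empty segments and stray slashes so no "//" appears in the key.
    segments += [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(segments)


print(windows_joined)
print(posix_joined)
print(join_uri(base, ".hoodie", "hoodie.properties"))
```

The first printed line contains the same backslash-separated path as the hoodie.properties error above, while the other two keep forward slashes throughout.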
Recommended library: pathlib.
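If pathlib is used for this, one detail is worth keeping in mind: PurePosixPath joins with forward slashes on every operating system, but it treats its input as a file path and collapses the "//" after the scheme. A minimal sketch of one way around that, assuming the scheme and bucket are split off first with urllib.parse.urlsplit; join_s3_uri is a hypothetical name, not part of Daft.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Applied to the whole URI, pathlib collapses "s3://" into "s3:/".
naive = str(PurePosixPath("s3://bucket/test") / ".hoodie")


def join_s3_uri(uri: str, *parts: str) -> str:
    """Hypothetical helper: join only the key portion with PurePosixPath."""
    split = urlsplit(uri)
    # Fall back to "/" so a bare "s3://bucket" still gets a separator.
    key = PurePosixPath(split.path or "/").joinpath(*parts)
    return f"{split.scheme}://{split.netloc}{key}"


print(naive)
print(join_s3_uri("s3://bucket/test", ".hoodie", "hoodie.properties"))
```

Keeping the scheme and bucket out of the pathlib call leaves the "s3://" prefix intact while still giving OS-independent forward slashes for the key.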