Closed renesat closed 4 months ago
Thanks for the PR. What exactly is the issue you are seeing?
FWIW, for me, both of these statements work (although they do different things).
EDIT: The above repro on a local lakeFS quickstart instance, slightly edited (requires a .lakectl.yaml
):
import duckdb
import fsspec
import lakefs
lakefs.Repository("quickstart").create("local://quickstart", include_samples=True, exist_ok=True)
# Working
res_a = fsspec.filesystem("lakefs").glob("lakefs://quickstart/main/*.parquet")
print(res_a)
# Not working
duckdb.register_filesystem(fsspec.filesystem("lakefs"))
res_b = duckdb.sql("FROM 'lakefs://quickstart/main/*.parquet'")
print(res_b)
@nicholasjng this is more complex example with sample repo:
import duckdb
import fsspec
import lakefs
lakefs.Repository("quickstart").create("local://quickstart", include_samples=True, exist_ok=True)
fs = fsspec.filesystem("lakefs")
fs.mkdir("lakefs://quickstart/main/test")
fs.cp("lakefs://quickstart/main/lakes.parquet", "lakefs://quickstart/main/test/lakes.parquet")
fs.cp("lakefs://quickstart/main/lakes.parquet", "lakefs://quickstart/main/test/lakes2.parquet")
fs.rm("lakefs://quickstart/main/README.md")
fs.rm("lakefs://quickstart/main/data/", recursive=True)
fs.rm("lakefs://quickstart/main/images/", recursive=True)
fs.rm("lakefs://quickstart/main/lakes.parquet")
# Bug 1: return empty list but have "test" dir
try:
test1 = fsspec.filesystem("lakefs").ls("lakefs://quickstart/main/")
print(test1)
assert test1 != []
except:
print("test1 not working")
# Bug 2: return empty list but have "test" dir
# Not working
try:
test2 = fsspec.filesystem("lakefs").glob("lakefs://quickstart/main/test/*.parquet")
print(test2)
assert test2 != []
except:
print("test2 not working")
duckdb.register_filesystem(fsspec.filesystem("lakefs"))
# Not working
try:
test3 = duckdb.sql("FROM read_parquet('lakefs://quickstart/main/test/*.parquet')")
print(test3)
except:
print("test3 not working")
# Working
test4 = duckdb.sql("FROM read_parquet(['lakefs://quickstart/main/test/lakes2.parquet', 'lakefs://quickstart/main/test/lakes2.parquet'])")
print(test4)
I create repo structure like this:
lakefs://
└── quickstart/
└── main/
└── test/
├── lakes.parquet
└── lakes2.parquet
I think bag 2 related with bag 1.
Sorry for the delay, and thanks for that new repro. I ran this locally and was able to reproduce test case 1. We should therefore add it as a regression test. Would you like to do that?
Otherwise, looks good!
The test failure happens because the path
is not stripped of its leading protocol in test_ls_on_commit
.
Changing L488 in fs.ls()
to path = self._strip_protocol(path)
solves the issue.
@nicholasjng I add test and find out why I have problem with line 430. It's because of incorrect cache update for directory with only directories in it. I fix it by removing "/" from the end of dirs names.
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (
1586b02
) 94.24% compared to head (e76dc9f
) 94.24%. Report is 2 commits behind head on main.:exclamation: Current head e76dc9f differs from pull request most recent head 3ed1f7a. Consider uploading reports for the commit 3ed1f7a to get more accurate results
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@AdrianoKF Done👌
Thank you!
I found it when working with DuckDB. To reproduce the second bug, you can try calling this: