Closed djouallah closed 8 months ago
Pulling the example out of the notebook:
table = catalog.create_table(
    "default.taxi_dataset",
    schema=df.schema,
    location="abfss://onelakene.dfs.core.windows.net/aemo/iceberg",
)
error:
WARNING:pyiceberg.io:Could not initialize FileIO: pyiceberg.io.fsspec.FsspecFileIO
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-f052e36c7671> in <cell line: 1>()
----> 1 table = catalog.create_table(
2 "default.taxi_dataset",
3 schema=df.schema,location="abfss://onelakene.dfs.core.windows.net/aemo/iceberg"
4 )
3 frames
/usr/local/lib/python3.10/dist-packages/pyiceberg/io/pyarrow.py in _initialize_fs(self, scheme, netloc)
390 return PyArrowLocalFileSystem()
391 else:
--> 392 raise ValueError(f"Unrecognized filesystem type in URI: {scheme}")
393
394 def new_input(self, location: str) -> PyArrowFile:
ValueError: Unrecognized filesystem type in URI: abfss
abfss isn't currently supported in the pyarrow FS implementation:
https://github.com/apache/iceberg-python/blob/7f712fdad025a2110816ec217616de54631f1e3e/pyiceberg/io/pyarrow.py#L339-L393
but it is available in the fsspec implementation:
https://github.com/apache/iceberg-python/blob/7f712fdad025a2110816ec217616de54631f1e3e/pyiceberg/io/fsspec.py#L181-L182
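To make the failure mode concrete: the ValueError comes from the URI scheme not matching anything the selected backend knows about. The following is a minimal sketch of that idea, not pyiceberg's actual code. The scheme sets and the candidate_backends helper are hypothetical and only illustrate that abfss is recognized by one implementation and not the other.

```python
from typing import List
from urllib.parse import urlparse

# Illustrative scheme sets only. These are assumptions for this sketch,
# not the real tables inside pyiceberg.
PYARROW_SCHEMES = {"file", "s3", "hdfs"}
FSSPEC_SCHEMES = {"file", "s3", "abfs", "abfss"}


def candidate_backends(location: str) -> List[str]:
    """Return the sketch backends whose scheme set contains the URI scheme."""
    scheme = urlparse(location).scheme
    backends = []
    if scheme in PYARROW_SCHEMES:
        backends.append("pyarrow")
    if scheme in FSSPEC_SCHEMES:
        backends.append("fsspec")
    return backends


location = "abfss://onelakene.dfs.core.windows.net/aemo/iceberg"
print(urlparse(location).scheme)     # abfss
print(candidate_backends(location))  # ['fsspec']
```

With a location whose scheme only the fsspec set contains, the only viable choice is the fsspec-based FileIO.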
Looks like pyarrow can support "fsspec-compatible filesystems" like Azure Blob Storage (abfs/abfss):
https://arrow.apache.org/docs/python/filesystems.html#using-fsspec-compatible-filesystems-with-arrow
There's an issue open to make fsspec and pyarrow filesystems cross-compatible #310
In the meantime, I think you might be able to work around this by explicitly using fsspec. You'd have to set this in the catalog properties:
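from pyiceberg.catalog.sql import SqlCatalog

# userdata is assumed to be the Colab secrets helper (from google.colab import userdata)
catalog = SqlCatalog(
    "default",
    **{
        "uri": "sqlite:///:memory:",
        "adlfs.account-name": userdata.get("account_name"),
        "adlfs.account-key": userdata.get("AZURE_STORAGE_ACCOUNT_KEY"),
        "adlfs.tenant-id": userdata.get("azure_storage_tenant_id"),
        # Explicitly select the fsspec-based FileIO
        "py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO",
    },
)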
Getting this error now:
319 return file_io
320 else:
--> 321 raise ValueError(f"Could not initialize FileIO: {io_impl}")
322
323 # Check the table location
ValueError: Could not initialize FileIO: pyiceberg.io.fsspec.FsspecFileIO
Oh interesting, this is because of a dependency issue.
The actual error shows up when you try to import that class:
from pyiceberg.io.fsspec import FsspecFileIO
The pyiceberg fsspec module has a dependency on botocore, and botocore is not installed with !pip install -q pyiceberg[adlfs]:
https://github.com/apache/iceberg-python/blob/7f712fdad025a2110816ec217616de54631f1e3e/pyiceberg/io/fsspec.py#L33-L34
To resolve this, install botocore
!pip install botocore
In the future, we'll get rid of this dependency issue. This is caught by deptry in #528.
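Since "Could not initialize FileIO" hides the underlying import failure, it can help to check up front which modules are missing. The following is a minimal sketch using only the standard library. The missing_modules helper is a hypothetical name, and including adlfs in the list is an assumption based on the abfss use case in this thread.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# botocore is the module identified above; fsspec and adlfs are assumed
# to be needed as well for abfss locations.
missing = missing_modules(["fsspec", "adlfs", "botocore"])
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All modules found")
```

Running this in the notebook before creating the catalog points directly at the package to install, rather than at the generic FileIO error.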
Thanks, it works now
Apache Iceberg version
None
Please describe the bug 🐞
reproducible example https://colab.research.google.com/drive/1EjffJO75-8Rj4V0MGKUsoFHDOGgicKgK#scrollTo=8WRyLlmyXnXu