delta-incubator / deltaray

Delta reader for the Ray open-source toolkit for building ML applications
Apache License 2.0

Typing bug when importing deltaray #16

Open srggrs opened 1 year ago

srggrs commented 1 year ago

I get this error when importing deltaray:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 import deltaray

File /path/to/my/py-venv/lib/python3.9/site-packages/deltaray/__init__.py:1
----> 1 from .data import read_delta
      3 __all__ = ["read_delta"]

File /path/to/my/py-venv/lib/python3.9/site-packages/deltaray/data/__init__.py:1
----> 1 from .read_api import read_delta
      3 __all__ = ["read_delta"]

File /path/to/my/py-venv/lib/python3.9/site-packages/deltaray/data/read_api.py:31
     13 from ray.data._internal.arrow_block import ArrowRow
     15 import numpy as np
     18 def read_delta(
     19     table_uri: str,
     20     *,
     21     version: Optional[str] = None,
     22     storage_options: Optional[Dict[str, str]] = None,
     23     without_files: bool = False,
     24     filesystem: Optional["pyarrow.fs.FileSystem"] = None,
     25     columns: Optional[List[str]] = None,
     26     parallelism: int = -1,
     27     ray_remote_args: Dict[str, Any] = None,
     28     tensor_column_schema: Optional[Dict[str, Tuple[np.dtype, Tuple[int, ...]]]] = None,
     29     meta_provider=DefaultParquetMetadataProvider(),
     30     **arrow_parquet_args,
---> 31 ) -> Dataset[ArrowRow]:
     32     """Create an Arrow dataset from a Delta Table using Ray
     33 
     34     Examples:
   (...)
     60         Dataset holding Arrow records read from the Delta Lake Table
     61     """
     62     dt = DeltaTable(table_uri, version, storage_options, without_files)

TypeError: 'type' object is not subscriptable
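
For readers landing here, a minimal sketch of what the traceback appears to show. The classes below are hypothetical stand-ins, not the real ray or deltaray classes. It assumes the installed `Dataset` class simply does not support subscripting, which is what the error message suggests but which this thread does not confirm. On the Python versions reported here (3.9 to 3.11), a return annotation is evaluated when the `def` statement runs, so the expression fails during `import deltaray` rather than when `read_delta` is called.

```python
# Hypothetical stand-ins for illustration only; not the real ray classes.
class Dataset:
    """A plain, non-generic class: it defines no __class_getitem__."""


class ArrowRow:
    """Placeholder row type."""


# The same expression that appears in the failing return annotation.
error = None
try:
    Dataset[ArrowRow]
except TypeError as exc:
    error = exc

# The exact wording of the message differs between Python versions.
print(type(error).__name__, error)


# Possible workaround 1 (an assumption, not the merged fix): drop the subscript.
def read_delta_fixed(table_uri: str) -> Dataset:
    return Dataset()


# Possible workaround 2: quote the annotation so it is stored as a string
# and never evaluated at definition time.
def read_delta_quoted(table_uri: str) -> "Dataset[ArrowRow]":
    return Dataset()
```

Either workaround only changes the annotation, so the behaviour of the function body is unaffected. Whether the linked PR takes one of these routes is not stated in this thread.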

Steps to reproduce

  1. Create a Python 3.9 venv.
  2. Install deltaray 0.2.0.
  3. Import deltaray in a Python console.

pip install deltaray
Collecting deltaray
  Downloading deltaray-0.2.0-py3-none-any.whl (7.7 kB)
Collecting deltalake>=0.7.0 (from deltaray)
  Downloading deltalake-0.13.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.0 kB)
Collecting ray>=2.2.0 (from ray[data]>=2.2.0->deltaray)
  Downloading ray-2.8.0-cp39-cp39-manylinux2014_x86_64.whl.metadata (13 kB)
Requirement already satisfied: numpy>=1.24.1 in /path/to/my/py-venv/lib/python3.9/site-packages (from deltaray) (1.25.2)
Requirement already satisfied: pyarrow>=8 in /path/to/my/py-venv/lib/python3.9/site-packages (from deltalake>=0.7.0->deltaray) (13.0.0)
Requirement already satisfied: click>=7.0 in /path/to/my/py-venv/lib/python3.9/site-packages (from ray>=2.2.0->ray[data]>=2.2.0->deltaray) (8.1.7)
Collecting filelock (from ray>=2.2.0->ray[data]>=2.2.0->deltaray)
  Downloading filelock-3.13.1-py3-none-any.whl.metadata (2.8 kB)
Requirement already satisfied: jsonschema in /path/to/my/py-venv/lib/python3.9/site-packages (from ray>=2.2.0->ray[data]>=2.2.0->deltaray) (4.19.0)
Requirement already satisfied: msgpack<2.0.0,>=1.0.0 in /path/to/my/py-venv/lib/python3.9/site-packages (from ray>=2.2.0->ray[data]>=2.2.0->deltaray) (1.0.5)
Requirement already satisfied: packaging in /path/to/my/py-venv/lib/python3.9/site-packages (from ray>=2.2.0->ray[data]>=2.2.0->deltaray) (23.1)
Requirement already satisfied: protobuf!=3.19.5,>=3.15.3 in /path/to/my/py-venv/lib/python3.9/site-packages (from ray>=2.2.0->ray[data]>=2.2.0->deltaray) (4.24.3)
Requirement already satisfied: pyyaml in /path/to/my/py-venv/lib/python3.9/site-packages (from ray>=2.2.0->ray[data]>=2.2.0->deltaray) (6.0.1)
Collecting aiosignal (from ray>=2.2.0->ray[data]>=2.2.0->deltaray)
  Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting frozenlist (from ray>=2.2.0->ray[data]>=2.2.0->deltaray)
  Downloading frozenlist-1.4.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.2 kB)
Requirement already satisfied: requests in /path/to/my/py-venv/lib/python3.9/site-packages (from ray>=2.2.0->ray[data]>=2.2.0->deltaray) (2.28.2)
Requirement already satisfied: pandas>=1.3 in /path/to/my/py-venv/lib/python3.9/site-packages (from ray[data]>=2.2.0->deltaray) (1.5.3)
Requirement already satisfied: fsspec in /path/to/my/py-venv/lib/python3.9/site-packages (from ray[data]>=2.2.0->deltaray) (2023.9.0)
Requirement already satisfied: python-dateutil>=2.8.1 in /path/to/my/py-venv/lib/python3.9/site-packages (from pandas>=1.3->ray[data]>=2.2.0->deltaray) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /path/to/my/py-venv/lib/python3.9/site-packages (from pandas>=1.3->ray[data]>=2.2.0->deltaray) (2023.3.post1)
Requirement already satisfied: attrs>=22.2.0 in /path/to/my/py-venv/lib/python3.9/site-packages (from jsonschema->ray>=2.2.0->ray[data]>=2.2.0->deltaray) (23.1.0)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /path/to/my/py-venv/lib/python3.9/site-packages (from jsonschema->ray>=2.2.0->ray[data]>=2.2.0->deltaray) (2023.7.1)
Requirement already satisfied: referencing>=0.28.4 in /path/to/my/py-venv/lib/python3.9/site-packages (from jsonschema->ray>=2.2.0->ray[data]>=2.2.0->deltaray) (0.30.2)
Requirement already satisfied: rpds-py>=0.7.1 in /path/to/my/py-venv/lib/python3.9/site-packages (from jsonschema->ray>=2.2.0->ray[data]>=2.2.0->deltaray) (0.10.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /path/to/my/py-venv/lib/python3.9/site-packages (from requests->ray>=2.2.0->ray[data]>=2.2.0->deltaray) (3.2.0)
Requirement already satisfied: idna<4,>=2.5 in /path/to/my/py-venv/lib/python3.9/site-packages (from requests->ray>=2.2.0->ray[data]>=2.2.0->deltaray) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /path/to/my/py-venv/lib/python3.9/site-packages (from requests->ray>=2.2.0->ray[data]>=2.2.0->deltaray) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in /path/to/my/py-venv/lib/python3.9/site-packages (from requests->ray>=2.2.0->ray[data]>=2.2.0->deltaray) (2023.7.22)
Requirement already satisfied: six>=1.5 in /path/to/my/py-venv/lib/python3.9/site-packages (from python-dateutil>=2.8.1->pandas>=1.3->ray[data]>=2.2.0->deltaray) (1.16.0)
Downloading deltalake-0.13.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (22.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 22.6/22.6 MB 4.3 MB/s eta 0:00:00
Downloading ray-2.8.0-cp39-cp39-manylinux2014_x86_64.whl (62.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.5/62.5 MB 3.6 MB/s eta 0:00:00
Downloading frozenlist-1.4.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (228 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 228.0/228.0 kB 5.2 MB/s eta 0:00:00
Downloading filelock-3.13.1-py3-none-any.whl (11 kB)
Installing collected packages: frozenlist, filelock, deltalake, aiosignal, ray, deltaray
Successfully installed aiosignal-1.3.1 deltalake-0.13.0 deltaray-0.2.0 filelock-3.13.1 frozenlist-1.4.0 ray-2.8.0
(py-venv) user@pcname:/some/path/in/my/laptop$ ipython
Python 3.9.18 (main, Sep 11 2023, 13:41:44) 
Type 'copyright', 'credits' or 'license' for more information
IPython 8.17.2 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import deltaray
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 import deltaray

File /path/to/my/py-venv/lib/python3.9/site-packages/deltaray/__init__.py:1
----> 1 from .data import read_delta
      3 __all__ = ["read_delta"]

File /path/to/my/py-venv/lib/python3.9/site-packages/deltaray/data/__init__.py:1
----> 1 from .read_api import read_delta
      3 __all__ = ["read_delta"]

File /path/to/my/py-venv/lib/python3.9/site-packages/deltaray/data/read_api.py:31
     13 from ray.data._internal.arrow_block import ArrowRow
     15 import numpy as np
     18 def read_delta(
     19     table_uri: str,
     20     *,
     21     version: Optional[str] = None,
     22     storage_options: Optional[Dict[str, str]] = None,
     23     without_files: bool = False,
     24     filesystem: Optional["pyarrow.fs.FileSystem"] = None,
     25     columns: Optional[List[str]] = None,
     26     parallelism: int = -1,
     27     ray_remote_args: Dict[str, Any] = None,
     28     tensor_column_schema: Optional[Dict[str, Tuple[np.dtype, Tuple[int, ...]]]] = None,
     29     meta_provider=DefaultParquetMetadataProvider(),
     30     **arrow_parquet_args,
---> 31 ) -> Dataset[ArrowRow]:
     32     """Create an Arrow dataset from a Delta Table using Ray
     33 
     34     Examples:
   (...)
     60         Dataset holding Arrow records read from the Delta Lake Table
     61     """
     62     dt = DeltaTable(table_uri, version, storage_options, without_files)

TypeError: 'type' object is not subscriptable

In [2]: 

dongsupkim-onepredict commented 11 months ago

Same here with Python 3.11.5 and deltaray==0.2.0:

  File "/home/pdx/workspace/notebooks/delta-lake-test.py", line 5, in <module>
    import deltaray
TypeError: type 'Dataset' is not subscriptable

pushkarparanjpe commented 10 months ago

+1, Python 3.10 with deltaray 0.2.0.

pushkarparanjpe commented 10 months ago

Here is a PR that fixes this bug. Please merge!

Pull Request

cc: @dennyglee @JHibbard @dongsupkim-onepredict @srggrs