eto-ai / rikai

Parquet-based ML data format optimized for working with unstructured data
https://rikai.readthedocs.io/en/latest/
Apache License 2.0

Torch is required even when we are using Tensorflow #539

Closed: da-liii closed this issue 2 years ago.

da-liii commented 2 years ago

See the CI run https://github.com/eto-ai/rikai/runs/5199485143?check_suite_focus=true from #538, which fails with:

E                   : java.lang.RuntimeException: 2022-02-15 11:52:34.988360: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.9.10/x64/lib
2022-02-15 11:52:34.988399: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-02-15 11:52:37,188 INFO Rikai (tfhub_registry.py:65): Resolving model tfssd from tfhub:///tensorflow/ssd_mobilenet_v2/2
/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/rikai/spark/sql/model.py:270: UserWarning: Using schema and pre_processing/post_processing explicitlyis deprecated and will be removed in Rikai 0.2. Please migrate to a concrete ModelType.
  warnings.warn(
2022-02-15 11:52:37,215 ERROR Rikai (base.py:90): Unsupported model flavor: tensorflow
Traceback (most recent call last):
  File "<string>", line 6, in <module>
  File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/rikai/spark/sql/codegen/base.py", line 97, in command_from_spec
    return registry.resolve(row_spec)
  File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/rikai/experimental/tfhub/tfhub_registry.py", line 76, in resolve
    return udf_from_spec(spec)
  File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/rikai/spark/sql/codegen/base.py", line 82, in udf_from_spec
    codegen = importlib.import_module(codegen_module)
  File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/rikai/spark/sql/codegen/tensorflow.py", line 23, in <module>
    from rikai.pytorch.pandas import PandasDataset
  File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/rikai/pytorch/__init__.py", line 18, in <module>
    from rikai.pytorch.data import Dataset  # noqa: F401
  File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/rikai/pytorch/data.py", line 23, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'
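The last frames of the traceback show the chain: rikai/spark/sql/codegen/tensorflow.py imports rikai.pytorch.pandas at module level (line 23), rikai/pytorch/__init__.py then imports rikai.pytorch.data, and that module runs `import torch`. Importing the TensorFlow codegen therefore fails whenever torch is not installed. A common way to avoid this is to defer optional framework imports to the code path that actually needs them. The sketch below assumes nothing about rikai's internals: `lazy_import` and `torch_pandas_dataset_cls` are hypothetical helpers, and the only real API used is `importlib.import_module`.

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, hint: str = "") -> ModuleType:
    """Import an optional dependency only when it is actually needed.

    Hypothetical helper (not part of rikai). Because nothing is imported at
    module load time, a module that merely *references* an optional
    framework stays importable when that framework is missing.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        suffix = f" ({hint})" if hint else ""
        # Re-raise with a message that names the optional dependency.
        raise ImportError(
            f"Optional dependency '{module_name}' is required for this code path{suffix}"
        ) from err


def torch_pandas_dataset_cls():
    # Hypothetical: only code that really wants the torch-backed dataset
    # pays for `import torch`; a TensorFlow-only codegen never calls this.
    module = lazy_import("rikai.pytorch.pandas", "install torch to use it")
    return module.PandasDataset
```

With this pattern, the torch import error surfaces only if a caller explicitly asks for the torch-backed `PandasDataset`, instead of at import time for every TensorFlow user.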
da-liii commented 2 years ago

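The TensorFlow codegen (rikai/spark/sql/codegen/tensorflow.py) should use a TensorFlow PandasDataset instead of the torch PandasDataset (rikai.pytorch.pandas.PandasDataset) in #507, so that it no longer imports torch.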

TensorFlow Dataset API: https://www.tensorflow.org/api_docs/python/tf/data/Dataset
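One possible shape for a dataset on the TensorFlow side, sketched under stated assumptions: `RecordDataset` and `to_tf_dataset` are hypothetical names, not rikai's API. The real library calls are `pandas.DataFrame.to_dict(orient="records")` and `tf.data.Dataset.from_generator(..., output_signature=...)` (available in TensorFlow 2.4 and later). The row container itself imports neither torch nor TensorFlow; TensorFlow is imported only when a `tf.data.Dataset` is requested.

```python
from typing import Any, Callable, Dict, Iterator, Optional


class RecordDataset:
    """Framework-neutral dataset over tabular rows (hypothetical sketch).

    Accepts a pandas DataFrame (anything with ``to_dict(orient="records")``)
    or an iterable of dict-like rows. Nothing here imports torch.
    """

    def __init__(
        self,
        data: Any,
        transform: Optional[Callable[[Dict[str, Any]], Any]] = None,
    ) -> None:
        if hasattr(data, "to_dict"):
            # pandas.DataFrame.to_dict(orient="records") -> list of row dicts
            self._records = data.to_dict(orient="records")
        else:
            self._records = [dict(row) for row in data]
        self._transform = transform

    def __len__(self) -> int:
        return len(self._records)

    def __iter__(self) -> Iterator[Any]:
        # Yield each row, optionally passed through the user's transform.
        for row in self._records:
            yield self._transform(row) if self._transform else row

    def to_tf_dataset(self, output_signature: Any):
        # TensorFlow is imported only on this path, so constructing the
        # dataset requires neither TensorFlow nor torch.
        import tensorflow as tf

        return tf.data.Dataset.from_generator(
            lambda: iter(self), output_signature=output_signature
        )
```

For example, `RecordDataset(df).to_tf_dataset({"id": tf.TensorSpec(shape=(), dtype=tf.int64)})` would wrap a DataFrame with an integer `id` column, assuming TensorFlow is installed; the `output_signature` must match the keys and dtypes of the yielded rows.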

da-liii commented 2 years ago

Here is the related issue: #40

da-liii commented 2 years ago

Closed by https://github.com/eto-ai/rikai/issues/574.