Building-ML-Pipelines / building-machine-learning-pipelines

Code repository for the O'Reilly publication "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson
MIT License

Can't run Kubeflow sample code #47

Closed miramar-labs closed 3 years ago

miramar-labs commented 3 years ago

I've tried a few different versions of tfx/tensorflow/kfp/python and consistently get the following error:

root@b8ad67b54428:~/building-machine-learning-pipelines/pipelines/kubeflow_pipelines# export PYTHONPATH=~/building-machine-learning-pipelines/pipelines
root@b8ad67b54428:~/building-machine-learning-pipelines/pipelines/kubeflow_pipelines# python pipeline_kubeflow.py 
2021-02-21 21:27:38.699832: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "pipeline_kubeflow.py", line 10, in <module>
    from tfx.orchestration import pipeline
  File "/usr/local/lib/python3.6/dist-packages/tfx/orchestration/pipeline.py", line 28, in <module>
    from tfx.dsl.components.base import base_node
  File "/usr/local/lib/python3.6/dist-packages/tfx/dsl/components/base/base_node.py", line 28, in <module>
    from tfx.dsl.components.base import base_executor
  File "/usr/local/lib/python3.6/dist-packages/tfx/dsl/components/base/base_executor.py", line 40, in <module>
    beam_Pipeline = beam.Pipeline
AttributeError: module 'apache_beam' has no attribute 'Pipeline'

hanneshapke commented 3 years ago

@miramar-labs can you please let us know which Beam and TFX version was installed?
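One way to collect that information is sketched below. It only reads installed package metadata, so it still works when importing tfx itself fails as in the traceback above. The package list is an assumption based on the names mentioned in this thread, and `installed_versions` is a hypothetical helper, not part of the book's code.

```python
# Sketch: report installed versions of the packages discussed in this thread.
# The package list is an assumption; adjust it to your own environment.
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Older interpreters need the importlib_metadata backport from PyPI.
    from importlib_metadata import PackageNotFoundError, version

PACKAGES = ["apache-beam", "tfx", "kfp", "tensorflow"]


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running it inside the activated virtual environment gives a short list that can be pasted into the issue.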

miramar-labs commented 3 years ago

I used virtualenv to create a python 3.6.13 virtual environment, then pip install tfx kfp which resulted in:

apache-beam 2.27.0
kfp 1.4.0
kfp-pipeline-spec 0.1.6
kfp-server-api 1.3.0
tensorboard 2.4.1
tensorboard-plugin-wit 1.8.0
tensorflow 2.4.1
tensorflow-cloud 0.1.13
tensorflow-data-validation 0.27.0
tensorflow-datasets 3.0.0
tensorflow-estimator 2.4.0
tensorflow-hub 0.9.0
tensorflow-metadata 0.27.0
tensorflow-model-analysis 0.27.0
tensorflow-serving-api 2.4.1
tensorflow-transform 0.27.0
tfx 0.27.0
tfx-bsl 0.27.1

miramar-labs commented 3 years ago

I also tried your requirements.txt in a clean Python 3.6 virtual environment, but that seemed to send pip off into a never-ending search for dependencies. I would love to know which combination of versions actually works.

miramar-labs commented 3 years ago

OK, so I think the problem for me was how I was getting PyCharm to find your 'pipelines' module: I was appending the pipelines folder to PYTHONPATH. I wrote a setup.py to install it properly as a local package and got past the error:

setup.py (copy this to the root folder of the book repository):

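from setuptools import setup

# Registers the repository's 'pipelines' folder as an importable package,
# so it no longer has to be added to PYTHONPATH by hand.
setup(name='pipelines',
      version='1.0',
      description='Utility Pipeline Code',
      url='https://github.com/Building-ML-Pipelines/building-machine-learning-pipelines/tree/master/pipelines',
      author='Hannes Hapke',
      author_email='buildingmlpipelines@gmail.com',
      license='MIT',
      packages=['pipelines'],  # the folder to expose as a package
      zip_safe=False)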

Then, from the same folder as setup.py:

pip install -e .
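A quick way to check whether a PYTHONPATH entry is shadowing an installed package is to ask Python which file it would import for a given module name. The sketch below is only a diagnostic. It rests on an assumption this thread does not confirm: that the original AttributeError came from a local folder named apache_beam (for example, inside the pipelines folder) being found before the real Apache Beam package. `module_origin` is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from.

    Returns None if the module cannot be found. The result may also be
    None for a namespace package (a directory without __init__.py).
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # If this prints a path inside the book repository instead of
    # site-packages (or dist-packages), a local folder is shadowing
    # the installed package.
    for name in ("apache_beam", "pipelines"):
        print(name, "->", module_origin(name))
```

Run it once with the old PYTHONPATH setting and once after the editable install to compare the two results.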

Environment: Python 3.8.0 virtual env:

apache-beam 2.28.0
kfp 1.4.0
kfp-pipeline-spec 0.1.6
kfp-server-api 1.3.0
pip 21.0.1
pipelines 1.0 /home/aaron/building-machine-learning-pipelines
tensorboard 2.4.1
tensorboard-plugin-wit 1.8.0
tensorflow 2.4.1
tensorflow-cloud 0.1.13
tensorflow-data-validation 0.27.0
tensorflow-datasets 3.0.0
tensorflow-estimator 2.4.0
tensorflow-hub 0.9.0
tensorflow-metadata 0.27.0
tensorflow-model-analysis 0.27.0
tensorflow-serving-api 2.4.1
tensorflow-transform 0.27.0
tfx 0.27.0
tfx-bsl 0.27.1