RedHatQuickCourses / rhods-pipelines

Automation using Data Science Pipelines
https://redhatquickcourses.github.io/rhods-pipelines/

Kubeflow Pipeline fails to be imported or run #17

Open msakho opened 5 months ago

msakho commented 5 months ago

The Execute Pipeline section of the following tutorial[1] is not working as expected. After downloading the pipeline, it appears that it cannot be imported. When I write it directly in a notebook and click 'Restart Kernel and Run All Cells', it fails with the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[1], line 263
    257 client = TektonClient(
    258     host=kubeflow_endpoint,
    259     existing_token=bearer_token,
    260     ssl_ca_cert='/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt'
    261 )
    262 print(f'Connecting to Data Science Pipelines: {kubeflow_endpoint}....done')
--> 263 result = client.create_run_from_pipeline_func(
    264     offline_scoring_pipeline,
    265     arguments={},
    266     experiment_name='offline-scoring-kfp'
    267 )
    268 print(f'Starting pipeline run with run_id: {result.run_id}')

File /opt/app-root/lib64/python3.9/site-packages/kfp_tekton/_client.py:242, in TektonClient.create_run_from_pipeline_func(self, pipeline_func, arguments, run_name, experiment_name, pipeline_conf, tekton_pipeline_conf, namespace)
    240 # TODO: Check arguments against the pipeline function
    241 pipeline_name = pipeline_func.__name__
--> 242 run_name = run_name or pipeline_name + ' ' + datetime.datetime.now().strftime('%Y-%m-%d %H-%M-%S')
    243 try:
    244     (_, pipeline_package_path) = tempfile.mkstemp(suffix='.zip')
AttributeError: type object 'datetime.datetime' has no attribute 'datetime'

It seems the pipeline is never submitted to the server, since no run appears there. There might be something missing in the instructions.

[1]=https://redhatquickcourses.github.io/rhods-pipelines/rhods-pipelines/1.33/chapter1/kfp.html

strangiato commented 5 months ago

Which version of OpenShift AI are you using? 2.8?

Can you run the following command in the terminal where you are executing the Python code and share the output here?

pip list | grep kfp

As a quick test, you might also try switching to kfp-tekton 1.5.x

pip install kfp-tekton~=1.5.0

We may have had to revert that back to an older version and may have forgotten to update the instructions.
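For anyone who prefers to check from inside the notebook rather than the terminal, here is a minimal sketch that lists installed packages whose name contains "kfp". It uses only the standard library's importlib.metadata; the helper name installed_kfp_packages is made up for illustration and is not part of kfp or kfp-tekton.

```python
from importlib import metadata


def installed_kfp_packages():
    """Return {name: version} for installed distributions whose name contains 'kfp'.

    Hypothetical helper for illustration; roughly what `pip list | grep kfp` shows.
    """
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with missing metadata instead of failing on them.
        if name and "kfp" in name.lower():
            found[name] = dist.version
    return found


for name, version in sorted(installed_kfp_packages().items()):
    print(f"{name}=={version}")
```

Running this in the same kernel that executes the pipeline code shows the versions that kernel actually imports, which can differ from what another terminal environment reports.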

rsriniva commented 5 months ago

@strangiato @msakho - could this be a symptom of #16 ? If so, can you revert the env var for the TLS cert and check again?

rsriniva commented 5 months ago

@erwangranger - is this issue related to the one you opened about the TLS certs?

rsriniva commented 4 months ago

@adelton - any chance you ran into the same issue? any pointers on how to debug this?

rsriniva commented 4 months ago

@jramcast @rruizher @gjbianco - y'all run into this issue?

rsriniva commented 4 months ago

Testing in notebook terminal:

(app-root) (app-root) python
Python 3.9.16 (main, Sep 12 2023, 00:00:00) 
[GCC 11.3.1 20221121 (Red Hat 11.3.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> datetime.now()
datetime.datetime(2024, 4, 17, 4, 15, 3, 224175)

rsriniva commented 4 months ago

However, if I execute the expression used in the _client.py file packaged with kfp-tekton:

>>> datetime.datetime.now()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'datetime.datetime' has no attribute 'datetime'

This is the same error as reported in the bug.
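The two results above can be reproduced side by side. The sketch below evaluates the same expression as the failing line in the traceback, once with the name datetime bound to the module and once with it bound to the class. The error message ("type object 'datetime.datetime'") shows that the name was bound to the class when the line ran; how that binding came about inside _client.py is not shown in this thread, so treat that part as an assumption. The names dt_module, dt_class, and make_run_suffix are made up for illustration.

```python
import datetime as dt_module                 # the datetime module
from datetime import datetime as dt_class    # the datetime class


def make_run_suffix(datetime):
    """Evaluate the expression from the failing line with `datetime` bound to the argument.

    Hypothetical helper: the parameter deliberately shadows the name `datetime`
    so both bindings can be tried against the same expression.
    """
    return datetime.datetime.now().strftime('%Y-%m-%d %H-%M-%S')


# Name bound to the module: the module has a `datetime` class, so this works.
print(make_run_suffix(dt_module))

# Name bound to the class: the class has no `datetime` attribute, so this fails.
try:
    make_run_suffix(dt_class)
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```

With the class binding, the working form of the call is datetime.now(), which matches the first terminal test above.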

rsriniva commented 4 months ago

I am disabling and commenting out this exercise in the pipelines course until we get a proper solution. It looks like we have an API compatibility problem with upstream. Upgrading the kfp-tekton version to fix this bug would break other labs that currently work.

jramcast commented 4 months ago

This is probably related to a problem with the datetime import, which was fixed last December: