elyra-ai / elyra

Elyra extends JupyterLab with an AI centric approach.
https://elyra.readthedocs.io/en/stable/
Apache License 2.0

Invalid input error: Invalid resource references for experiment. Expect one namespace type with owner relationship #1053

Closed reevejd closed 3 years ago

reevejd commented 4 years ago

I deployed Kubeflow using this kfdef (uses manifests from v1.1-branch).

When trying to run a pipeline on an external, multi-user auth cluster I’m getting a modal pop-up with this error:

[Screenshot of the error modal, 2020-11-09]

The relevant parts of the JupyterLab logs seem to be:

Validate experiment request failed.: Invalid input error: Invalid resource references for experiment. Expect one namespace type with owner relationship. Got:[]","message":"Validate experiment request failed.: Invalid input error: Invalid resource references for experiment.

The full trace is in this gist.

Elyra successfully creates the pipeline, and if I then log in to the kubeflow GUI and manually create an experiment from the pipeline Elyra created, it works as expected.

I noticed the same error message in a kubeflow/pipelines issue here, but it seems to apply only to applications using the in-cluster endpoint, which isn’t the case for me.
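For readers unfamiliar with the message: it indicates that the experiment request arrived with an empty list of resource references ("Got:[]"), whereas a multi-user deployment expects exactly one reference of namespace type with an owner relationship. The sketch below illustrates that shape with plain dictionaries. The field names (`resource_references`, `key`, `type`, `id`, `relationship`) are modeled on the Kubeflow Pipelines v1beta1 API as best understood and should be treated as assumptions, and `my-namespace` is a placeholder.

```python
import json


def experiment_request(name, namespace=None):
    """Build an illustrative create-experiment request body as a plain dict."""
    body = {"name": name, "resource_references": []}
    if namespace:
        # Assumed shape: one reference of type NAMESPACE with relationship OWNER.
        body["resource_references"].append(
            {"key": {"type": "NAMESPACE", "id": namespace}, "relationship": "OWNER"}
        )
    return body


def has_namespace_owner(body):
    """Return True if the body has exactly one NAMESPACE/OWNER reference."""
    owners = [
        ref
        for ref in body.get("resource_references", [])
        if ref.get("key", {}).get("type") == "NAMESPACE"
        and ref.get("relationship") == "OWNER"
    ]
    return len(owners) == 1


# A request without a namespace mirrors the failing case ("Got:[]").
without_ns = experiment_request("elyra-demo")
# A request with a namespace carries the reference the server asks for.
with_ns = experiment_request("elyra-demo", namespace="my-namespace")

print(json.dumps(with_ns, indent=2))
```

This is only a model of the validation described in the error text, not the server's actual code.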

kevin-bates commented 4 years ago

@reevejd - thank you for opening this issue.

It seems like, at a minimum, we need to provide the capability for a user to specify a namespace value in this case. I think the natural location for this would be in the runtime metadata for the kfp schema. When defined (default is None), the namespace value will be provided to the create_experiment call.

James - I'm not sure how available a multi-user cluster will be. As a result, we may need to rely on your ability to look at things for us. Is that something you could do? If so, it might be worth building a version of elyra that hardcodes a namespace value corresponding to your environment, just to see a) whether this is a viable approach and b) whether it sheds light on the "next" item that requires focus relative to this configuration. Your help will be greatly appreciated. If you're unable to build such a version, we could probably quickly add support for an environment variable to accomplish the same for now.
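A minimal sketch of the proposal above, assuming a client object whose create_experiment method accepts a namespace keyword. The `namespace` metadata field, the `EXAMPLE_KFP_NAMESPACE` environment variable, and the `RecordingClient` stand-in are all hypothetical names used for illustration; none of them is taken from Elyra or the kfp SDK.

```python
import os
from typing import Any, Dict, Optional

# Hypothetical environment variable used as a fallback for the namespace.
ENV_VAR = "EXAMPLE_KFP_NAMESPACE"


def resolve_namespace(runtime_metadata: Dict[str, Any]) -> Optional[str]:
    """Prefer the metadata value, then the environment variable, else None."""
    return runtime_metadata.get("namespace") or os.environ.get(ENV_VAR) or None


def create_experiment(client: Any, name: str, runtime_metadata: Dict[str, Any]):
    """Call client.create_experiment, passing a namespace only when one is set."""
    kwargs = {"name": name}
    namespace = resolve_namespace(runtime_metadata)
    if namespace is not None:
        kwargs["namespace"] = namespace
    return client.create_experiment(**kwargs)


class RecordingClient:
    """Stand-in for a pipelines client; it only records the arguments it receives."""

    def __init__(self):
        self.calls = []

    def create_experiment(self, name, namespace=None):
        call = {"name": name, "namespace": namespace}
        self.calls.append(call)
        return call


client = RecordingClient()
result = create_experiment(client, "elyra-demo", {"namespace": "team-a"})
print(result)
```

Leaving the namespace out when it is undefined keeps single-user deployments on their existing code path, which matches the "default is None" behavior described above.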

lresende commented 4 years ago

We need to further investigate this, a couple of references here:

@reevejd we will get back to you on this once we are able to reproduce the issue and have more info.

reevejd commented 4 years ago

@kevin-bates Thanks for the suggestion. I built a version of Elyra locally with my namespace hard-coded and passed into the create_experiment call as suggested. Unfortunately this didn't change anything; I still get the same error. Re: cluster availability, I can provide access to the cluster for any IBMer who is willing to troubleshoot. I'm also happy to try whatever other suggestions you have or to provide information about the cluster.

akchinSTC commented 4 years ago

@reevejd - I'm currently building a system to reproduce the issue. If possible, could you let me know which version of Kubernetes/OpenShift you are using and any other custom configuration you may have, e.g. LDAP?

reevejd commented 4 years ago

@akchinSTC Thanks. I'm using Kubernetes 1.17 on IBM Cloud. The kfdef I used is here. I'm using dex with static users for basic auth as described in the kubeflow docs.

reevejd commented 3 years ago

@akchinSTC were you able to build a system to reproduce the issue? Please let me know if there's anything I can do to help.

akchinSTC commented 3 years ago

@reevejd - Yes, we have a system up at the moment with the kfdef you supplied and a basic dex config with a dummy user, and we were able to recreate the issue. I'm currently looking at possible workarounds/solutions and will get back to you as soon as I find one.

akchinSTC commented 3 years ago

@reevejd - I think I have a solution that should get you past the error.

These steps worked for me and hopefully get you past the issue for now. Let me know if you run into any issues. We plan on adding the namespace parameter into the runtime metadata moving forward.

reevejd commented 3 years ago

Hi @akchinSTC, thanks for the update. Unfortunately I haven't been able to build from your branch to test it out. Running make clean install gives me a "not a valid npm package" error because of an underlying TypeScript error:

src/CodeSnippetWidget.tsx(445,11): error TS2322: Type '{ children: Element; displayName: string; tooltip: any; actionButtons: IMetadataActionButton[]; onExpand: () => void; onMouseDown: (event: any) => void; }' is not assignable to type 'IntrinsicAttributes & IntrinsicClassAttributes<ExpandableComponent> & Readonly<IExpandableComponentProps> & Readonly<...>'.

Running make docker-image also errors out, but with a different error message:

#9 8.995 ERROR: Cannot uninstall 'terminado'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
#9 ERROR: executor failed running [/bin/sh -c pip install --upgrade pip jupyterlab notebook]: runc did not terminate sucessfully
------
 > [4/9] RUN pip install --upgrade pip jupyterlab notebook:
------
failed to solve with frontend dockerfile.v0: failed to solve with frontend gateway.v0: rpc error: code = Unknown desc = failed to build LLB: executor failed running [/bin/sh -c pip install --upgrade pip jupyterlab notebook]: runc did not terminate sucessfully
make: *** [docker-image] Error 1

If there's an easy fix for the above errors, great, I can do that. Otherwise if you publish a dev docker image I can check if your fix works for me.

reevejd commented 3 years ago

Thanks @akchinSTC! The docker image you provided me does fix the problem so https://github.com/elyra-ai/elyra/pull/1081 will resolve this issue.