Closed reevejd closed 3 years ago
@reevejd - thank you for opening this issue.
It seems like, at a minimum, we need to provide the capability for a user to specify a namespace value in this case. I think the natural location for this would be in the runtime metadata for the kfp schema. When defined (default is `None`), the namespace value will be provided to the `create_experiment` call.
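A minimal sketch of that idea, under stated assumptions: the metadata key `namespace`, the helper `build_experiment_kwargs`, and the `RecordingClient` stand-in are all hypothetical names used for illustration, not Elyra code. The stand-in mirrors the `name`, `description`, and `namespace` parameters of the kfp SDK's `Client.create_experiment` so the example runs without a cluster.

```python
from typing import Any, Dict, List, Optional


def build_experiment_kwargs(
    experiment_name: str, runtime_metadata: Dict[str, Any]
) -> Dict[str, Any]:
    """Build keyword arguments for a create_experiment call.

    "namespace" is a hypothetical metadata key; it defaults to None when unset.
    """
    namespace: Optional[str] = runtime_metadata.get("namespace")
    kwargs: Dict[str, Any] = {"name": experiment_name}
    if namespace:
        # Only forward the namespace when the user actually defined one.
        kwargs["namespace"] = namespace
    return kwargs


class RecordingClient:
    """Stand-in for a kfp client; it only records what it was called with."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def create_experiment(
        self,
        name: str,
        description: Optional[str] = None,
        namespace: Optional[str] = None,
    ) -> Dict[str, Any]:
        call = {"name": name, "description": description, "namespace": namespace}
        self.calls.append(call)
        return call


client = RecordingClient()
# Namespace defined in the runtime metadata: it is passed through.
client.create_experiment(**build_experiment_kwargs("demo", {"namespace": "test"}))
# Namespace not defined: the call falls back to the client's default (None here).
client.create_experiment(**build_experiment_kwargs("demo", {}))
```

Leaving the keyword out when no namespace is configured keeps single-user deployments on their existing code path.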
James - I'm not sure how available a multi-user cluster will be. As a result, we may need to rely on your ability to look at things for us. Is that something you could do? If so, it might be worth building a version of Elyra that hardcodes a namespace value corresponding to your environment, just to see a) whether this is a viable approach and b) whether it sheds light on the "next" item that requires focus relative to this configuration. Your help will be greatly appreciated. If you're unable to build such a version, we could probably quickly add support for an environment variable to accomplish the same for now.
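For the environment-variable stopgap, a rough sketch could look like the following. The variable name `ELYRA_KFP_NAMESPACE` and the helper `resolve_namespace` are assumptions made up for this example; nothing in the thread settles on a name.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name, chosen only for this sketch.
NAMESPACE_ENV_VAR = "ELYRA_KFP_NAMESPACE"


def resolve_namespace(
    configured: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the namespace to use, or None to keep the default behavior.

    An explicitly configured value wins; otherwise fall back to the
    environment variable. Empty strings are treated as "not set".
    """
    if configured:
        return configured
    return environ.get(NAMESPACE_ENV_VAR) or None


# Example with an explicit mapping instead of the real process environment.
example_env = {NAMESPACE_ENV_VAR: "test"}
from_env = resolve_namespace(None, example_env)
from_config = resolve_namespace("other", example_env)
unset = resolve_namespace(None, {})
```

Passing the environment in as a mapping keeps the lookup easy to test without modifying the real process environment.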
We need to further investigate this, a couple of references here:
@reevejd we will get back to you on this once we are able to reproduce the issue and have more info.
@kevin-bates Thanks for the suggestion. I built a version of Elyra locally with my namespace hard-coded and passed into the `create_experiment` call as suggested. Unfortunately this didn't change anything; I still get the same error. Re: cluster availability, I can provide access to the cluster for any IBMer who is willing to troubleshoot. I'm also happy to just try whatever other suggestions you have or provide information about the cluster.
@reevejd - I'm currently building a system to reproduce the issue. If possible, could you let me know which version of Kubernetes/OpenShift you are using and any other custom configurations you may have (e.g. LDAP)?
@akchinSTC Thanks. I'm using Kubernetes 1.17 on IBM Cloud. The kfdef I used is here. I'm using dex with static users for basic auth as described in the kubeflow docs.
@akchinSTC were you able to build a system to reproduce the issue? Please let me know if there's anything I can do to help.
@reevejd - yes, we have a system up at the moment with the kfdef you supplied and a basic dex config with a dummy user, and we were able to recreate the issue. I'm currently looking at possible workarounds/solutions and will get back to you as soon as I find one.
@reevejd - I think I have a solution that should get you past the error.
1. When creating the static users, I assume you are following this guide to modify the default dex config YAML in your k8s cluster: https://www.kubeflow.org/docs/started/k8s/kfctl-istio-dex/#add-static-users-for-basic-auth
2. After creating a static/dummy user, log in to the Kubeflow cluster via this screen (via http://kubeflow.cluster:31380...) with the same email and password as the one(s) created in the previous step.
3. At this point, if the user has never logged in to the cluster before, Kubeflow should prompt you to create a new namespace for your experiments. It should default to your email shortname; accept it and continue.
4. Using the workaround branch here, rebuild Elyra, then edit your kfp runtime metadata and assign the namespace from the previous step to the new `user_namespace` box (in our example it was `test`) prior to running the pipeline: https://github.com/akchinSTC/elyra-1/tree/issue-1053
5. Double-check your runtime information and confirm that the API endpoint credentials are the same as the ones used in the first step with the static users/basic auth.
These steps worked for me and will hopefully get you past the issue for now. Let me know if you run into any problems. We plan on adding the namespace parameter to the runtime metadata moving forward.
Hi @akchinSTC, thanks for the update. Unfortunately I haven't been able to build from your branch to test it out. Running `make clean install` gives me a `not a valid npm package` error because of an underlying TypeScript error:
src/CodeSnippetWidget.tsx(445,11): error TS2322: Type '{ children: Element; displayName: string; tooltip: any; actionButtons: IMetadataActionButton[]; onExpand: () => void; onMouseDown: (event: any) => void; }' is not assignable to type 'IntrinsicAttributes & IntrinsicClassAttributes<ExpandableComponent> & Readonly<IExpandableComponentProps> & Readonly<...>'.
Running `make docker-image` also errors out, but with a different error message:
#9 8.995 ERROR: Cannot uninstall 'terminado'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
#9 ERROR: executor failed running [/bin/sh -c pip install --upgrade pip jupyterlab notebook]: runc did not terminate sucessfully
------
> [4/9] RUN pip install --upgrade pip jupyterlab notebook:
------
failed to solve with frontend dockerfile.v0: failed to solve with frontend gateway.v0: rpc error: code = Unknown desc = failed to build LLB: executor failed running [/bin/sh -c pip install --upgrade pip jupyterlab notebook]: runc did not terminate sucessfully
make: *** [docker-image] Error 1
If there's an easy fix for the above errors, great, I can do that. Otherwise if you publish a dev docker image I can check if your fix works for me.
Thanks @akchinSTC! The docker image you provided me does fix the problem so https://github.com/elyra-ai/elyra/pull/1081 will resolve this issue.
I deployed Kubeflow using this kfdef (uses manifests from v1.1-branch).
When trying to run a pipeline on an external, multi-user auth cluster I’m getting a modal pop-up with this error:
The relevant parts of the JupyterLab logs seem to be:
The full trace is in this gist.
Elyra successfully creates the pipeline, and if I then log in to the Kubeflow GUI and manually create an experiment from the pipeline Elyra created, it works as expected.
I noticed the same error message in a kubeflow/pipelines issue here, but it seems to apply only to applications using the in-cluster endpoint, which isn't the case for me.