Explore running the UATs behind proxy

NohaIhab commented 3 months ago

Context

This task to explore and estimate the effort to be able to run the UATs behind the proxy. We need to look at issues #27 and #54 for initial pointers.

What needs to get done

Explore the effort to run the UATs behind proxy
Document the exploration and the tasks we need to create

Definition of Done

the effort for running UATs behind proxy is well understood and broken down

syncronize-issues-to-jira[bot] commented 3 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5946.

This message was autogenerated

NohaIhab commented 2 months ago

In #78 I did the exploration for Pipelines, it is the odd one out in our UATs because we cannot modify the container specs of pipelines runners directly, it has to be done through the kfp sdk API. This is not the case for the rest of the UATs, where we can edit the container spec. Therefore my porposal for the Pipelines UATs is different from the rest.

UATs behind proxy (excluding Pipelines)

I checked issue #27 and did some ad-hoc test runs in a proxy environment What we need to achieve is to make all the workloads' containers have the following proxy environment variables set:

HTTP_PROXY
HTTPS_PROXY
NO_PROXY
http_proxy
https_proxy
no_proxy

Here are my proposals:

Proposal 1: Apply a PodDefault

To run the UATs behind proxy, we need to set the proxy environment variables listed above in 2 places:

The notebook/job from which the UATs are being run
The test workloads that are being deployed by each component test

This can be achieved by:

Creating a PodDefault with the following contents:

apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
name: add-proxy-envs
namespace: <test_namespace>
spec:
desc: Add proxy  #settings
env:
- name: HTTP_PROXY
value: x.x.x.x:port
- name: http_proxy
value: x.x.x.x:port
- name: HTTPS_PROXY
value: x.x.x.x:port
- name: https_proxy
value: x.x.x.x:port
- name: NO_PROXY
value: <no_proxy values>
- name: no_proxy
value: <no_proxy values>
selector:
matchLabels:
  add-proxy-envs: "true"

where test_namespace is the same namespace in which the tests create the notebook/job, as well as the workloads

Apply the PodDefault to the notebook/job
- To apply the PodDefault to the notebook, it can be chosen from the Configurations section in the Notebook creation page.
- To apply the PodDefault to the job, we can add to the labels of the job template here the following
```
add-proxy-envs: "true"
```
Add the label to apply the PodDefault to the workload containers of the test examples. In the notebooks of the UATs (except Pipelines) we are defining the pod spec, the pod spec can then be modified to have the label which matches the PodDefault. For example in Katib UATs, the trial spec template can be modified like this:
```
@@ -160,6 +160,9 @@
 "            \"metadata\": {\n",
 "                \"annotations\": {\n",
 "                    \"sidecar.istio.io/inject\": \"false\"\n",
+    "                },\n",
+    "                \"labels\": {\n",
+    "                    \"add-proxy-envs\": \"true\"\n",
 "                }\n",
 "            },\n",
 "            \"spec\": {\n",
```
it is similar to the way we are disabling istio sidecar injection in the workloads run by the UATs.

Proposal 2: Apply a ConfigMap

It is what Phoevos proposed in #27 and there's a POC implementation in this branch for only passing the proxy envs to the driver job, but not to the workloads. The ConfigMap is to be created from an .env file that is passed to the tests with tox:

tox -e uats -- --env ./params.env

The missing part is how to consume the ConfigMap from the workloads. This can be done in a similar way to Proposal 1, by editing the container spec to use the ConfigMap:

envFrom:
- configMapRef:
    name: use-proxy
    optional: true

Limitation

The drawback of this proposal is that it cannot be used for both driver and notebooks method of running the UATs. I prefer the PodDefaults because when running from notebook, we can easily select it from Configurations dropdown menu. In this case, we can create a PodDefault that consumes the ConfigMap and apply it to Notebook.

Pipelines UATs behind proxy

Before running the test

Apply the PodDefault in proposal 1 to the user namespace and have the Notebook/Job use it.

Use the kfp sdk to inject the proxy environment variables into the pipelines runner pods. Here is how we can do this by modifying the existing notebook:

add a cell at the beginning to get the proxy and no proxy values from the Notebook's environment variables (these envs will be in the notebook pod's env because we apply the poddefault before creating the notebook)
```
PROXY_URL=os.environ['HTTP_PROXY']
NO_PROXY_URLS=os.environ['NO_PROXY']
```

add a cell with a helper function that adds the proxy env variables to the PipelineTask. For more details see the kfp sdk docs. The helper function is as follows:

def add_proxy(obj, proxy=PROXY_URL, no_proxy=NO_PROXY_URLS):
"""Adds the proxy env vars to the PipelineTask object."""
return (
    obj.set_env_variable(name='http_proxy', value=proxy)
    .set_env_variable(name='https_proxy', value=proxy)
    .set_env_variable(name='HTTP_PROXY', value=proxy)
    .set_env_variable(name='HTTPS_PROXY', value=proxy)
    .set_env_variable(name='no_proxy', value=no_proxy)
    .set_env_variable(name='NO_PROXY', value=no_proxy)
)

Modify the cell where the pipeline is defined to use the add_proxy helper so that it adds the proxy env vars to all components:

@dsl.pipeline(name='condition-v2')
def condition_pipeline(text: str = 'condition test', force_flip_result: str = ''):
flip1 = add_proxy(flip_coin(force_flip_result=force_flip_result))
add_proxy(print_msg(msg=flip1.output))

with dsl.Condition(flip1.output == 'heads'):
    flip2 = add_proxy(flip_coin())
    add_proxy(print_msg(msg=flip2.output))
    add_proxy(print_msg(msg=text))

NohaIhab commented 2 months ago

Adding proxy option to tox

As mentioned in #54, we need to have an option to run the proxy tests with tox. This can be tricky because based on my above proposals, the notebooks will need to be slightly modified to be able to run behind proxy.

We can have a different set of Notebooks with a _proxy suffix for every UAT.
- Limitation: It is adding to our maintenance work, double the amount of notebooks every time we change in those tests.
We can have .patch files containing the diff needed to be done to the notebooks and if the proxy option is chosen with tox then the .patch files are used to edit the notebooks before they are run.
- This is still some maintenance work because the patch file will rely on the lines of code, so we still need to update them if the lines changed, but it is less maintenance than another set of notebooks.
??

orfeas-k commented 2 months ago

Really great work @NohaIhab! I have some questions, although they 're not a blocker, probably more things we 'll need to consider during implementation.

Adding proxy option to tox

Another option would be to always keep the label there and apply the podDefault only when tox has the proper configuration. But for pipelines, we 'd still need to edit the notebook with the proxy code

Running from UI

Will the user have to apply manually the PodDefault?
If we follow the .patch approach (which I prefer), will the user need to apply manually those changes? On that note, also for pipelines, will they need to uncomment the code this configuration will add?

If yes, we 'll need instructions for those.

NohaIhab commented 2 months ago

thanks @orfeas-k for the review

Adding proxy option to tox

Your suggestion is a great idea actually, we can always have the label, and conditionally create the PodDefault if we are running the tests behind proxy. If the PodDefault is not created, the label will still be there but no proxy envs will be added to the workloads.

Running from the UI

Yes, I think to run from the UI you have to apply this before creating a notebook, so it can be done manually with instructions in the README
In case we apply the label anyway, the .patch will be needed for pipelines only, the .patch removes the need to have the commented code. In that case, the user will apply the .patch and then the notebook is ready to run. -> instructions for running from the UI will be added for that

NohaIhab commented 2 months ago

closing the issue as now we have enough information to start the implementation thanks @orfeas-k for the review

NohaIhab commented 2 months ago

Afterthoughts: After discussing with the team during implementation, one suggestion was to have in the Notebooks an if condition to check whether the proxy envs are set. and based on that use the PodDefault for the workload or not. This can then eliminate the need for .patch files and make it easier to automate the tests using the driver.

canonical / charmed-kubeflow-uats