DSD-DBS / capella-collab-manager

A web app for collaboration on Capella (MBSE) projects
https://dsd-dbs.github.io/capella-collab-manager/
Apache License 2.0

Getting started fails on a clean Ubuntu server due to missing dependencies #745

Closed vik378 closed 1 year ago

vik378 commented 1 year ago

Looks like our "getting started" part of the readme is a bit incomplete - we're missing a few dependencies:

  • python 3 (>= 3.11?)
  • node and npm (>= 16?)

it also could be an improvement to the onboarding experience if we pointed new starters to the published capella-dockerimages instead of building from scratch (saving build time).

another question - do we still need the EASE image?

next observation - after successful launch the thing serves on 0.0.0.0:8080, however requesting it from anything other than localhost results in a 404 (by design or unintended?)

running the thing on a headless server leads to the error below (there is no browser to open):

/bin/bash: line 7: xdg-open: command not found
make: *** [Makefile:93: open] Error 127

Out of the box, session start fails with "an error occurred, please try again" - but trying again leads to the same result, as nothing has changed ;-)

The backend error log is full of exciting messages:

  • Could not create an admin token. Please make sure that your Guacamole database is initialized properly. (File "/tmp/backend/capellacollab/sessions/guacamole.py", line 44, in get_admin_token)
  • Exception during fetching of last seen. (requests.exceptions.ReadTimeout: HTTPConnectionPool(host='dev-prometheus-server', port=9118): Read timed out. (read timeout=2), File "/tmp/backend/capellacollab/sessions/sessions.py", line 41, in get_last_seen)
  • Could not handle idle sessions ( File "/tmp/backend/capellacollab/sessions/idletimeout.py", line 50, in loop)

running make provision-guacamole gets it a little further - a session card appears with "failed to create session"; the new error is "Could not parse log"

Traceback (most recent call last):
  File "/tmp/backend/capellacollab/sessions/sessions.py", line 76, in _determine_session_state
    logs = get_operator().get_session_logs(session.id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/backend/capellacollab/sessions/operators/k8s.py", line 413, in get_session_logs
    return self.v1_core.read_namespaced_pod_log(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 23747, in read_namespaced_pod_log
    return self.read_namespaced_pod_log_with_http_info(name, namespace, **kwargs)  # noqa: E501
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 23866, in read_namespaced_pod_log_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 241, in GET
    return self.request("GET", url,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 235, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': '80086e04-2995-4176-9de1-f8c765ea40fe', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Mon, 12 Jun 2023 06:50:48 GMT', 'Content-Length': '261'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \"litkahioewsmzojuszisqxvuo\" in pod \"litkahioewsmzojuszisqxvuo-8d5bc7c5b-ttqfp\" is waiting to start: trying and failing to pull image","reason":"BadRequest","code":400}
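The 400 above is not really a log-parsing problem: the response body is a standard Kubernetes Status object whose message field names the actual cause (the session image cannot be pulled). A minimal sketch of surfacing that message instead of a generic error (the function name is illustrative, not from the codebase):

```python
import json


def status_message(http_body: str) -> str:
    """Extract the human-readable message from a Kubernetes Status body.

    Falls back to returning the raw body when it is not a JSON Status object.
    """
    try:
        status = json.loads(http_body)
    except json.JSONDecodeError:
        return http_body  # not JSON at all
    if isinstance(status, dict) and status.get("kind") == "Status":
        return status.get("message", http_body)
    return http_body
```

Applied to the body above, this yields the "trying and failing to pull image" message, which points at a missing or unpublished session image rather than a logging bug.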

I didn't get any further so far.

make[2]: Entering directory '/home/viktor/capella-collab-manager/capella-dockerimages'
/bin/sh: 1: [: 1: unexpected operator
make[2]: Leaving directory '/home/viktor/capella-collab-manager/capella-dockerimages'
Successfully ran target 'capella/remote' for Capella version 6.0.0.
docker build --platform linux/amd64 -t capella/ease:$DOCKER_TAG --build-arg BASE_IMAGE=capella/base:$DOCKER_TAG --build-arg BUILD_TYPE=online ease
make PUSH_IMAGES=1 IMAGENAME=capella/ease .push

Prometheus log:

~/capella-collab-manager$ kubectl logs dev-prometheus-nginx-548c6f89df-gwcr9
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf is not a file or does not exist
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2023/06/12 07:00:47 [error] 20#20: *1 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.42.0.54, server: _, request: "GET /prometheus/graph HTTP/1.1", upstream: "http://10.43.236.202:9118/prometheus/graph", host: "localhost:8080", referrer: "http://localhost:8080/"
10.42.0.54 - - [12/Jun/2023:07:00:47 +0000] "GET /prometheus/graph HTTP/1.1" 504 569 "http://localhost:8080/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
2023/06/12 07:18:53 [error] 20#20: *4 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.42.0.54, server: _, request: "GET /prometheus/graph HTTP/1.1", upstream: "http://10.43.236.202:9118/prometheus/graph", host: "localhost:8080", referrer: "http://localhost:8080/"
10.42.0.54 - - [12/Jun/2023:07:18:53 +0000] "GET /prometheus/graph HTTP/1.1" 504 569 "http://localhost:8080/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"

running the Capella 6 image from GitHub artifacts leads to this log:

Defaulted container "vlgwsiomraniasowlytkicjja" out of: vlgwsiomraniasowlytkicjja, promtail
Current password: New password: Retype new password: passwd: password updated successfully
Changing password for techuser.
Executing Python script '/opt/setup/setup_welcome_screen.py'...
INFO:Capella preparation:Set existing config core_defaultRepositoryDir to /workspace/git
Executing shell script '/opt/setup/replace_env_variables.sh'...
2023-06-12 07:34:51,703 INFO supervisord started with pid 1
2023-06-12 07:34:52,707 INFO spawned: 'idletime' with pid 15
2023-06-12 07:34:52,709 INFO spawned: 'xrdp' with pid 16
2023-06-12 07:34:52,711 INFO spawned: 'xrdp-sesman' with pid 17
2023-06-12 07:34:53,713 INFO success: idletime entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-06-12 07:34:53,713 INFO success: xrdp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-06-12 07:34:53,713 INFO success: xrdp-sesman entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-06-12 07:35:02,527 INFO reaped unknown pid 125 (exit status 1)
2023-06-12 07:35:02,527 INFO reaped unknown pid 127 (exit status 1)
2023-06-12 07:35:03,529 INFO reaped unknown pid 122 (terminated by SIGABRT (core dumped))
MoritzWeber0 commented 1 year ago

Looks like our "getting started" part of the readme is a bit incomplete - we're missing a few dependencies:

  • python 3 (>= 3.11?)
  • node and npm (>= 16?)

The Python dependency has been added to the README, and the Node.js / npm dependency is no longer required since https://github.com/DSD-DBS/capella-collab-manager/pull/763/commits/3d898ea9aa85f0c2b64fd48d745031fa44044421
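Since the minimum versions above are still marked with question marks, treat them as assumptions; a small hypothetical guard for checking the interpreter version up front could look like this:

```python
import sys


def python_ok(minimum=(3, 11), current=None):
    """Return True if the running (or given) interpreter meets the minimum.

    The (3, 11) default reflects the assumed minimum discussed above.
    """
    current = tuple(current or sys.version_info[:2])
    return current >= tuple(minimum)
```

For example, `python_ok(current=(3, 10))` returns False, while `python_ok(current=(3, 11))` returns True.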

it also could be an improvement to the onboarding experience if we pointed new starters to the published capella-dockerimages instead of building from scratch (saving build time).

Implemented in https://github.com/DSD-DBS/capella-collab-manager/pull/763/commits/7624150737a0f6781b615f139e574af3ce5c8574. When development mode is disabled, the Capella tool is initialized with the public GitHub images.

another question - do we still need the EASE image?

Yes, they are still needed for the read-only images. A dependency graph is available here: https://dsd-dbs.github.io/capella-dockerimages/#build-images-manually-with-docker

next observation - after successful launch the thing serves on 0.0.0.0:8080, however requesting it from anything other than localhost results in a 404 (by design or unintended?)

To listen on a different host, change the .general.host value to the preferred value, e.g., 127.0.0.1. Note that the OAuth mock still redirects to localhost:8080; the .backend.authentication.oauth.redirectURI value in the values.yaml has to be changed as well.

In theory, you can also listen on all hosts by removing the host key in line 20 of helm/templates/routing/nginx.ingress.yaml. However, we don't support this from the application side, so the Guacamole and Jupyter redirects will still use the host defined in the values.yaml. If this is a desired feature, we can add support for wildcard hosts.
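Put together, the two values mentioned above would change along these lines in the values.yaml (the key paths are taken from this comment; the exact redirect path is an assumption and must match your OAuth mock configuration):

```yaml
general:
  host: 127.0.0.1  # was localhost; the app is served on this host

backend:
  authentication:
    oauth:
      # assumed path; align with whatever the OAuth mock currently redirects to
      redirectURI: http://127.0.0.1:8080/oauth2/callback
```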

running the thing on a headless server leads to the error below (there is no browser to open):

/bin/bash: line 7: xdg-open: command not found
make: *** [Makefile:93: open] Error 127

Fixed in https://github.com/DSD-DBS/capella-collab-manager/pull/763/commits/19222241c697f539cae0b608c2626f251daafbdc

Out of the box, session start fails with "an error occurred, please try again" - but trying again leads to the same result, as nothing has changed ;-)

The backend error log is full of exciting messages:

  • Could not create an admin token. Please make sure that your Guacamole database is initialized properly. (File "/tmp/backend/capellacollab/sessions/guacamole.py", line 44, in get_admin_token)

The first error message occurs because the Guacamole database has not been initialised. Most likely, the first run of your make deploy ran into a timeout and never reached the Guacamole provisioning step.

  • Exception during fetching of last seen. (requests.exceptions.ReadTimeout: HTTPConnectionPool(host='dev-prometheus-server', port=9118): Read timed out. (read timeout=2), File "/tmp/backend/capellacollab/sessions/sessions.py", line 41, in get_last_seen)
  • Could not handle idle sessions ( File "/tmp/backend/capellacollab/sessions/idletimeout.py", line 50, in loop)

My guess is that Prometheus is not running properly. Can you please share the output of the following command?

kubectl -n collab-manager logs deployment/dev-prometheus-server
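The ReadTimeout above comes from the backend polling Prometheus with a 2-second read timeout; a defensive pattern is to treat a timeout as "unknown" instead of letting it abort the idle-session loop. A stdlib-only sketch (the real backend uses requests and a different function name; the URL here is illustrative):

```python
import urllib.error
import urllib.request


def fetch_last_seen(url: str, timeout: float = 2.0):
    """Return the response body, or None if Prometheus is slow or unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode()
    except (urllib.error.URLError, TimeoutError):
        return None  # caller treats None as "last seen unknown"
```

A caller can then log a warning and keep the session alive when None comes back, rather than raising out of the loop.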

running make provision-guacamole gets it a little further - a session card appears with "failed to create session"; the new error is "Could not parse log"

Traceback (most recent call last):
  File "/tmp/backend/capellacollab/sessions/sessions.py", line 76, in _determine_session_state
    logs = get_operator().get_session_logs(session.id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/backend/capellacollab/sessions/operators/k8s.py", line 413, in get_session_logs
    return self.v1_core.read_namespaced_pod_log(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 23747, in read_namespaced_pod_log
    return self.read_namespaced_pod_log_with_http_info(name, namespace, **kwargs)  # noqa: E501
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 23866, in read_namespaced_pod_log_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 241, in GET
    return self.request("GET", url,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 235, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': '80086e04-2995-4176-9de1-f8c765ea40fe', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Mon, 12 Jun 2023 06:50:48 GMT', 'Content-Length': '261'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \"litkahioewsmzojuszisqxvuo\" in pod \"litkahioewsmzojuszisqxvuo-8d5bc7c5b-ttqfp\" is waiting to start: trying and failing to pull image","reason":"BadRequest","code":400}

I didn't get any further so far.

make[2]: Entering directory '/home/viktor/capella-collab-manager/capella-dockerimages'
/bin/sh: 1: [: 1: unexpected operator
make[2]: Leaving directory '/home/viktor/capella-collab-manager/capella-dockerimages'
Successfully ran target 'capella/remote' for Capella version 6.0.0.
docker build --platform linux/amd64 -t capella/ease:$DOCKER_TAG --build-arg BASE_IMAGE=capella/base:$DOCKER_TAG --build-arg BUILD_TYPE=online ease
make PUSH_IMAGES=1 IMAGENAME=capella/ease .push

This has been fixed in https://github.com/DSD-DBS/capella-dockerimages/pull/155

running the Capella 6 image from GitHub artifacts leads to this log:

Defaulted container "vlgwsiomraniasowlytkicjja" out of: vlgwsiomraniasowlytkicjja, promtail
Current password: New password: Retype new password: passwd: password updated successfully
Changing password for techuser.
Executing Python script '/opt/setup/setup_welcome_screen.py'...
INFO:Capella preparation:Set existing config core_defaultRepositoryDir to /workspace/git
Executing shell script '/opt/setup/replace_env_variables.sh'...
2023-06-12 07:34:51,703 INFO supervisord started with pid 1
2023-06-12 07:34:52,707 INFO spawned: 'idletime' with pid 15
2023-06-12 07:34:52,709 INFO spawned: 'xrdp' with pid 16
2023-06-12 07:34:52,711 INFO spawned: 'xrdp-sesman' with pid 17
2023-06-12 07:34:53,713 INFO success: idletime entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-06-12 07:34:53,713 INFO success: xrdp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-06-12 07:34:53,713 INFO success: xrdp-sesman entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-06-12 07:35:02,527 INFO reaped unknown pid 125 (exit status 1)
2023-06-12 07:35:02,527 INFO reaped unknown pid 127 (exit status 1)
2023-06-12 07:35:03,529 INFO reaped unknown pid 122 (terminated by SIGABRT (core dumped))

This is how the log should look :)