EVerest / everest-demo

EVerest demo: Dockerized demo with software-in-the-loop simulation
Apache License 2.0

Getting caught exception "stoi" when trying to deploy everest-demo to Kubernetes #54

Open volkert-fastned opened 4 months ago

volkert-fastned commented 4 months ago

Hi! šŸ‘‹

I've tried deploying the everest-demo container images to a Kubernetes cluster (specifically Amazon EKS), and although mqtt-server and node-red deployed successfully, manager kept crashing with the following error:

[2024-06-05 13:53:18.883199] [0x00007f730994bb48] [info]    Main manager process exits because of caught exception:
stoi

I wrote the following script to convert the Docker Compose file to a Helm chart for deployment to Kubernetes:

#!/bin/sh

# This script requires the following tools, all of which can be installed with Homebrew:
# - wget
# - kompose
# - kubectl
# - helm
# - kubectx

export EVEREST_MANAGER_CPUS='2.0'
export EVEREST_MANAGER_MEMORY='1536mb'
export TAG='0.0.16'

# Change the following value to the proper Kubernetes context alias in your configuration
K8S_CONTEXT_ALIAS=k8s-tooling-cluster

mkdir -p ./tmp
rm -rf ./tmp/*
cd ./tmp || exit
wget https://raw.githubusercontent.com/EVerest/everest-demo/main/docker-compose.iso15118-dc.yml
kompose -f docker-compose.iso15118-dc.yml convert -c
cd ..
helm lint ./tmp/docker-compose.iso15118-dc || exit
kubectx ${K8S_CONTEXT_ALIAS} || exit
helm upgrade everest ./tmp/docker-compose.iso15118-dc --cleanup-on-fail --create-namespace --description "EVerest demo" --dry-run=client --install --namespace everest || exit
helm upgrade everest ./tmp/docker-compose.iso15118-dc --cleanup-on-fail --create-namespace --description "EVerest demo" --dry-run=server --install --namespace everest || exit
echo "Helm dry-run on server successful. To actually deploy to the tooling cluster, run the following command:"
echo ""
echo "kubectx ${K8S_CONTEXT_ALIAS} && helm upgrade everest ./tmp/docker-compose.iso15118-dc --cleanup-on-fail --create-namespace --description \"EVerest demo\" --install --namespace everest"
echo ""

I ran this script and then ran the command it printed at the end. The deployment initially appeared to be successful, until I noticed the pod belonging to the manager deployment constantly restarting, failing, and eventually going into CrashLoopBackOff.

Running kubectl -n everest logs [pod name] yielded the "stoi" error that I mentioned above.

I then changed spec.template.spec.containers[0].command in the generated manager-deployment.yaml file as follows, to keep the container alive so I could log into it and do some troubleshooting:

    spec:
      containers:
        - command: [ "/bin/sh" ]
          args: [ "-c", "while true; do echo hello; sleep 10;done" ]

After running the helm upgrade command again to apply this, the manager pod started successfully, and I could log into it as follows (note that [manager-pod-name] needs to be replaced, because it changes every time the deployment is updated and a new pod is spun up):

kubectl -n everest exec pod/[manager-pod-name] -it -- /bin/sh
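
With a shell in the pod, a quick sanity check is to dump the MQTT-related environment that manager will see (assuming env and grep are available in the image):

env | grep -i mqtt

Anything unexpected there is a prime suspect for configuration parsing failures.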

From this shell, it was easy to reproduce the error:

/ext/source/build/run-scripts/run-sil-dc.sh

I noticed that I could run manager with the --help option just fine (I looked in run-sil-dc.sh to see how to invoke it):

LD_LIBRARY_PATH=/ext/source/build/dist/lib:$LD_LIBRARY_PATH PATH=/ext/source/build/dist/bin:$PATH manager --help

...But whenever I tried any of the configurations in /ext/source/config, I got that weird stoi (string-to-integer conversion?) exception, regardless of whether or not I included the --check option:

LD_LIBRARY_PATH=/ext/source/build/dist/lib:$LD_LIBRARY_PATH PATH=/ext/source/build/dist/bin:$PATH manager --check --conf /ext/source/config/config-sil-dc.yaml

LD_LIBRARY_PATH=/ext/source/build/dist/lib:$LD_LIBRARY_PATH PATH=/ext/source/build/dist/bin:$PATH manager --check --conf /ext/source/config/config-example.yaml

LD_LIBRARY_PATH=/ext/source/build/dist/lib:$LD_LIBRARY_PATH PATH=/ext/source/build/dist/bin:$PATH manager --conf /ext/source/config/config-example.yaml

I took a look at the manager.cpp source code, but it wasn't clear where exactly the exception was being thrown, because the only hint given was the exception message stoi.

It appears to be happening somewhere in the int boot(...) function, before the splash banner is printed with EVLOG_info.
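
For what it's worth, the bare message is characteristic of std::stoi itself: with libstdc++, the std::invalid_argument it throws when no conversion is possible carries exactly the what() string "stoi". A minimal standalone sketch (not the actual manager code) that reproduces the message:

#include <iostream>
#include <stdexcept>
#include <string>

int main() {
    try {
        // std::stoi throws std::invalid_argument when the string
        // does not start with a parseable integer.
        int value = std::stoi(std::string("not-a-number"));
        std::cout << "parsed: " << value << "\n";
    } catch (const std::exception& e) {
        // With libstdc++, e.what() is just "stoi",
        // matching the terse manager log line above.
        std::cout << "caught exception: " << e.what() << "\n";
    }
    return 0;
}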

Strangely enough, when I run the same Docker container image locally, I can't reproduce this issue:

docker run --rm -it --platform linux/amd64 --entrypoint sh ghcr.io/everest/everest-demo/manager:0.0.16
LD_LIBRARY_PATH=/ext/source/build/dist/lib:$LD_LIBRARY_PATH PATH=/ext/source/build/dist/bin:$PATH manager --check --conf /ext/source/config/config-sil-dc.yaml

The result in that case, on an Apple Silicon MacBook, running the Docker container in x86 emulation mode:

2024-06-05 14:21:36.996736 [INFO] manager          ::   ________      __                _
2024-06-05 14:21:37.001297 [INFO] manager          ::  |  ____\ \    / /               | |
2024-06-05 14:21:37.001332 [INFO] manager          ::  | |__   \ \  / /__ _ __ ___  ___| |_
2024-06-05 14:21:37.001609 [INFO] manager          ::  |  __|   \ \/ / _ \ '__/ _ \/ __| __|
2024-06-05 14:21:37.001629 [INFO] manager          ::  | |____   \  /  __/ | |  __/\__ \ |_
2024-06-05 14:21:37.001643 [INFO] manager          ::  |______|   \/ \___|_|  \___||___/\__|
2024-06-05 14:21:37.001659 [INFO] manager          ::
2024-06-05 14:21:37.001689 [INFO] manager          :: Using MQTT broker localhost:1883
2024-06-05 14:21:37.016064 [ERRO] manager         int main(int, char**) :: Main manager process exits because of caught exception:
Syscall pipe2() failed (Invalid argument), exiting

It's also an error, but at least a different one.

I'm a bit at a loss now.

Could you maybe help me get this deployed to Kubernetes? (So far I've only tried AWS EKS, but I could also try this in a local minikube cluster or something similar. Let me know if that would help.)

I also noticed that the everest-demo/manager container images are not yet multi-platform, but the test cluster in which I tried to deploy them has nodes running on x86_64, so that shouldn't be the problem.

Thank you kindly in advance for helping me get this deployed to our test cluster for evaluation! šŸ™

volkert-fastned commented 3 months ago

Update after asking this question on the LF Energy Zulip server:

The cause has been found. Apparently, MQTT_SERVER_PORT wasn't set properly in the manager container: it contained the value 'tcp://10.100.173.215:1883', but it was supposed to contain just the port number (1883). And since manager expects a numeric value there, the string-to-integer conversion (stoi) failed.

It's still unclear how that MQTT_SERVER_PORT environment variable got set in the manager container, though, since neither the Docker Compose YAML file nor the Helm chart generated from it contains any reference to it. šŸ¤·šŸ½ā€ā™‚ļø (One likely culprit: Kubernetes' automatic service links, which inject Docker-link-style variables into every pod, so a Service named mqtt-server produces MQTT_SERVER_PORT=tcp://<cluster-ip>:1883, matching the observed value.)
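
If the service links are indeed the source, the injection can also be switched off per pod. A sketch of what that would look like in the generated manager-deployment.yaml (assuming the usual Deployment layout):

    spec:
      template:
        spec:
          # Stop Kubernetes from injecting <SERVICE>_PORT=tcp://... service links
          enableServiceLinks: false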

@hikinggrass FYI

volkert-fastned commented 3 months ago

Manually adding the following snippet to manager-deployment.yaml under spec.template.spec.containers[0].env in the generated Helm chart makes it work when you install or upgrade the chart with Helm:

            - name: MQTT_SERVER_PORT
              value: "1883"

corneliusclaussen commented 3 months ago

Added a warning for this case:

https://github.com/EVerest/everest-framework/pull/196

shankari commented 3 months ago

@volkert-fastned all the docker-compose files here do set the MQTT_SERVER_ADDRESS=mqtt-server (e.g. https://github.com/EVerest/everest-demo/blob/6ac32289f5ad2769c3fa6da88289a54c7e072a57/docker-compose.ocpp201.yml#L21 or https://github.com/EVerest/everest-demo/blob/6ac32289f5ad2769c3fa6da88289a54c7e072a57/docker-compose.yml#L19)

The server runs on the default port, so we did not have to change the port. Do you recall why you set MQTT_SERVER_PORT instead of the address?