Telecominfraproject / openlan-cgw

BSD 3-Clause "New" or "Revised" License

Difficulties getting CGW to run #85

Open ncalad opened 5 days ago

ncalad commented 5 days ago

We have zookeeper and kafka running but when we try to run the CGW application, the container terminates and the logs show ...

./run_cgw.sh openlan-cgw-img:3d46cc3 ucentral-cgw-container

docker logs 010d7e028b78
[2024-09-23T22:08:40Z INFO ucentral_cgw] Starting CGW application, rev tag:
[2024-09-23T22:08:40Z INFO ucentral_cgw] (1048576, 1048576)
[2024-09-23T22:08:40Z INFO ucentral_cgw] (1048576, 1048576)
%3|1727129320.768|FAIL|rdkafka#producer-1| [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connect to ipv6#[::1]:9092 failed: Connection refused (after 0ms in state CONNECT)
[2024-09-23T22:08:40Z ERROR ucentral_cgw::cgw_remote_discovery] Can't create CGW Remote Discovery client: Redis client create failed (Connection(ConnectionFailed))
[2024-09-23T22:08:40Z ERROR ucentral_cgw::cgw_connection_server] Can't create CGW Connection server: Remote Discovery create failed: RemoteDiscovery("Redis client create failed")
thread 'main' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.38.0/src/runtime/blocking/shutdown.rs:51:21:
Cannot drop a runtime in a context where blocking is not allowed. This happens when a runtime is dropped from within an asynchronous context.
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Cahb commented 5 days ago

Hey @ncalad, it fails because it can't connect to Kafka and Redis: the environment variables are still set to their default values. Some other service requirements are not met either.

Let's do the following: I've prepared a set of changes that should eliminate all of these issues and allow CGW to run all-in-one. Basically, it starts all the necessary containers with the proper configs. We can treat this as a 'default' CGW environment, and it could serve as a starting point for anyone who wants to try CGW out of the box.

Here's the branch: https://github.com/Telecominfraproject/openlan-cgw/tree/feat/all_in_one_make

Simply pull the branch and run make (or make all); it should do all the necessary work under the hood. If you run into any difficulties, please write back: so far this has only been dev-tested by me, and we need someone else to try it. If it works, we'll merge it directly into main.

P.S. This change also means you can drop the ZooKeeper and Kafka instances you created previously.

Cahb commented 5 days ago

A few more things: prints like

err Invalid symbol 45, offset 0. 

and

%3|1727172666.463|FAIL|rdkafka#producer-1| [thrd:docker-broker-1:9092/bootstrap]: docker-broker-1:9092/bootstrap: Connect to ipv4#172.23.0.3:9092 failed: Connection refused (after 0ms in state CONNECT)
%3|1727172666.464|FAIL|CGW0#consumer-2| [thrd:docker-broker-1:9092/bootstrap]: docker-broker-1:9092/bootstrap: Connect to ipv4#172.23.0.3:9092 failed: Connection refused (after 0ms in state CONNECT)

are safe. We will resolve them in the future. The first one is harmless, and the print should be removed completely. The second one indicates that the lazy connection attempt failed; CGW will retry and it should then work, and running make run once again also fixes the issue. It fails because the containers (Kafka/Redis/PGSQL) are started from docker-compose, while CGW is spawned outside of the compose file. Hence, we need a proper way to synchronize their startup/ready states. We will address this in the future.
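The startup race described above (CGW launched outside the compose file before Kafka, Redis and PostgreSQL accept connections) could be worked around by polling the dependency ports before starting CGW. The following is only a minimal Python sketch and not part of the CGW repository: the function name wait_for_port and its parameters are made up for illustration, and an open TCP port does not guarantee that the service behind it has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it is not part of CGW.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection also resolves the host name, so a DNS failure
            # is retried in the same way as a refused connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as wait_for_port("docker-broker-1", 9092, timeout=60) could then gate the CGW start. The host name is the one from the log lines above; it is expected to resolve only from inside the same Docker network, so from the host the published port on localhost would be probed instead.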

ncalad commented 4 days ago

We were able to get the cgw container to run. Can we browse to it or perform some other test to see if everything is correct?

root@docker-desktop:/# netstat -anp | grep LISTEN
tcp 0 0 0.0.0.0:6379 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:36541 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 1/ucentral-cgw
tcp 0 0 0.0.0.0:50051 0.0.0.0:* LISTEN 1/ucentral-cgw
tcp 0 0 0.0.0.0:9092 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:9094 0.0.0.0:* LISTEN -
tcp6 0 0 :::111 :::* LISTEN -
tcp6 0 0 :::60981 :::* LISTEN -
unix 2 [ ACC ] STREAM LISTENING 32710 - /run/containerd/s/d75d140b9730bc4fa15482715a5a93195fe591fcacddecbcdf15c0a6e7a138a3
unix 2 [ ACC ] STREAM LISTENING 33546 - /run/containerd/s/15cd467bbd6f6e4e2a254505a853d13738f7a65efef49deaae85e55a9973ce03
unix 2 [ ACC ] STREAM LISTENING 16982 - /var/run/docker.sock
unix 2 [ ACC ] STREAM LISTENING 18662 - /run/rpcbind.sock
unix 2 [ ACC ] STREAM LISTENING 17542 - /run/grpcfuse.mount.sock
unix 2 [ ACC ] STREAM LISTENING 41010 - /run/containerd/s/786ea0a701c2f25b1bbf02b51ee5c4f2251b6be9f2d3501d5bf0fd256f86fab5
unix 2 [ ACC ] STREAM LISTENING 17681 - /run/containerd/containerd.sock.ttrpc
unix 2 [ ACC ] STREAM LISTENING 17682 - /run/containerd/containerd.sock
unix 2 [ ACC ] STREAM LISTENING 17729 - /var/run/docker/metrics.sock
unix 2 [ ACC ] STREAM LISTENING 18143 - /var/run/docker/libnetwork/7fd7ea8ad1ff.sock
unix 2 [ ACC ] STREAM LISTENING 43337 - /run/containerd/s/8495dff9ed02e247954d88072e3ea812b3c0943748b4eafeea4ea058a5b2e0ae
unix 2 [ ACC ] STREAM LISTENING 17452 - /run/guest-services/wsl2-expose-ports.sock
unix 2 [ ACC ] STREAM LISTENING 16915 - /run/guest-services/debug-shell.sock
unix 2 [ ACC ] STREAM LISTENING 16921 - /run/guest-services/diagnosticd.sock
unix 2 [ ACC ] STREAM LISTENING 16976 - /run/guest-services/docker.proxy.sock
unix 2 [ ACC ] STREAM LISTENING 16981 - /run/guest-services/docker-api-proxy-control.sock
unix 2 [ ACC ] STREAM LISTENING 16983 - /run/guest-services/docker.sock
unix 2 [ ACC ] STREAM LISTENING 16984 - /run/guest-services/lifecycle-server.sock
unix 2 [ ACC ] STREAM LISTENING 17544 - /run/guest-services/filesystem-event.sock
unix 2 [ ACC ] STREAM LISTENING 17546 - /run/guest-services/filesystem-test.sock

root@docker-desktop:/# ps -ef
UID   PID  PPID  C  STIME  TTY    TIME      CMD
root    1     0  1  18:25  pts/0  00:00:05  ucentral-cgw
root   30     0  0  18:26  pts/1  00:00:00  /bin/sh
root   36     0  0  18:26  pts/2  00:00:00  bash
root  258    36  0  18:32  pts/2  00:00:00  ps -ef

ncalad commented 4 days ago

We noticed a problem with the cert file ...

%3|1727202338.058|FAIL|CGW0#consumer-2| [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connect to ipv6#[::1]:9092 failed: Connection refused (after 59ms in state CONNECT)
[2024-09-24T18:25:38Z INFO ucentral_cgw::cgw_db_accessor] Connection to SQL DB has been established!
[2024-09-24T18:25:38Z INFO ucentral_cgw::cgw_remote_discovery] Connection to REDIS DB has been established!
[2024-09-24T18:25:38Z INFO ucentral_cgw::cgw_remote_server] Starting GRPC server id 0 - listening at 0.0.0.0:50051
[2024-09-24T18:25:38Z ERROR ucentral_cgw::cgw_tls] Failed to open TLS certificate file: /etc/cgw/certs/cas.pem. Error: No such file or directory (os error 2)
[2024-09-24T18:25:38Z ERROR ucentral_cgw] Failed to create TLS acceptor. Error: Failed to open TLS certificate file: /etc/cgw/certs/cas.pem. Error: No such file or directory (os error 2)
%3|1727202340.073|FAIL|CGW0#consumer-2| [thrd:GroupCoordinator]: GroupCoordinator: docker-broker-1:9092: Failed to resolve 'docker-broker-1:9092': Name or service not known (after 69ms in state CONNECT)
%3|1727202340.140|FAIL|CGW0#consumer-2| [thrd:GroupCoordinator]: GroupCoordinator: docker-broker-1:9092: Failed to resolve 'docker-broker-1:9092': Name or service not known (after 66ms in state CONNECT, 1 identical error(s) suppressed)
%3|1727202374.111|FAIL|CGW0#consumer-2| [thrd:GroupCoordinator]: GroupCoordinator: docker-broker-1:9092: Failed to resolve 'docker-broker-1:9092': Name or service not known (after 69ms in state CONNECT, 9 identical error(s) suppressed)
%3|1727202411.933|FAIL|CGW0#consumer-2| [thrd:GroupCoordinator]: GroupCoordinator: docker-broker-1:9092: Failed to resolve 'docker-broker-1:9092': Name or service not known (after 68ms in state CONNECT, 4 identical error(s) suppressed)
%3|1727202451.493|FAIL|CGW0#consumer-2| [thrd:GroupCoordinator]: GroupCoordinator: docker-broker-1:9092: Failed to resolve 'docker-broker-1:9092': Name or service not known (after 64ms in state CONNECT, 4 identical error(s) suppressed)
%3|1727202490.124|FAIL|CGW0#consumer-2| [thrd:GroupCoordinator]: GroupCoordinator: docker-broker-1:9092: Failed to resolve 'docker-broker-1:9092': Name or service not known (after 65ms in state CONNECT, 4 identical error(s) suppressed)

Cahb commented 4 days ago

@ncalad could you please post the output you get from running make? Especially where this part starts:

Starting CGW...
CGW LOG LEVEL                     : debug
CGW ID                            : 0
CGW GROUPS CAPACITY/THRESHOLD     : 1000:50
...

ncalad commented 4 days ago

Here you go ...

What's Next?
View summary of image vulnerabilities and recommendations → docker scout quickview
Docker build done
Starting CGW...
CGW LOG LEVEL                     : debug
CGW ID                            : 0
CGW GROUPS CAPACITY/THRESHOLD     : 1000:50
CGW GROUP INFRAS CAPACITY         : 2000
CGW WSS THREAD NUM                : 4
CGW WSS IP/PORT                   : 0.0.0.0:15002
CGW WSS CAS                       : cas.pem
CGW WSS CERT                      : cert.pem
CGW WSS KEY                       : key.pem
CGW GRPC PUBLIC HOST/PORT         : openlan_cgw:50051
CGW GRPC LISTENING IP/PORT        : 0.0.0.0:50051
CGW KAFKA HOST/PORT               : docker-broker-1:9092
CGW KAFKA TOPIC                   : CnC:CnC_Res
CGW DB NAME                       : cgw
CGW DB HOST/PORT                  : docker-postgresql-1:5432
CGW DB TLS                        : no
CGW REDIS HOST/PORT               : docker-redis-1:6379
CGW REDIS TLS                     : no
CGW METRICS PORT                  : 8080
CGW CERTS PATH                    : /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server
CGW ALLOW CERT MISMATCH           : no
CGW NB INFRA CERTS PATH           : /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server
CGW NB INFRA TLS                  : no
CGW UCENTRAL AP DATAMODEL URI     : https://raw.githubusercontent.com/Telecominfraproject/wlan-ucentral-schema/main/ucentral.schema.json
CGW UCENTRAL SWITCH DATAMODEL URI : https://raw.githubusercontent.com/Telecominfraproject/ols-ucentral-schema/main/ucentral.schema.json
2247ff21a29b44788148d0bc451fb4fd46a7b14d7cf48610ba9b6327ea837da3
docker: Error response from daemon: Ports are not available: exposing port TCP 0.0.0.0:15002 -> 0.0.0.0:0: listen tcp 0.0.0.0:15002: bind: address already in use.
make: *** [Makefile:66: run] Error 125
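The "address already in use" error at the end of this output means that something on the host was still bound to the WSS port (15002) when Docker tried to publish it. Before running make again, the port can be probed from the host. This is a minimal Python sketch under stated assumptions: port_is_free is a hypothetical helper, not something shipped with CGW, and it only reports whether a bind succeeds, not which process holds the port.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to host:port.

    Hypothetical helper for illustration. Note that binding to ports below
    1024 without privileges also fails, which this sketch reports as "not free".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True
```

If port_is_free(15002) returns False, a leftover container or another service is probably still holding the port; docker ps on the host should show whether a container publishes it.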

Cahb commented 4 days ago

Are there any certificates in /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server? Can you please post the output of

ls /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server

Also, a few more things: could you please check that your all_in_one_make branch is up to date? I've updated it a few times using force-push; it shouldn't affect much, but still.

Also, make should stop the running container as well, so I'm not sure why make failed with that last error.

Could you please also post the output of the following command:

docker ps

ncalad commented 4 days ago

A few minutes ago, we ran the generate_certs script and that directory now contains ...

ls /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server
cas.pem cert.pem gw.crt gw.key key.pem

Cahb commented 4 days ago

@ncalad please also post the output of the docker ps command. It seems that either some of the containers are not running or they reside in different networks.

ncalad commented 4 days ago

docker ps
CONTAINER ID   IMAGE                     COMMAND                  CREATED        STATUS                    PORTS                                            NAMES
3081725ba7f8   bitnami/kafka:latest      "/opt/bitnami/script…"   4 hours ago    Up 58 minutes (healthy)   0.0.0.0:9092->9092/tcp, 0.0.0.0:9094->9094/tcp   docker-broker-1
6983b702416c   bitnami/redis:latest      "/opt/bitnami/script…"   4 hours ago    Up 59 minutes             0.0.0.0:6379->6379/tcp                           docker-redis-1
0416b8d69f41   postgres:latest           "docker-entrypoint.s…"   4 hours ago    Up 59 minutes             0.0.0.0:5432->5432/tcp                           docker-postgresql-1
010d7e028b78   openlan-cgw-img:3d46cc3   "ucentral-cgw"           21 hours ago   Up 57 minutes                                                              ucentral-cgw-container

Cahb commented 4 days ago

Okay, @ncalad, would you also please post the output of the following:

docker inspect docker-broker-1 -f "{{json .NetworkSettings.Networks }}"
docker inspect openlan_cgw -f "{{json .NetworkSettings.Networks }}"
docker inspect -f '{{ .Mounts }}' openlan_cgw
docker inspect -f "{{ .Config.Env }}" openlan_cgw

ncalad commented 4 days ago

docker inspect docker-broker-1 -f "{{json .NetworkSettings.Networks }}"
docker inspect openlan_cgw -f "{{json .NetworkSettings.Networks }}"
docker inspect -f '{{ .Mounts }}' openlan_cgw
docker inspect -f "{{ .Config.Env }}" openlan_cgw

{"docker_cgw_network":{"IPAMConfig":null,"Links":null,"Aliases":["docker-broker-1","broker","3081725ba7f8"],"MacAddress":"02:42:ac:15:00:04","DriverOpts":null,"NetworkID":"fca38b8dbb60c4f4a9044909410689d4e8c0d1120edf1ad5e9876a78d4adeb1b","EndpointID":"50c7d1171fc7375836a3d56633ddafef49926c536e100bc117671454daf3fdef","Gateway":"172.21.0.1","IPAddress":"172.21.0.4","IPPrefixLen":16,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"DNSNames":null}}

{"docker_cgw_network":{"IPAMConfig":null,"Links":null,"Aliases":null,"MacAddress":"","DriverOpts":null,"NetworkID":"","EndpointID":"","Gateway":"","IPAddress":"","IPPrefixLen":0,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"DNSNames":null}}

[{bind /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server /etc/cgw/certs true rprivate} {bind /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server /etc/cgw/nb_infra/certs true rprivate}]

[CGW_REDIS_TLS=no CGW_METRICS_PORT=8080 CGW_ID=0 CGW_WSS_CERT=cert.pem CGW_GRPC_PUBLIC_HOST=openlan_cgw CGW_DB_NAME=cgw CGW_ALLOW_CERT_MISMATCH=no CGW_FEATURE_TOPOMAP_ENABLE CGW_UCENTRAL_SWITCH_DATAMODEL_URI=https://raw.githubusercontent.com/Telecominfraproject/ols-ucentral-schema/main/ucentral.schema.json CGW_KAFKA_PRODUCE_TOPIC=CnC_Res CGW_WSS_IP=0.0.0.0 CGW_GRPC_PUBLIC_PORT=50051 CGW_KAFKA_HOST=docker-broker-1 CGW_KAFKA_CONSUME_TOPIC=CnC CGW_DB_USERNAME=cgw CGW_GROUPS_CAPACITY=1000 CGW_GROUPS_THRESHOLD=50 CGW_GRPC_LISTENING_IP=0.0.0.0 CGW_NB_INFRA_TLS=no CGW_UCENTRAL_AP_DATAMODEL_URI=https://raw.githubusercontent.com/Telecominfraproject/wlan-ucentral-schema/main/ucentral.schema.json CGW_GROUP_INFRAS_CAPACITY=2000 DEFAULT_WSS_THREAD_NUM=4 CGW_WSS_KEY=key.pem CGW_KAFKA_PORT=9092 CGW_DB_HOST=docker-postgresql-1 CGW_REDIS_PORT=6379 CGW_WSS_PORT=15002 CGW_WSS_CAS=cas.pem CGW_GRPC_LISTENING_PORT=50051 CGW_DB_TLS=no CGW_REDIS_HOST=docker-redis-1 CGW_LOG_LEVEL=debug CGW_DB_PORT=5432 CGW_DB_PASSWORD=123 PATH=/usr/local/cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin RUSTUP_HOME=/usr/local/rustup CARGO_HOME=/usr/local/cargo RUST_VERSION=1.77.0 CGW_CONTAINER_BUILD_REV= CGW_CONTAINER_BUILD_BRANCH= CGW_CONTAINER_BUILD_TIME=0]

Cahb commented 4 days ago

@ncalad The Docker engine fails to connect the CGW container to the network properly. This is the reason for the broker connection errors and similar failures.

First of all, could you please launch the following command:

docker exec -it openlan_cgw sh -c "ls /etc/cgw/certs"

Just to make sure that at least the volume got mounted.

Also, please post your docker / compose versions:

docker --version
docker compose version

Then try running make stop and then make again to see if that helps.
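The diagnosis above rests on the openlan_cgw entry in the inspect output having an empty NetworkID and IPAddress, while the broker's entry is fully populated. That comparison can be automated. What follows is a minimal Python sketch, with unattached_networks being a hypothetical helper name; it only inspects the JSON text and assumes the layout printed by docker inspect ... -f "{{json .NetworkSettings.Networks }}". Empty fields can also simply mean that the container is created but not running.

```python
import json
from typing import List


def unattached_networks(networks_json: str) -> List[str]:
    """Return names of networks whose endpoint has no NetworkID or IP address.

    Hypothetical helper for illustration; the input is the JSON text printed
    by the docker inspect command shown earlier in this thread.
    """
    networks = json.loads(networks_json)
    return [
        name
        for name, endpoint in networks.items()
        if not endpoint.get("NetworkID") or not endpoint.get("IPAddress")
    ]
```

Fed with the two JSON objects posted above, it would return an empty list for docker-broker-1 and ["docker_cgw_network"] for openlan_cgw.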

ncalad commented 4 days ago

docker exec -it openlan_cgw sh -c "ls /etc/cgw/certs"
cas.pem cert.pem gw.crt gw.key key.pem

docker --version
Docker version 27.3.1, build ce12230
oguser@oguser-virtual-machine:~/OpenLAN/openlan-cgw/utils/cert_generator/certs/server$ docker compose version
Docker Compose version v2.19.0

[2024-09-24T19:43:47Z ERROR ucentral_cgw::cgw_tls] Failed to open TLS private key file: /etc/cgw/certs/key.pem. Error: Permission denied (os error 13)
[2024-09-24T19:43:47Z ERROR ucentral_cgw] Failed to create TLS acceptor. Error: Failed to open TLS private key file: /etc/cgw/certs/key.pem. Error: Permission denied (os error 13)
[2024-09-24T19:43:49Z DEBUG ucentral_cgw::cgw_nb_api_listener] pre_rebalance callback, assigned partition(s): 0 1
[2024-09-24T19:43:49Z DEBUG ucentral_cgw::cgw_nb_api_listener] post_rebalance callback, assigned partition(s): 0 1

oguser@oguser-virtual-machine:~/OpenLAN/openlan-cgw/utils/cert_generator/certs/server$ docker exec -it openlan_cgw sh -c "ls /etc/cgw/certs"
cas.pem cert.pem gw.crt gw.key key.pem
oguser@oguser-virtual-machine:~/OpenLAN/openlan-cgw/utils/cert_generator/certs/server$ docker exec -it openlan_cgw sh -c "ls /etc/cgw/certs"
cas.pem cert.pem gw.crt gw.key key.pem
oguser@oguser-virtual-machine:~/OpenLAN/openlan-cgw/utils/cert_generator/certs/server$ docker exec -it openlan_cgw bash
root@dc97c92e43d0:/# cd /etc
root@dc97c92e43d0:/etc# cd cgw/
root@dc97c92e43d0:/etc/cgw# cd certs/
root@dc97c92e43d0:/etc/cgw/certs# ls
cas.pem cert.pem gw.crt gw.key key.pem
root@dc97c92e43d0:/etc/cgw/certs# ls -alt
total 28
drwxr-xr-x 4 root   root    4096 Sep 24 19:43 ..
-rw-rw-r-- 1 root   root    3631 Sep 24 19:06 gw.crt
-rw------- 1 root   root    3272 Sep 24 19:06 gw.key
drwxrwxr-x 2 root   root    4096 Sep 24 16:03 .
-rw------- 1 nobody nogroup 3272 Sep 24 16:03 key.pem
-rw-r--r-- 1 nobody nogroup 3631 Sep 24 16:03 cert.pem
-rw-r--r-- 1 nobody nogroup 1757 Sep 24 16:03 cas.pem

root can't open key.pem

Cahb commented 4 days ago

Okay, so the first note is that we have never tried to use this setup on a VM. It shouldn't make any difference, but still, FYI.

The second thing: I think the restart helped? I can tell it connected to the broker because of the following prints:
[2024-09-24T19:43:49Z DEBUG ucentral_cgw::cgw_nb_api_listener] pre_rebalance callback, assigned partition(s): 0 1
[2024-09-24T19:43:49Z DEBUG ucentral_cgw::cgw_nb_api_listener] post_rebalance callback, assigned partition(s): 0 1

You can try running make stop and then changing the owner of the files, e.g.

chown root:root /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server/key.pem
chown root:root /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server/cert.pem
chown root:root /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server/cas.pem

NOTE: you have to run this from the host OS, not from inside the container.

Also, one last thing about the VM you're using: is your host OS Windows or Linux? E.g. are you using VirtualBox or something similar on a Windows machine, by any chance?
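To confirm whether the ownership change helped, the certificate files can be checked from inside the container by actually trying to open them, since owner and mode bits alone may be misleading if UIDs are mapped differently between host and container (an assumption here, suggested by the nobody:nogroup owner in the listing above). This is a minimal Python sketch; describe_access is a hypothetical helper and not part of CGW.

```python
import os
import stat


def describe_access(path: str) -> dict:
    """Report owner, mode and whether the current process can read the file.

    Hypothetical helper for illustration; run it as the same user that the
    CGW process runs as.
    """
    info = {"path": path, "exists": False, "readable": False}
    try:
        st = os.stat(path)
    except OSError as err:
        info["error"] = str(err)
        return info
    info.update(exists=True, uid=st.st_uid, gid=st.st_gid, mode=stat.filemode(st.st_mode))
    try:
        # Opening the file is the real test; it fails with the same
        # "Permission denied" that CGW logs if the key is not accessible.
        with open(path, "rb"):
            info["readable"] = True
    except OSError as err:
        info["error"] = str(err)
    return info
```

Calling it for /etc/cgw/certs/key.pem, cert.pem and cas.pem would show at a glance which of the three files CGW can open.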

Cahb commented 4 days ago

@ncalad did you have a chance to look into the steps I've posted? Also, just out of curiosity: what company do you work for? I was thinking I could grab your Slack ID so we can invite you to our OpenWiFi / OpenLAN Slack channels. We could then debug this issue faster by chatting in real time, and you could ask OpenLAN / OpenWiFi questions directly there.