InterDigitalInc / AdvantEDGE

AdvantEDGE, Mobile Edge Emulation Platform
Apache License 2.0

Having issues bringing up the platform #347

Closed: anokun7 closed this issue 2 years ago

anokun7 commented 2 years ago

I am deploying this on k3s, which is still Kubernetes after all. I have been struggling with this for the last two weeks, hence reaching out here.

The first issue I faced is that the ingress controllers do not come up in hostNetwork mode. The pods kept crashing with the error Get "http://127.0.0.1:10246/nginx_status": dial tcp 127.0.0.1:10246: connect: connection refused. So I disabled hostNetwork and switched to NodePort instead, which seemed to help: the pods come up now.

However, the meep-platform-ctrl pod does not come up and keeps crash-looping. The error is below; it seems to be related to authorization when connecting to CouchDB.

time="2022-02-27T17:47:41.256Z" level=info msg="[/meep-platform-ctrl]" meep.component=meep-platform-ctrl
time="2022-02-27T17:47:41.256Z" level=info msg="Starting MEEP Platform Controller" meep.component=meep-platform-ctrl
time="2022-02-27T17:47:41.256Z" level=debug msg=Init meep.component=meep-platform-ctrl meep.from="platform-ctrl.go:88"
time="2022-02-27T17:47:41.256Z" level=info msg="Creating new MsgQueue" meep.component=meep-platform-ctrl
time="2022-02-27T17:47:41.256Z" level=debug msg="Redis Connector connecting to meep-redis-master:6379" meep.component=meep-platform-ctrl meep.from="db.go:70"
time="2022-02-27T17:47:41.258Z" level=info msg="Redis Connector connected to meep-redis-master:6379" meep.component=meep-platform-ctrl
time="2022-02-27T17:47:41.258Z" level=info msg="Successfully connected to DB" meep.component=meep-platform-ctrl
time="2022-02-27T17:47:41.258Z" level=info msg="Connected to Message Queue Redis DB" meep.component=meep-platform-ctrl
time="2022-02-27T17:47:41.258Z" level=info msg="Message Queue created" meep.component=meep-platform-ctrl
time="2022-02-27T17:47:41.258Z" level=debug msg="Redis Connector connecting to meep-redis-master:6379" meep.component=meep-platform-ctrl meep.from="db.go:70"
time="2022-02-27T17:47:41.26Z" level=info msg="Redis Connector connected to meep-redis-master:6379" meep.component=meep-platform-ctrl
time="2022-02-27T17:47:41.26Z" level=info msg="Successfully connected to DB" meep.component=meep-platform-ctrl
time="2022-02-27T17:47:41.26Z" level=info msg="Connected to Redis DB" meep.component=meep-platform-ctrl
time="2022-02-27T17:47:41.26Z" level=debug msg="Establish new couchDB client connection" meep.component=meep-platform-ctrl meep.from="db.go:50"
time="2022-02-27T17:47:41.27Z" level=debug msg="Create DB: scenarios" meep.component=meep-platform-ctrl meep.from="db.go:71"
time="2022-02-27T17:47:41.271Z" level=error msg="Failed connection to Scenario Store. Error: Unauthorized: You are not a server admin." meep.component=meep-platform-ctrl meep.from="platform-ctrl.go:115"
time="2022-02-27T17:47:41.271Z" level=error msg="Failed to initialize Platform Controller" meep.component=meep-platform-ctrl meep.from="main.go:54"
time="2022-02-27T17:47:42.257Z" level=info msg="Ran for 1 seconds" meep.component=meep-platform-ctrl
roymx commented 2 years ago

Hello @anokun7 ,

It's cool that you are trying to run the platform on K3s... I've been wanting to try it for some time now but haven't gotten to it.

What version are you running? The line numbers in the logs do not seem to match our latest code release.

Based on the provided logs, the Platform Controller fails to initialize because it cannot connect to the CouchDB database.

Possible causes:
- the CouchDB pod is missing
- the CouchDB credentials were changed (see go-packages/meep-couch/db.go)

anokun7 commented 2 years ago

Thank you for the response @roymx

I'm using a slightly older version of k3s: k3s version v1.19.16+k3s1 (da168695). The latest version simply did not work; there were too many deprecations and changes. The ingress controller still does not work unless you make some modifications...

Regarding the error connecting to CouchDB, I did some digging and the password seems to be fine. However, something related to scenarios does not come up properly when queried in CouchDB. I'm not sure whether scenarios refers to a database, a document or something else. In the CouchDB logs, I see these lines repeated several times:

meep-couchdb-svc-couchdb:5984 10.42.0.26 undefined HEAD /scenarios 404 ok 0
meep-couchdb-svc-couchdb:5984 10.42.0.26 undefined PUT /scenarios 401 ok 0

Update:

I figured scenarios referred to a database, so I went ahead and created a database named scenarios in CouchDB. Now the error message in the meep-platform-ctrl pod has changed to the one below:

time="2022-03-12T08:22:55.895Z" level=info msg="Successfully connected to DB" meep.component=meep-platform-ctrl
time="2022-03-12T08:22:55.895Z" level=info msg="Connected to Redis DB" meep.component=meep-platform-ctrl
time="2022-03-12T08:22:55.895Z" level=debug msg="Establish new couchDB client connection" meep.component=meep-platform-ctrl meep.from="db.go:50"
time="2022-03-12T08:22:55.899Z" level=error msg="Failed connection to Scenario Store. Error: Unauthorized" meep.component=meep-platform-ctrl meep.from="platform-ctrl.go:115"
time="2022-03-12T08:22:55.899Z" level=error msg="Failed to initialize Platform Controller" meep.component=meep-platform-ctrl meep.from="main.go:54"
time="2022-03-12T08:22:56.893Z" level=info msg="Ran for 1 seconds" meep.component=meep-platform-ctrl
anokun7 commented 2 years ago

Update with workaround:

I really do not know what I am doing, but running the following inside the CouchDB pod seems to help. The meep-platform-ctrl pod is now up and running, with no more errors:

root@meep-couchdb-couchdb-0:/opt/couchdb/etc# curl $HOST/scenarios/_security
{"members":{"roles":["_admin"]},"admins":{"roles":["_admin"]}}
root@meep-couchdb-couchdb-0:/opt/couchdb/etc# curl $HOST/scenarios/_security -X PUT -d '{"members":{"roles":[]},"admins":{"roles":["_admin"]}}'
{"ok":true}
root@meep-couchdb-couchdb-0:/opt/couchdb/etc#
anokun7 commented 2 years ago

So, all my pods are running now. How do I access the application? No matter which URL I hit, I get Unauthorized.

I think it is related to this error in the meep-platform-ctrl logs:

level=error msg="key: data:global:permissions:default: ERR unknown command `JSON.SET`, with args beginning with: `data:global:permissions:default`, `.`, `{\"Mode\":\"allow\",\"RolePermissions\":{}}`, " meep.component=meep-platform-ctrl meep.from="db.go:265"
roymx commented 2 years ago

Hello @anokun7 -

Can you provide the output of the following command? meepctl version all

Thanks, Mike

anokun7 commented 2 years ago

Sure @roymx . Please see below:

Using repo config file: /home/ubuntu/AdvantEDGE/.meepctl-repocfg.yaml
Using meepctl config file: /home/ubuntu/.meepctl.yaml
{
  "name": "meepctl",
  "version": "1.8.1"
}
{
  "name": ".meepctl-repocfg.yaml",
  "version": "1.8.1"
}
{
  "name": "meep-cert-manager",
  "version": "NA"
}
{
  "name": "couchdb",
  "version": "docker.io/library/couchdb:3.1.0",
  "id": "b604d056d8024f10346eab768de7aea06bc0a1b7c55d6087e1b1cd4328c8061c",
  "build": "\nError: No such image: docker.io/library/couchdb:3.1.0"
}
{
  "name": "docker-registry",
  "version": "docker.io/library/registry:2.7.1",
  "id": "169211e20e2f2d5d115674681eb79d21a217b296b43374b8e39f97fcf866b375",
  "build": "\nError: No such image: docker.io/library/registry:2.7.1"
}
{
  "name": "grafana",
  "version": "docker.io/grafana/grafana:7.3.5",
  "id": "511bc20bfcd1b79f3947bb1c33d152f7484e7a91418883fb4dddf71274227321",
  "build": "\nError: No such image: docker.io/grafana/grafana:7.3.5"
}
{
  "name": "meep-influxdb",
  "version": "docker.io/library/influxdb:1.8.0-alpine",
  "id": "5eca9dfe9930a3325323cef801827eb1b0940070465f8f215447b8e732c72b34",
  "build": "\nError: No such image: docker.io/library/influxdb:1.8.0-alpine"
}
{
  "name": "meep-ingress",
  "version": "NA"
}
{
  "name": "kube-state-metrics",
  "version": "quay.io/coreos/kube-state-metrics:v1.9.7",
  "id": "2f82f0da199c60a7699c43c63a295c44e673242de0b7ee1b17c2d5a23bec34cb",
  "build": "\nError: No such image: quay.io/coreos/kube-state-metrics:v1.9.7"
}
{
  "name": "meep-minio",
  "version": "NA"
}
{
  "name": "meep-open-map-tiles",
  "version": "NA"
}
{
  "name": "meep-postgis",
  "version": "docker.io/postgis/postgis:12-3.0",
  "id": "71acda16357f2973034483a4a8363cc9499061120b592bcc3b7f2fbed82da621",
  "build": "\nError: No such image: docker.io/postgis/postgis:12-3.0"
}
{
  "name": "meep-prometheus",
  "version": "NA"
}
{
  "name": "meep-redis",
  "version": "NA"
}
{
  "name": "meep-thanos",
  "version": "NA"
}
{
  "name": "meep-thanos-archive",
  "version": "NA"
}
{
  "name": "helm",
  "version": "v3.5.0",
  "id": "32c22239423b3b4ba6706d450bd044baffdcf9e6"
}
{
  "name": "docker client",
  "version": "20.10.2",
  "id": "2291f61"
}
{
  "name": "docker server",
  "id": "go1.13.15"
}
{
  "name": "k8s client",
  "version": "v1.20.2",
  "id": "faecb196815e248d3ecfb03c680a4507229c2a56"
}
{
  "name": "k8s server",
  "version": "v1.19.16+k3s1",
  "id": "da16869555775cf17d4d97ffaf8a13b70bc738c2"
}
{
  "name": "weave",
  "version": "NA"
}
{
  "name": "meep-auth-svc",
  "version": "docker.io/anoop/meep-auth-svc:latest",
  "id": "9127a75561a67193205d19ff860ef8ec3b6b0bf492f1be78cbc197a2c6a4a0bb",
  "build": "\nError: No such image: docker.io/anoop/meep-auth-svc:latest"
}
{
  "name": "meep-ingress-certs",
  "version": "NA"
}
{
  "name": "meep-mon-engine",
  "version": "docker.io/anoop/meep-mon-engine:latest",
  "id": "ba1fb458247a218daa5d32322c4720259e46e90b3954e02676cd51186f9c5fbb",
  "build": "\nError: No such image: docker.io/anoop/meep-mon-engine:latest"
}
{
  "name": "meep-platform-ctrl",
  "version": "docker.io/anoop/meep-platform-ctrl:latest",
  "id": "599552732dd85dcb898337a52d066c54ce9a1aec89e3e07604a7a1e3b5131078",
  "build": "\nError: No such image: docker.io/anoop/meep-platform-ctrl:latest"
}
{
  "name": "meep-virt-engine",
  "version": "docker.io/anoop/meep-virt-engine:latest",
  "id": "b95bf058a5bf076e268f4c5305eff01f9f0ceab6866a31e0cb05dc21636b2fbd",
  "build": "\nError: No such image: docker.io/anoop/meep-virt-engine:latest"
}
{
  "name": "meep-webhook",
  "version": "docker.io/anoop/meep-webhook:latest",
  "id": "57304e51bb100a2ffd4cb7b26b4c6af4b81044ee5ec989bc72971de59f055791",
  "build": "\nError: No such image: docker.io/anoop/meep-webhook:latest"
}

I see there are quite a few errors, but all my pods seem to be running fine:

ubuntu@ip-172-31-49-219:~$ k get pod
NAME                                                READY   STATUS    RESTARTS   AGE
meep-ingress-defaultbackend-5c57d5cd58-fxs7j        1/1     Running   0          14d
meep-prometheus-operator-c8b8896d7-42mvt            1/1     Running   0          14d
meep-prometheus-node-exporter-d8wk4                 1/1     Running   0          14d
meep-ingress-controller-77qrl                       1/1     Running   0          14d
meep-webhook-5d88f4bf85-t2jj4                       1/1     Running   0          14d
meep-kube-state-metrics-868576f6d4-fhrnh            1/1     Running   1          14d
meep-prometheus-node-exporter-ccfwb                 1/1     Running   1          14d
prometheus-meep-prometheus-server-0                 2/2     Running   3          14d
meep-postgis-0                                      2/2     Running   2          14d
meep-docker-registry-65b77797cb-ghjlf               1/1     Running   0          33h
meep-redis-slave-0                                  2/2     Running   0          33h
meep-influxdb-0                                     1/1     Running   0          33h
meep-redis-master-0                                 2/2     Running   0          33h
meep-auth-svc-5d988d5d68-dpqz5                      1/1     Running   0          33h
meep-virt-engine-c5f6d8845-rdzw8                    1/1     Running   0          33h
meep-mon-engine-846c4dcdb7-ntbcj                    1/1     Running   1          33h
meep-couchdb-couchdb-0                              1/1     Running   0          33h
meep-ingress-controller-r6rc4                       1/1     Running   0          33h
meep-grafana-69c5bbf6c9-pkds2                       1/1     Running   0          33h
meep-open-map-tiles-7d99b886f-jr49t                 1/1     Running   0          33h
meep-platform-ctrl-c54f7849f-p6822                  1/1     Running   2          33h
meep-prometheus-couchdb-exporter-795d6b6dc5-rktfp   1/1     Running   4          33h
ubuntu@ip-172-31-49-219:~$
roymx commented 2 years ago

Hello again... here are some observations from my side.

1- I'm still not sure what version you are running. There seems to be a mismatch between what you run and version 1.8.1 (see observations below), or is it a custom version?
2- For platform access, please provide the logs of meep-auth-svc and meep-ingress-controller when you try to access the platform with your browser.

Regarding platform access: when we deploy on a self-hosted K8s cluster, the platform is accessible through the platform IP address (e.g. on the standard ports 80 and 443).


First observation

Your system fetches images from docker.io instead of using the internal meep-docker-registry. This should not be a problem, but please make sure that the images in docker.io are the ones you expect. Are you sure that the meepctl CLI tool really stores them there? And if you use a script to upload them (e.g. after meepctl build all && meepctl dockerize all), are you sure that it copies them as expected?

Second observation

Some of the logs lead us to believe that the version of AdvantEDGE being run is not 1.8.1.

Namely, this error appears at line 115 in the version you are running:
time="2022-02-27T17:47:41.271Z" level=error msg="Failed connection to Scenario Store. Error: Unauthorized: You are not a server admin." meep.component=meep-platform-ctrl meep.from="platform-ctrl.go:115"
According to version 1.8.1, however, it should appear at line 196.

Third observation

One of the reported logs contains a reference to JSON.SET when accessing Redis. We no longer use this command, which supports the second observation: the version being executed appears to be older.
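For background on the JSON.SET error quoted earlier in the thread: JSON.SET is a command of the RedisJSON module, so a Redis server without that module answers it with "ERR unknown command". Purely as an illustration, and not necessarily what newer AdvantEDGE releases do, the Go sketch below serializes the logged permissions value with encoding/json so that it could be stored with a plain SET, and detects the unknown-command reply by string matching. The permissions type, its field types and the helper names are assumptions inferred from the logged value.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// permissions mirrors the logged value {"Mode":"allow","RolePermissions":{}}.
// The field types are assumptions; only the JSON shape comes from the log.
type permissions struct {
	Mode            string
	RolePermissions map[string]string
}

// isUnknownCommand reports whether a Redis error message says the server
// does not recognise the command, e.g. JSON.SET without RedisJSON loaded.
func isUnknownCommand(msg string) bool {
	return strings.Contains(msg, "ERR unknown command")
}

// encodePermissions returns the string that a plain
// SET data:global:permissions:default <value> would store.
func encodePermissions(p permissions) (string, error) {
	if p.RolePermissions == nil {
		p.RolePermissions = map[string]string{} // encode as {} rather than null
	}
	b, err := json.Marshal(p)
	return string(b), err
}

func main() {
	logged := "key: data:global:permissions:default: ERR unknown command `JSON.SET`, with args beginning with: `data:global:permissions:default`"
	fmt.Println(isUnknownCommand(logged)) // prints: true

	v, err := encodePermissions(permissions{Mode: "allow"})
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // prints: {"Mode":"allow","RolePermissions":{}}
}
```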

Other couchdb observations

Some comments on what you did to try to get things running.

anokun7 commented 2 years ago

Very helpful, @roymx. Let me check the things you pointed out and get back to you if it still does not work. Thank you so much!

anokun7 commented 2 years ago

@roymx That worked. I rebuilt everything and dockerized it again, and after redeploying I was able to access the app. The ingress controllers still wouldn't come up for some reason, but exposing the meep-platform-ctrl service as a NodePort worked. Thank you very much for the assist.