criblio / appscope

Gain observability into any Linux command or application with no code modification
https://appscope.dev
Apache License 2.0

[Bug]: k8s-webhook-cert-manager fails to create self-signed certificate in k8s v1.26 #1365

Closed · ricksalsa closed this issue 1 year ago

ricksalsa commented 1 year ago

Steps To Reproduce

Deploying AppScope into a Kubernetes environment using scope k8s fails because the server is unable to get a signed certificate from the Kubernetes CA.

  1. Try to deploy scope into Kubernetes with the command: scope k8s --cribldest <replace> | kubectl apply -f -
  2. Install completes:
    mutatingwebhookconfiguration.admissionregistration.k8s.io/scope.scope-rc.appscope.io created
    job.batch/webhook-cert-setup created
    clusterrole.rbac.authorization.k8s.io/webhook-cert-cluster-role unchanged
    clusterrolebinding.rbac.authorization.k8s.io/webhook-cert-cluster-role-binding configured
    serviceaccount/webhook-cert-sa created
    clusterrole.rbac.authorization.k8s.io/scope-cluster-role unchanged
    clusterrolebinding.rbac.authorization.k8s.io/scope-cluster-role-binding configured
    serviceaccount/scope-cert-sa created
    deployment.apps/scope created
    service/scope created
    configmap/scope created
  3. Note that the job webhook-cert-setup fails with the following error:
    creating csr: scope.scope-rc
    error: the server doesn't have a resource type "certificatesigningrequests"
    error: unable to recognize "STDIN": no matches for kind "CertificateSigningRequest" in version "certificates.k8s.io/v1beta1"
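
A quick way to confirm the API removal on the affected cluster (a hedged check; on Kubernetes releases that no longer serve the v1beta1 Certificates API, only certificates.k8s.io/v1 is listed, which matches the error above):

# List the certificates API versions the cluster serves; if v1beta1 is
# absent, the v1beta1 CertificateSigningRequest manifest cannot be applied
kubectl api-versions | grep certificates.k8s.io
kubectl api-resources --api-group=certificates.k8s.io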

Environment

- AppScope: 1.3.0
- OS: Ubuntu 20.04.5 LTS (Focal Fossa)
- Architecture: x86_64
- Kernel: 5.4.0-122-generic

Requested priority

Medium

Relevant log output

No response

ricksalsa commented 1 year ago

The original k8s-webhook-cert-manager project has been archived. Forked here: https://github.com/criblio/k8s-webhook-cert-manager.

The v1beta1 CSR is no longer accepted by the Certificates API; the manifest needs to be updated to certificates.k8s.io/v1.
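
For illustration, a minimal sketch of what the migrated request could look like. The CSR name comes from the job output above; the server.csr file, the usages, and the overall shape are assumptions, not the actual contents of the fix. Note that v1 additionally requires spec.signerName, which v1beta1 did not:

# Sketch: create the CSR through the v1 API instead of v1beta1
# (server.csr is an assumed PEM-encoded certificate request)
cat <<EOF | kubectl apply -f -
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: scope.scope-rc
spec:
  request: $(base64 < server.csr | tr -d '\n')
  signerName: kubernetes.io/kubelet-serving
  usages:
  - digital signature
  - key encipherment
  - server auth
EOF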

ricksalsa commented 1 year ago

Being addressed in https://github.com/criblio/k8s-webhook-cert-manager/pull/1.

michalbiesek commented 1 year ago

~~After applying the changes from https://github.com/criblio/k8s-webhook-cert-manager/pull/1 and the modified changes from #1366, webhook-cert-setup fails locally with the following error:~~

Error from server (Forbidden): certificatesigningrequests.certificates.k8s.io "scope.default" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/kubelet-serving"

Edit: Fixed by e6055dd
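
For context, approving a CSR for a specific signer requires an RBAC rule of roughly this shape (a sketch of the general Kubernetes mechanism, not the actual diff in e6055dd; the role name is taken from the install output above):

# Allow the cert-setup job to approve CSRs for the kubelet-serving signer
cat <<'EOF' | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: webhook-cert-cluster-role
rules:
- apiGroups: ["certificates.k8s.io"]
  resources: ["certificatesigningrequests"]
  verbs: ["create", "get", "list", "watch"]
- apiGroups: ["certificates.k8s.io"]
  resources: ["certificatesigningrequests/approval"]
  verbs: ["update"]
- apiGroups: ["certificates.k8s.io"]
  resources: ["signers"]
  resourceNames: ["kubernetes.io/kubelet-serving"]
  verbs: ["approve"]
EOF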

michalbiesek commented 1 year ago

OK, the issue with webhook-cert-manager will be solved by the commits above. The additional work I plan to do in this issue is to solve the problem of using scope k8s with Alpine containers.

I am currently able to reproduce the issue with Alpine; long story short, our patching model does not work here. Assuming the changes from https://github.com/criblio/k8s-webhook-cert-manager/pull/1 and the modified changes from https://github.com/criblio/appscope/pull/1366 are applied, the easiest reproduction of the problem is this modification of make k8s-test:

docker tag cribl/scope:dev-x86_64 cribl/scope:$(VERSION)
kind delete cluster
kind create cluster
kind load docker-image cribl/scope:$(VERSION)
kubectl create namespace test
kubectl create namespace scope
docker run -it cribl/scope:$(VERSION) scope k8s -m /tmp/metrics.log -e /tmp/events.log --namespace scope --debug | kubectl apply -f -
kubectl label namespace test scope=enabled
kubectl wait --for=condition=available deployment/scope -n scope
kubectl run ubuntu --image=ubuntu:20.04 -n test --restart=Never --command -- sleep infinity
kubectl run alpine --image=alpine:3.16.4 -n test --restart=Never --command -- sleep infinity

Note that the alpine container will not work. The root cause is that scope extract is performed in the context of the scope container (a container based on Ubuntu 18.04, using the glibc loader). The extract goes to a shared volume that is later accessible from the alpine container, so every process in the alpine container will use the unpatched library via LD_LIBRARY_PATH. There will be additional changes in webhook.go.
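
To make the mismatch concrete (a sketch; the library path is illustrative and assumes both test pods mount the shared scope volume):

# In the glibc-based ubuntu container the extracted library resolves fine
kubectl exec -n test ubuntu -- ldd /scope/0/libscope.so

# In the musl-based alpine container the same unpatched file fails to load,
# so processes pointed at it simply run unscoped
kubectl exec -n test alpine -- ldd /scope/0/libscope.so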

michalbiesek commented 1 year ago

PostStart will probably not work:

> There is no guarantee, however, that the postStart handler is called before the Container's entrypoint is called

michalbiesek commented 1 year ago

With @jrcheli's advice (thanks for the help), I am exploring handling the patching decision with an additional init container that uses the same image as the application container. That way we should be able to have the proper library in place. Done in 9b4c51e
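
A minimal sketch of the shape of that idea (a hand-written pod, not the actual webhook.go mutation; the scope binary location inside cribl/scope and the volume layout are assumptions):

# Stage the static scope binary, then run the extract with the app's own
# image so the extracted libscope.so matches the app's (musl) loader
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: alpine-init-sketch
spec:
  volumes:
  - name: scope
    emptyDir: {}
  initContainers:
  - name: scope-copy        # copy the static scope binary to the shared volume
    image: cribl/scope:dev
    command: ["cp", "/usr/local/bin/scope", "/scope/scope"]
    volumeMounts: [{name: scope, mountPath: /scope}]
  - name: scope-extract     # run the extract in the application's image
    image: alpine:3.16.4
    command: ["sh", "-c", "mkdir -p /scope/0 && /scope/scope extract /scope/0"]
    volumeMounts: [{name: scope, mountPath: /scope}]
  containers:
  - name: alpine
    image: alpine:3.16.4
    command: ["sleep", "infinity"]
    env:
    - name: LD_PRELOAD
      value: /scope/0/libscope.so     # now matches the musl loader
    volumeMounts: [{name: scope, mountPath: /scope}]
EOF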

michalbiesek commented 1 year ago

I use https://github.com/criblio/scope-k8s-demo to test the scope k8s fixes. I no longer observe the issue with alpine patching. On the setup above, though, I see one additional error, which I think is related to using scope via SCOPE_EXEC:

could not create path to log file /usr/share/grafana/.scope/scope.log: mkdir /usr/share/grafana/.scope: permission denied

Log from Grafana; note that the HOME directory is replaced by the path /usr/share/grafana:

/scope/scope /usr/share/grafana/bin/grafana server --homepath=/usr/share/grafana --config=/etc/grafana/grafana.ini --packaging=docker cfg:default.log.mode=console cfg:default.paths.data=/var/lib/grafana/ cfg:default.paths.logs=/var/log/grafana cfg:default.paths.plugins=/var/lib/grafana/plugins cfg:default.paths.provisioning=/etc/grafana/provisioning
Env used KUBERNETES_SERVICE_PORT_HTTPS=443 ELASTICSEARCH_PORT_9200_TCP_PORT=9200 GRAFANA_SERVICE_HOST=10.96.131.189 KUBERNETES_SERVICE_PORT=443 CRIBL_INTERNAL_PORT_10090_TCP_ADDR=10.96.52.157 ELASTICSEARCH_PORT_9200_TCP_ADDR=10.96.46.239 SCOPE_TAG_name=grafana SCOPE_GO_HTTP1=false GRAFANA_PORT_80_TCP=tcp://10.96.131.189:80 ELASTICSEARCH_SERVICE_HOST=10.96.46.239 CRIBL_INTERNAL_PORT_9000_TCP_ADDR=10.96.52.157 KIBANA_SERVICE_HOST=10.96.123.99 HOSTNAME=grafana-595f9ff984-tk4f4 SCOPE_TAG_node_name=scope-k8s-demo-control-plane CRIBL_PORT_9000_TCP=tcp://10.96.212.172:9000 GRAFANA_PORT=tcp://10.96.131.189:80 CRIBL_PORT_9000_TCP_PROTO=tcp SCOPE_SERVICE_HOST=10.96.153.145 ELASTICSEARCH_PORT=tcp://10.96.46.239:9200 CRIBL_INTERNAL_SERVICE_PORT=9000 KIBANA_PORT_5601_TCP=tcp://10.96.123.99:5601 KIBANA_SERVICE_PORT_HTTP=5601 CRIBL_SERVICE_PORT_API=9000 KIBANA_PORT_5601_TCP_PORT=5601 PWD=/usr/share/grafana APISERVER_SERVICE_PORT_4000=4000 APISERVER_PORT_4000_TCP=tcp://10.96.98.254:4000 KIBANA_SERVICE_PORT=5601 ELASTICSEARCH_PORT_9200_TCP=tcp://10.96.46.239:9200 CRIBL_INTERNAL_PORT_9000_TCP=tcp://10.96.52.157:9000 GRAFANA_SERVICE_PORT_SERVICE=80 GF_PATHS_HOME=/usr/share/grafana PROMETHEUS_SERVICE_PORT=9090 ELASTICSEARCH_PORT_9300_TCP_ADDR=10.96.46.239 LD_PRELOAD=/scope/0/libscope.so HOME=/usr/share/grafana KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443 SCOPE_PORT_443_TCP_ADDR=10.96.153.145 APISERVER_PORT=tcp://10.96.98.254:4000 SCOPE_CRIBL=tcp://cribl-internal:10090 CRIBL_INTERNAL_PORT=tcp://10.96.52.157:9000 APISERVER_PORT_4000_TCP_ADDR=10.96.98.254 CRIBL_INTERNAL_SERVICE_HOST=10.96.52.157 KIBANA_PORT_5601_TCP_PROTO=tcp SCOPE_TAG_instance=grafana CRIBL_SERVICE_HOST=10.96.212.172 APISERVER_SERVICE_PORT=4000 CRIBL_INTERNAL_SERVICE_PORT_APPSCOPE=10090 CRIBL_PORT=tcp://10.96.212.172:9000 SCOPE_PORT=tcp://10.96.153.145:443 SCOPE_EVENT_HTTP_HEADER=(?i)Cookie.* PROMETHEUS_PORT=tcp://10.96.245.233:9090 SCOPE_PORT_443_TCP_PROTO=tcp ELASTICSEARCH_PORT_9300_TCP_PORT=9300 APISERVER_PORT_4000_TCP_PROTO=tcp KIBANA_PORT_5601_TCP_ADDR=10.96.123.99 ELASTICSEARCH_SERVICE_PORT_HTTP=9200 PROMETHEUS_PORT_9090_TCP_PROTO=tcp GRAFANA_PORT_80_TCP_PROTO=tcp ELASTICSEARCH_SERVICE_PORT_TRANSPORT=9300 SCOPE_PORT_443_TCP=tcp://10.96.153.145:443 SCOPE_TAG_namespace=default SCOPE_SERVICE_PORT_4443_TCP=443 SCOPE_PORT_443_TCP_PORT=443 PROMETHEUS_PORT_9090_TCP_ADDR=10.96.245.233 CRIBL_INTERNAL_SERVICE_PORT_API=9000 SHLVL=0 CRIBL_INTERNAL_PORT_10090_TCP=tcp://10.96.52.157:10090 GF_PATHS_PROVISIONING=/etc/grafana/provisioning ELASTICSEARCH_PORT_9300_TCP=tcp://10.96.46.239:9300 CRIBL_PORT_9000_TCP_PORT=9000 GRAFANA_PORT_80_TCP_PORT=80 ELASTICSEARCH_PORT_9300_TCP_PROTO=tcp KUBERNETES_PORT_443_TCP_PROTO=tcp GRAFANA_PORT_80_TCP_ADDR=10.96.131.189 CRIBL_INTERNAL_PORT_10090_TCP_PORT=10090 KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1 SCOPE_SERVICE_PORT=443 POD_IP=10.244.0.11 PROMETHEUS_PORT_9090_TCP=tcp://10.96.245.233:9090 GF_SECURITY_ADMIN_PASSWORD=scopedemo GF_SECURITY_ADMIN_USER=admin LD_LIBRARY_PATH=/tmp/appscope/1.3.3/ APISERVER_SERVICE_HOST=10.96.98.254 GRAFANA_SERVICE_PORT=80 KIBANA_PORT=tcp://10.96.123.99:5601 APISERVER_PORT_4000_TCP_PORT=4000 SCOPE_CONF_PATH=/scope/0/scope.yml GF_PATHS_DATA=/var/lib/grafana/ PROMETHEUS_SERVICE_PORT_9090=9090 KUBERNETES_SERVICE_HOST=10.96.0.1 SCOPE_EXEC_PATH=/scope/scope KUBERNETES_PORT=tcp://10.96.0.1:443 PROMETHEUS_PORT_9090_TCP_PORT=9090 KUBERNETES_PORT_443_TCP_PORT=443 ELASTICSEARCH_SERVICE_PORT=9200 PROMETHEUS_SERVICE_HOST=10.96.245.233 ELASTICSEARCH_PORT_9200_TCP_PROTO=tcp GF_PATHS_LOGS=/var/log/grafana 
CRIBL_SERVICE_PORT=9000 PATH=/usr/share/grafana/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin CRIBL_INTERNAL_PORT_9000_TCP_PORT=9000 SCOPE_PID=1 CRIBL_PORT_9000_TCP_ADDR=10.96.212.172 CRIBL_INTERNAL_PORT_10090_TCP_PROTO=tcp CRIBL_INTERNAL_PORT_9000_TCP_PROTO=tcp GF_PATHS_PLUGINS=/var/lib/grafana/plugins GF_PATHS_CONFIG=/etc/grafana/grafana.ini
mai

This is a Go static executable; setting SCOPE_HOME seems reasonable here.
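
For example (hedged; the /tmp path is illustrative, and whether SCOPE_HOME covers everything the CLI writes under HOME is an assumption here):

# Point scope's home at a writable location instead of the app-overridden HOME
kubectl set env deployment/grafana SCOPE_HOME=/tmp/scope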

michalbiesek commented 1 year ago

An image that can be used to test is mbiesekcribl/scope:1.3.1; k8s environment: https://github.com/criblio/scope-k8s-demo/pull/13

michalbiesek commented 1 year ago

QA Instructions:

The instructions below will create the Docker image cribl/scope:dev:

make build CMD="make all"
make image

Using kind as the k8s cluster

### Create cluster
kind create cluster

### Load docker image locally (from host into node)
### This step is only required for a locally built image;
### for an official release the image will be available on DockerHub
### and will be pulled automatically.
kind load docker-image cribl/scope:dev

### Run scope k8s to start webhook and k8s server in cluster
docker run -it cribl/scope:dev scope k8s -m /tmp/metrics.log -e /tmp/events.log | kubectl apply -f -
kubectl label namespace default scope=enabled

### Wait ~10 seconds, then observe the state of the pods in the cluster
### Example view: (no error)
### scope-ccbcd588f-85vnp      1/1     Running     0          21s
### webhook-cert-setup-t6zcq   0/1     Completed   0          21s
kubectl get pods

### Run another container (glibc-based) with a never-ending process (sleep)
kubectl run ubuntu --image=ubuntu:20.04 --restart=Never --command -- sleep infinity

### Observe that sleep is scoped
sudo scope ps
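
### Optional extra check: scoped events should also reach the scope pod
### (assumes /tmp/events.log, per the -e flag passed to scope k8s above)
kubectl exec deploy/scope -- tail /tmp/events.log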

### Destroy cluster
kind delete cluster

Using minikube as the k8s cluster

### Create cluster
minikube start

### Load docker image locally (from host into node)
minikube image load cribl/scope:dev

### Create k8s deployment
docker run -it cribl/scope:dev scope k8s -c tcp://<cloud_instance_name>.cribl.cloud:10091  | kubectl apply -f -
kubectl label namespace default scope=enabled

### Wait ~10 seconds so you can observe state of the pods in the cluster
### Example view: (no error)
### scope-ccbcd588f-85vnp      1/1     Running     0          21s
### webhook-cert-setup-t6zcq   0/1     Completed   0          21s
kubectl get pods

### Start a Redis container (musl-based)
kubectl run redis --image=redis:alpine
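
### Optional: verify the musl-based redis process is actually scoped
### (an assumption-level check: the injected library should appear in the
### environment of the container's PID 1)
kubectl exec redis -- sh -c 'tr "\0" "\n" < /proc/1/environ | grep -i scope'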

### You can go to the Stream instance to observe data flowing in via the `in_appscope_tcp` source

### Destroy cluster
minikube delete