Tetragon gRPC API returns "error="rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8" command terminated with exit code 1"

ashishkurmi commented 1 year ago

What happened?

Bug Description

When retrieving Tetragon events from the gRPC endpoint with the tetra CLI, tetra occasionally exits with the following error:

kubectl exec -it -n kube-system ds/tetragon -c tetragon -- tetra getevents -o compact -n default

time="xxx" level=fatal msg="Failed to receive events" error="rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8"
command terminated with exit code 1
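For context: proto3 requires string fields to hold valid UTF-8, and Go's protobuf runtime refuses to marshal a message that violates this. That refusal is exactly what "string field contains invalid UTF-8" reports. Process arguments read from the kernel are arbitrary bytes, so if one of them lands in a string field of an event (assumed here to be something like the process arguments), the whole stream fails. The minimal Go sketch below uses only the standard library. It mirrors the validity check rather than calling protobuf itself. The `checkArgs` helper and the sample bytes are hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// checkArgs reports, for each argument, whether it is valid UTF-8.
// (Hypothetical helper.) Go's protobuf runtime applies an equivalent check
// to proto3 string fields when marshaling and fails if it does not hold.
func checkArgs(args []string) []bool {
	ok := make([]bool, len(args))
	for i, a := range args {
		ok[i] = utf8.ValidString(a)
	}
	return ok
}

func main() {
	args := []string{
		"/proc/self/exe",
		"init",
		"\xc8\xb3\xff\x7d", // hypothetical raw bytes: \xff is never valid UTF-8
	}
	for i, valid := range checkArgs(args) {
		fmt.Printf("arg %d valid UTF-8: %v\n", i, valid)
	}
}
```

A single invalid argument like the third one above would be enough to make the marshaler reject the whole event.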

Repro Steps

Follow these steps to reproduce the error. I have verified them on AWS EKS.

Configure Tetragon and start listening for events

helm repo add cilium https://helm.cilium.io/
helm install tetragon cilium/tetragon --version 0.10.0 -n kube-system
kubectl rollout status -n kube-system ds/tetragon -w
kubectl create -f https://raw.githubusercontent.com/ashishkurmi/golang-container-demo/main/tcp_connect.yaml
kubectl exec -it -n kube-system ds/tetragon -c tetragon -- tetra getevents -o compact -n default

Trigger the error by generating events

kubectl create -f https://raw.githubusercontent.com/ashishkurmi/golang-container-demo/main/pod.yaml
kubectl exec --stdin --tty dind -- /bin/sh
apk update && apk add git
git clone https://github.com/ashishkurmi/golang-container-demo.git
cd golang-container-demo
docker build .

Run docker build . repeatedly inside the pod if needed. In my environment this consistently reproduces the error.

Tetragon Version

0.10.0

Kernel Version

Linux ip-10-0-51-149.us-west-2.compute.internal 5.10.184-175.731.amzn2.x86_64 #1 SMP Tue Jun 27 21:48:55 UTC 2023 x86_64 Linux

Kubernetes Version

Server Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.3-eks-a5565ad", GitCommit:"78c8293d1c65e8a153bf3c03802ab9358c0e1a14", GitTreeState:"clean", BuildDate:"2023-06-16T17:32:40Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}

Bugtool

No response

Relevant log output

Output of the tetra CLI:
kubectl exec -it -n kube-system ds/tetragon -c tetragon -- tetra getevents -o compact -n default

💥 exit    default/dind /usr/local/bin/dockerd --host=unix:///var/run/docker.sock --host=tcp://0.0.0.0:2376 --tlsverify --tlscacert /certs/server/ca.pem --tlscert /certs/server/cert.pem --tlskey /certs/server/key.pem 253
🔌 connect default/dind /usr/local/bin/dockerd tcp 10.0.5.148:42774 -> 34.205.13.154:443
🚀 process default/dind /usr/local/bin/runc --log /var/lib/docker/buildkit/executor/runc-log.json --log-format json run --bundle /var/lib/docker/buildkit/executor/tyvqwqdjam33m7564no4hhirw tyvqwqdjam33m7564no4hhirw
time="2023-07-25T00:40:17Z" level=fatal msg="Failed to receive events" error="rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8"
command terminated with exit code 1

Anything else?

No response

kkourt commented 1 year ago

Hi,

Based on the issue description, I believe this might have been fixed by https://github.com/cilium/tetragon/pull/1282. Could you try the latest image (quay.io/cilium/tetragon-ci:latest) and see if it fixes the issue?

ashishkurmi commented 1 year ago

Thanks so much, @kkourt, for fixing this issue! My repro no longer triggers the error with the latest image. I now see non-ASCII characters in the output, which I believe correspond to the bytes that were previously causing this issue: 🚀 process default/dind /proc/self/exe init /var ig_map_stats _recursive � �� ȳ�}� � �~�~�@�Q�~�P�Q�~��~�#�~�~� @�?,~�@ h �#1�@�~�

Will this fix be included in the current stable Tetragon release line (v0.10.0)?

kkourt commented 1 year ago

> Will this fix be included in the current stable Tetragon release line (v0.10.0)?

The change has already been backported to v0.10 in https://github.com/cilium/tetragon/pull/1285, so it will be part of v0.10.1.
