aws / amazon-vpc-cni-k8s

Networking plugin repository for pod networking in Kubernetes using Elastic Network Interfaces on AWS
Apache License 2.0

CNI log collector script not working with bottlerocket #1316

Open jayanthvn opened 3 years ago

jayanthvn commented 3 years ago

What happened:

There have been recent reports that the CNI log collector script does not work on Bottlerocket.

Attach logs

What you expected to happen: The script should work on Bottlerocket nodes.

How to reproduce it (as minimally and precisely as possible): Run the log collector script.

Anything else we need to know?:

Environment:

ezzoueidi commented 3 years ago

Working on this. This issue sounds like it needs to be addressed in the aws/amazon-eks-ami repository.

nmour77 commented 3 years ago

NOTE: A manual procedure to collect logs is provided in this comment. We still need help building a script which includes these steps.

Bottlerocket OS ships with a limited set of packages, and there is no direct SSH access to the host. Instead, we SSH into the admin container and then drop into the host's root filesystem. I have provided the manual procedure to do this and to collect the logs below:

Procedure to collect logs:

SSH into the worker node (this connects you to the admin container).

ssh -i "KEY-File" ec2-user@<IP-ADDRESS-OF-THE-WORKER-NODE>

Create a directory to store the logs. A tarball of this directory is created in a later step.

mkdir /.bottlerocket/rootfs/tmp/ekslogs

Switch to the root user to install the tar package, which is needed to create the tarball later.

sudo su

Install the tar package.

yum install -y tar

Exit the root shell to return to the ec2-user shell.

exit

Drop into a root shell in the Bottlerocket host's root filesystem.

sudo sheltie

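Copy ipamd.log and plugin.log to the ekslogs directory. Inside the host's root filesystem, the directory created earlier as /.bottlerocket/rootfs/tmp/ekslogs is available as /tmp/ekslogs.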

cp /var/log/aws-routed-eni/* /tmp/ekslogs/

Run the following command only if you want to collect all container logs. This step is optional, since container logs are also available through the "kubectl logs -c" command. Copying them requires enough free space on the attached volume, so run it with care and only if necessary.

cp /var/log/containers/* /tmp/ekslogs/
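The free-space warning above can be checked before copying. This is a minimal sketch, assuming du, df, awk, and cut are available in the shell where you run it; the Bottlerocket host ships few tools, so you may need to run it from the admin container against the /.bottlerocket/rootfs/... paths instead. The function name has_room_for_copy is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the filesystem holding DEST has more
# free space than SRC currently uses.
has_room_for_copy() {
  local src="$1" dest="$2" need_kb avail_kb
  # -s: one total, -k: KiB, -L: follow symlinks (entries in /var/log/containers are symlinks).
  need_kb=$(du -skL "$src" | cut -f1)
  # -P: POSIX output; column 4 of the data row is the available space in KiB.
  avail_kb=$(df -Pk "$dest" | awk 'NR == 2 { print $4 }')
  # Fail if either measurement produced nothing (for example, a missing path).
  [ -n "$need_kb" ] && [ -n "$avail_kb" ] || return 1
  echo "need ${need_kb} KiB, available ${avail_kb} KiB for ${dest}"
  [ "$need_kb" -lt "$avail_kb" ]
}

# Example:
#   has_room_for_copy /var/log/containers /tmp/ekslogs && cp /var/log/containers/* /tmp/ekslogs/
```

The check is deliberately conservative: it compares against the whole source size, so it may refuse a copy that would only just fit.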

Change the ownership of the files under /.bottlerocket/rootfs/tmp/ekslogs to match the owner of that directory, so that ec2-user can access them in the later steps. To check the directory's ownership, run the following command (inside the host shell the directory is /tmp/ekslogs/). In the example below, the user and group owner of the directory are both 1000, so we change the ownership of all files under /tmp/ekslogs/ to 1000:1000.

ls -ld /tmp/ekslogs/
drwxrwxr-x. 3 1000 1000 200 Feb 9 03:10 /tmp/ekslogs/

Change the ownership of /tmp/ekslogs/ and everything under it:

chown -R 1000:1000 /tmp/ekslogs/
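Instead of reading the owner from ls -ld and typing it in, the ownership can be copied from the directory itself. This is a minimal sketch, assuming the host shell has GNU coreutils chown, which supports --reference; the function name match_dir_owner is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: give every file under DIR the same owner and group as DIR,
# so the UID/GID (1000:1000 in the example above) does not need to be hard-coded.
match_dir_owner() {
  local dir="$1"
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  # --reference takes the user and group from DIR; -R applies them recursively.
  chown -R --reference="$dir" "$dir"
}

# Example, inside the "sudo sheltie" shell:
#   match_dir_owner /tmp/ekslogs/
```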

Run logdog to create a tarball containing the kubelet logs and other diagnostic information.

logdog

NOTE: Logdog uses the configuration file logdog.common.conf at https://github.com/bottlerocket-os/bottlerocket/blob/8f731fb322c1f80bf84962c6a697e86110c17bdc/sources/logdog/conf/logdog.common.conf to collect logs. As of now this file cannot be modified.

Copy the tarball created in the previous step to /tmp/ekslogs, which appears as /.bottlerocket/rootfs/tmp/ekslogs once you switch back to ec2-user.

cp /tmp/bottlerocket* /tmp/ekslogs

Exit the root shell in the Bottlerocket host's root filesystem.

exit

Change to the /.bottlerocket/rootfs/tmp/ directory and create a tarball of the ekslogs folder there:

cd /.bottlerocket/rootfs/tmp/
tar -cvzf full-logs.tar.gz /.bottlerocket/rootfs/tmp/ekslogs

Exit the instance and run the following command from your workstation to download the tarball:

ssh -i "KEY-File" ec2-user@<IP-ADDRESS-OF-THE-WORKER-NODE> "cat /.bottlerocket/rootfs/tmp/full-logs.tar.gz" > bottlerocket-logs1.tar.gz

Extract the downloaded log bundle and look for the ipamd and plugin logs under the .bottlerocket/rootfs/tmp/ekslogs directory (tar stores the paths without the leading /).

Then extract bottlerocket-logs.tar.gz (the logdog archive) in that directory to see the logs from the host root filesystem, including the journal logs, dmesg output, and iptables rules.
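The two extraction steps above can be sketched as one small function. It assumes the bundle was created with GNU tar as shown, so member names have no leading / and the files land under .bottlerocket/rootfs/tmp/ekslogs inside the output directory; extract_bundle and its arguments are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack the downloaded bundle, then the logdog archive(s) inside it.
extract_bundle() {
  local bundle="$1" out="$2" ekslogs inner
  mkdir -p "$out" || return 1
  tar -xzf "$bundle" -C "$out" || return 1
  # Member names were stored without the leading "/", so they are relative to $out.
  ekslogs="$out/.bottlerocket/rootfs/tmp/ekslogs"
  [ -d "$ekslogs" ] || { echo "no ekslogs directory in $bundle" >&2; return 1; }
  # The logdog output was copied in with "cp /tmp/bottlerocket* /tmp/ekslogs".
  for inner in "$ekslogs"/bottlerocket*.tar.gz; do
    [ -e "$inner" ] || continue   # the glob matched nothing
    tar -xzf "$inner" -C "$ekslogs" || return 1
  done
  echo "logs extracted under $ekslogs"
}

# Example:
#   extract_bundle bottlerocket-logs1.tar.gz ./node-logs
```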

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] commented 2 years ago

Issue closed due to inactivity.

orsenthil commented 5 months ago

We need to add these steps to the log collector script.
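A possible starting point is a script that drives the manual steps over SSH from a workstation. This is a minimal sketch, not the log collector's actual implementation. It assumes the admin container accepts SSH as ec2-user with passwordless sudo (as the manual steps do), that "sudo sheltie" starts a host root shell that reads commands from standard input, and that the host has GNU chown. on_node, host_commands, and collect_bottlerocket_logs are hypothetical names, and the optional container-log copy is left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: automate the manual Bottlerocket log collection from a workstation.

# Run a command in the node's admin container as ec2-user.
on_node() {
  local key="$1" node="$2"
  shift 2
  ssh -i "$key" "ec2-user@$node" "$@"
}

# Commands for the host root filesystem (the "sudo sheltie" part of the procedure).
host_commands() {
  cat <<'EOF'
set -e
cp /var/log/aws-routed-eni/* /tmp/ekslogs/
logdog
cp /tmp/bottlerocket* /tmp/ekslogs/
# Runs after the logdog copy so that archive is re-owned too.
chown -R --reference=/tmp/ekslogs /tmp/ekslogs
EOF
}

# Usage: collect_bottlerocket_logs KEY_FILE NODE_IP [OUTPUT_FILE]
collect_bottlerocket_logs() {
  local key="$1" node="$2" out="${3:-bottlerocket-logs1.tar.gz}"

  # Admin container: staging directory owned by ec2-user, plus tar for the bundle.
  on_node "$key" "$node" 'mkdir -p /.bottlerocket/rootfs/tmp/ekslogs && sudo yum install -y tar' || return 1

  # Host root filesystem: feed the commands to the sheltie shell on stdin.
  host_commands | on_node "$key" "$node" 'sudo sheltie' || return 1

  # Admin container: bundle the logs and stream the archive back. No -v here,
  # so stdout carries only the archive bytes.
  on_node "$key" "$node" 'cd /.bottlerocket/rootfs/tmp && tar -czf full-logs.tar.gz /.bottlerocket/rootfs/tmp/ekslogs && cat full-logs.tar.gz' > "$out"
}
```

Running the chown after copying the logdog archive is a design choice of this sketch: in the manual order the archive is copied after the ownership change, so it would stay owned by root.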

github-actions[bot] commented 1 week ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days