go-faster / portoshim

CRI plugin for Porto container runtime

Setup yt cluster in k8s with porto #1

Closed ernado closed 9 months ago

ernado commented 10 months ago

Blockers:

Progress:

Current setup instructions:

1) Use Ubuntu 20.04 for a systemd hybrid-mode cgroup setup; a quick check is shown below
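
To confirm the host is actually in hybrid mode (cgroup v1 controllers plus a separate cgroup2 mount), a check like the following should do; the expected output is an assumption based on a stock Ubuntu 20.04 install:

stat -fc %T /sys/fs/cgroup/   # expect tmpfs (cgroup2fs would mean pure unified mode)
mount | grep cgroup2          # expect a single cgroup2 mount, usually /sys/fs/cgroup/unified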

2) Install porto, using the go-faster/porto fork:

Additional build deps

sudo apt install -y libtool autoconf
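
A rough build sketch, assuming the fork keeps upstream porto's cmake-based build (check the fork's README for the exact steps):

git clone https://github.com/go-faster/porto.git
cd porto
mkdir build && cd build
cmake ..
make -j"$(nproc)"
sudo make install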

Configuration

cat >/etc/portod.conf.d/k8s.conf << EOF
log {
  verbose: true
  debug: true
}
daemon {
  docker_images_support: true
}
container {
  enable_systemd: true
  detect_systemd: true
  propagate_cpu_guarantee: true
  enable_blkio: true
  enable_cgroup2: true
  use_os_mode_cgroupns: true
  enable_docker_mode: true
  enable_rw_cgroupfs: true
  enable_numa_migration: true
  enable_rw_net_cgroups: true
  cpu_limit_scale: 1
  proportional_cpu_shares: false
  memory_high_limit_proportion: 0
  enable_sched_idle: true
}
EOF
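
The daemon has to re-read the config. How exactly depends on the install; the unit name below is an assumption:

sudo systemctl restart portod   # assumption: the systemd unit is named portod; "portod reload" may also work
portoctl list                   # sanity check: the porto API answers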

3) Install portoshim

Also add crictl config:

cat >/etc/crictl.yaml << EOF
runtime-endpoint: unix:///run/portoshim.sock
EOF

Build and install the binaries:

make
sudo make install
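
With the crictl config in place, verify that portoshim answers on the CRI socket:

sudo crictl version
sudo crictl info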

4) Install kubeadm and tools

5) Setup bridge CNI

mkdir -p /etc/cni/net.d

Install cni plugins:

git clone https://github.com/containernetworking/plugins
cd plugins
./build_linux.sh
sudo mkdir -p /opt/cni/bin
sudo cp ./bin/* /opt/cni/bin/

Configure CNI:

cat >/etc/cni/net.d/10-porto.conflist << EOF
{
  "cniVersion": "1.0.0",
  "name": "porto",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "hairpinMode": true,
      "ipam": {
        "type": "host-local",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ],
        "ranges": [
            [{ "subnet": "10.85.0.0/16" }]
        ]
      }
    }
  ]
}
EOF

6) Setup network options

modprobe br_netfilter
echo br_netfilter > /etc/modules-load.d/br_netfilter.conf
cat >/etc/sysctl.d/k8s.conf << EOF
net.ipv4.ip_forward=1
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1

net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
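
Verify that the module is loaded and the sysctls took effect:

lsmod | grep br_netfilter
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables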

7) Disable swap

8) Init cluster

portoctl docker-pull registry.k8s.io/pause:3.7

export CRI_ENDPOINT=unix:///run/portoshim.sock
kubeadm init --cri-socket=$CRI_ENDPOINT
kubectl taint nodes $(hostname) node-role.kubernetes.io/control-plane:NoSchedule-
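
After init, the node should register and the control-plane pods should be visible through portoshim:

kubectl get nodes -o wide
sudo crictl pods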

9) Setup cilium

Install cli:

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}

Install Cilium:

kubectl create ns cilium
cilium install --version 1.14.5 -n cilium --set bpf.autoMount.enabled=false
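
Optionally wait for the install to converge before checking the pods (assuming the CLI's -n flag matches the namespace used above):

cilium status -n cilium --wait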

Check:

kubectl -n cilium get pods
kubectl -n cilium exec -i -t -c cilium-agent daemonset/cilium -- cilium endpoint list

10) Install CSI storage (openebs)

helm repo add openebs https://openebs.github.io/charts
helm repo update
helm install openebs --namespace openebs openebs/openebs --create-namespace

Add a default storage class and a PVC:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-hostpath
  annotations:
    openebs.io/cas-type: local
    storageclass.kubernetes.io/is-default-class: "true"
    cas.openebs.io/config: |
      - name: StorageType
        value: hostpath
      - name: BasePath
        value: /var/openebs
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: openebs-hostpath-pvc
spec:
  storageClassName: openebs-hostpath
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5G
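
Assuming the two manifests above are saved to a file (the name openebs-sc.yaml is arbitrary), apply them and confirm the class is default:

kubectl apply -f openebs-sc.yaml
kubectl get sc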

Check:

---
apiVersion: v1
kind: Pod
metadata:
  name: hello-local-hostpath-pod
spec:
  volumes:
  - name: local-storage
    persistentVolumeClaim:
      claimName: openebs-hostpath-pvc
  containers:
  - name: hello-container
    image: busybox
    command:
       - sh
       - -c
       - 'while true; do echo "`date` [`hostname`] Hello from OpenEBS Local PV." >> /mnt/store/greet.txt; sleep $(($RANDOM % 5 + 300)); done'
    volumeMounts:
    - mountPath: /mnt/store
      name: local-storage
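
Apply the manifest (saved here as pod.yaml, the name is arbitrary), then inspect it with the commands below:

kubectl apply -f pod.yaml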
kubectl describe pod hello-local-hostpath-pod 
kubectl get pv

11) Install ytsaurus operator

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.3/cert-manager.yaml
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml
kubectl -n cert-manager rollout status --timeout=1m deployment cert-manager-webhook

helm pull oci://docker.io/ytsaurus/ytop-chart --version 0.4.1 --untar
helm upgrade --install --namespace ytop --create-namespace  ytsaurus ytop-chart/
kubectl -n ytop rollout status --timeout=1m deployment ytsaurus-ytop-chart-controller-manager

12) Setup ytsaurus cluster

As per minikube instructions:

wget https://raw.githubusercontent.com/ytsaurus/yt-k8s-operator/main/config/samples/0.4.0/cluster_v1_minikube.yaml
kubectl create ns yt
kubectl apply -n yt -f cluster_v1_minikube.yaml
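
The operator then brings up the cluster components; progress can be watched with:

kubectl -n yt get pods -w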

Minikube

git clone https://github.com/go-faster/minikube.git
cd minikube
make out/minikube
wget -O /tmp/minikube.iso https://github.com/go-faster/minikube/releases/download/v1.32.1-alpha.0/minikube-amd64.iso
./out/minikube start --iso-url=file:///tmp/minikube.iso --cni=cilium --container-runtime=porto --cache-images=false 
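
Once the VM is up, a quick sanity check:

./out/minikube status
kubectl get nodes -o wide
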
ernado commented 10 months ago

~The following error occurs while trying to initialize the cluster:~

E1225 10:00:24.519203   76288 remote_runtime.go:319] "CreateContainer in sandbox from runtime service failed" err="rpc error: code = Unknown desc = main.(*PortoshimRuntimeMapper).CreateContainer: InvalidPath: Storage path does not exist" podSandboxID="kube-apiserver-vm20-16e0"
E1225 10:00:24.519429   76288 kuberuntime_manager.go:1262] container &Container{Name:kube-apiserver,Image:registry.k8s.io/kube-apiserver:v1.29.0,Command:[kube-apiserver --advertise-address=192.168.114.132 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{cpu: {{250 -3} {<nil>} 250m DecimalSI},},Claims:[]ResourceClaim{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:ca-certs,ReadOnly:true,MountPath:/etc/ssl/certs,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:etc-ca-certificates,ReadOnly:true,MountPath:/etc/ca-certificates,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:etc-pki,ReadOnly:true,MountPath:/etc/pki,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:k8s-certs,ReadOnly:true,MountPath:/etc/kubernetes/pki,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:usr-local-share-ca-certificates,ReadOnly:true,MountPath:/usr/local/share/ca-certificates,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:usr-share-ca-certificates,ReadOnly:true,MountPath:/usr/share/ca-certificates,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/livez,Port:{0 6443 },Host:192.168.114.132,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:10,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:8,TerminationGracePeriodSeconds:nil,},ReadinessProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/readyz,Port:{0 6443 },Host:192.168.114.132,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:15,PeriodSeconds:1,SuccessThreshold:1,FailureThreshold:3,TerminationGracePeriodSeconds:nil,},Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/livez,Port:{0 6443 
},Host:192.168.114.132,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:10,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:24,TerminationGracePeriodSeconds:nil,},ResizePolicy:[]ContainerResizePolicy{},RestartPolicy:nil,} start failed in pod kube-apiserver-vm20_kube-system(76043a095a9d6861fe95dda2dff4f0e3): CreateContainerError: main.(*PortoshimRuntimeMapper).CreateContainer: InvalidPath: Storage path does not exist
E1225 10:00:24.519464   76288 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CreateContainerError: \"main.(*PortoshimRuntimeMapper).CreateContainer: InvalidPath: Storage path does not exist\"" pod="kube-system/kube-apiserver-vm20" podUID="76043a095a9d6861fe95dda2dff4f0e3"
E1225 10:00:29.523461   76288 remote_runtime.go:319] "CreateContainer in sandbox from runtime service failed" err="rpc error: code = Unknown desc = main.(*PortoshimRuntimeMapper).CreateContainer: InvalidPath: Storage path does not exist" podSandboxID="kube-controller-manager-vm20-a54a"
2023-12-25T10:13:33.500Z    INFO    [a7c16473] /runtime.v1.RuntimeService/CreateContainer
2023-12-25T10:13:33.501Z    DEBUG   [a7c16473] &CreateContainerRequest{PodSandboxId:kube-controller-manager-vm20-a54a,Config:&ContainerConfig{Metadata:&ContainerMetadata{Name:kube-controller-manager,Attempt:0,},Image:&ImageSpec{Image:0824682bcdc8eb985abba5995682c798e1949714458927257f746ef123be4242,Annotations:map[string]string{},},Command:[kube-controller-manager --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf --bind-address=127.0.0.1 --client-ca-file=/etc/kubernetes/pki/ca.crt --cluster-name=kubernetes --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt --cluster-signing-key-file=/etc/kubernetes/pki/ca.key --controllers=*,bootstrapsigner,tokencleaner --kubeconfig=/etc/kubernetes/controller-manager.conf --leader-elect=true --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --root-ca-file=/etc/kubernetes/pki/ca.crt --service-account-private-key-file=/etc/kubernetes/pki/sa.key --use-service-account-credentials=true],Args:[],WorkingDir:,Envs:[]*KeyValue{},Mounts:[]*Mount{&Mount{ContainerPath:/etc/ssl/certs,HostPath:/etc/ssl/certs,Readonly:true,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,},&Mount{ContainerPath:/etc/ca-certificates,HostPath:/etc/ca-certificates,Readonly:true,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,},&Mount{ContainerPath:/etc/pki,HostPath:/etc/pki,Readonly:true,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,},&Mount{ContainerPath:/usr/libexec/kubernetes/kubelet-plugins/volume/exec,HostPath:/usr/libexec/kubernetes/kubelet-plugins/volume/exec,Readonly:false,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,},&Mount{ContainerPath:/etc/kubernetes/pki,HostPath:/etc/kubernetes/pki,Readonly:true,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,},&Mount{ContainerPath:/etc/kubernetes/controller-manager.conf,HostPath:/etc/kubernetes/controller-manager.conf,Readonly:true,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,},&Mount{ContainerPath:/usr/local/share/ca-certificates,HostPath:/usr/local/share/ca-certificates,Readonly:true,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,},&Mount{ContainerPath:/usr/share/ca-certificates,HostPath:/usr/share/ca-certificates,Readonly:true,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,},&Mount{ContainerPath:/etc/hosts,HostPath:/var/lib/kubelet/pods/baaddd3f0c443f69b3ee47446ada1e1a/etc-hosts,Readonly:false,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,},&Mount{ContainerPath:/dev/termination-log,HostPath:/var/lib/kubelet/pods/baaddd3f0c443f69b3ee47446ada1e1a/containers/kube-controller-manager/aa12ba52,Readonly:false,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,},},Devices:[]*Device{},Labels:map[string]string{io.kubernetes.container.name: kube-controller-manager,io.kubernetes.pod.name: kube-controller-manager-vm20,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: baaddd3f0c443f69b3ee47446ada1e1a,},Annotations:map[string]string{io.kubernetes.container.hash: 5e526ccd,io.kubernetes.container.restartCount: 0,io.kubernetes.container.terminationMessagePath: /dev/termination-log,io.kubernetes.container.terminationMessagePolicy: File,io.kubernetes.pod.terminationGracePeriod: 
30,},LogPath:kube-controller-manager/0.log,Stdin:false,StdinOnce:false,Tty:false,Linux:&LinuxContainerConfig{Resources:&LinuxContainerResources{CpuPeriod:100000,CpuQuota:0,CpuShares:204,MemoryLimitInBytes:0,OomScoreAdj:-997,CpusetCpus:,CpusetMems:,HugepageLimits:[]*HugepageLimit{&HugepageLimit{PageSize:2MB,Limit:0,},&HugepageLimit{PageSize:1GB,Limit:0,},},Unified:map[string]string{},MemorySwapLimitInBytes:0,},SecurityContext:&LinuxContainerSecurityContext{Capabilities:nil,Privileged:false,NamespaceOptions:&NamespaceOption{Network:NODE,Pid:CONTAINER,Ipc:POD,TargetId:,},SelinuxOptions:nil,RunAsUser:&Int64Value{Value:1,},RunAsUsername:,ReadonlyRootfs:false,SupplementalGroups:[],ApparmorProfile:,SeccompProfilePath:,NoNewPrivs:false,RunAsGroup:nil,MaskedPaths:[/proc/asound /proc/acpi /proc/kcore /proc/keys /proc/latency_stats /proc/timer_list /proc/timer_stats /proc/sched_debug /proc/scsi /sys/firmware],ReadonlyPaths:[/proc/bus /proc/fs /proc/irq /proc/sys /proc/sysrq-trigger],Seccomp:&SecurityProfile{ProfileType:RuntimeDefault,LocalhostRef:,},Apparmor:nil,},},Windows:nil,},SandboxConfig:&PodSandboxConfig{Metadata:&PodSandboxMetadata{Name:kube-controller-manager-vm20,Uid:baaddd3f0c443f69b3ee47446ada1e1a,Namespace:kube-system,Attempt:0,},Hostname:,LogDirectory:/var/log/pods/kube-system_kube-controller-manager-vm20_baaddd3f0c443f69b3ee47446ada1e1a,DnsConfig:&DNSConfig{Servers:[192.168.114.2],Searches:[localdomain],Options:[],},PortMappings:[]*PortMapping{},Labels:map[string]string{component: kube-controller-manager,io.kubernetes.pod.name: kube-controller-manager-vm20,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: baaddd3f0c443f69b3ee47446ada1e1a,tier: control-plane,},Annotations:map[string]string{kubernetes.io/config.hash: baaddd3f0c443f69b3ee47446ada1e1a,kubernetes.io/config.seen: 2023-12-25T09:50:17.323943187Z,kubernetes.io/config.source: file,},Linux:&LinuxPodSandboxConfig{CgroupParent:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podbaaddd3f0c443f69b3ee47446ada1e1a.slice,SecurityContext:&LinuxSandboxSecurityContext{NamespaceOptions:&NamespaceOption{Network:NODE,Pid:CONTAINER,Ipc:POD,TargetId:,},SelinuxOptions:nil,RunAsUser:nil,ReadonlyRootfs:false,SupplementalGroups:[],Privileged:false,SeccompProfilePath:,RunAsGroup:nil,Seccomp:&SecurityProfile{ProfileType:RuntimeDefault,LocalhostRef:,},Apparmor:nil,},Sysctls:map[string]string{},Overhead:&LinuxContainerResources{CpuPeriod:0,CpuQuota:0,CpuShares:0,MemoryLimitInBytes:0,OomScoreAdj:0,CpusetCpus:,CpusetMems:,HugepageLimits:[]*HugepageLimit{},Unified:map[string]string{},MemorySwapLimitInBytes:0,},Resources:&LinuxContainerResources{CpuPeriod:100000,CpuQuota:0,CpuShares:204,MemoryLimitInBytes:0,OomScoreAdj:0,CpusetCpus:,CpusetMems:,HugepageLimits:[]*HugepageLimit{},Unified:map[string]string{},MemorySwapLimitInBytes:0,},},Windows:nil,},}
2023-12-25T10:13:33.501Z    DEBUG   [a7c16473] check image: 0824682bcdc8eb985abba5995682c798e1949714458927257f746ef123be4242
2023-12-25T10:13:33.501Z    DEBUG   [a7c16473] prepare resources: &LinuxContainerResources{CpuPeriod:100000,CpuQuota:0,CpuShares:204,MemoryLimitInBytes:0,OomScoreAdj:-997,CpusetCpus:,CpusetMems:,HugepageLimits:[]*HugepageLimit{&HugepageLimit{PageSize:2MB,Limit:0,},&HugepageLimit{PageSize:1GB,Limit:0,},},Unified:map[string]string{},MemorySwapLimitInBytes:0,}
2023-12-25T10:13:33.501Z    DEBUG   [a7c16473] prepare labels and annotations: labels=map[io.kubernetes.container.name:kube-controller-manager io.kubernetes.pod.name:kube-controller-manager-vm20 io.kubernetes.pod.namespace:kube-system io.kubernetes.pod.uid:baaddd3f0c443f69b3ee47446ada1e1a] annotations=map[io.kubernetes.container.hash:5e526ccd io.kubernetes.container.restartCount:0 io.kubernetes.container.terminationMessagePath:/dev/termination-log io.kubernetes.container.terminationMessagePolicy:File io.kubernetes.pod.terminationGracePeriod:30]
2023-12-25T10:13:33.501Z    DEBUG   [a7c16473] prepare resolv.conf: &DNSConfig{Servers:[192.168.114.2],Searches:[localdomain],Options:[],}
2023-12-25T10:13:33.501Z    DEBUG   [a7c16473] prepare container root: kube-controller-manager-vm20-a54a/kube-controller-manager-1013 0824682bcdc8eb985abba5995682c798e1949714458927257f746ef123be4242
2023-12-25T10:13:33.501Z    DEBUG   [a7c16473] prepare container mounts: [&Mount{ContainerPath:/etc/ssl/certs,HostPath:/etc/ssl/certs,Readonly:true,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,} &Mount{ContainerPath:/etc/ca-certificates,HostPath:/etc/ca-certificates,Readonly:true,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,} &Mount{ContainerPath:/etc/pki,HostPath:/etc/pki,Readonly:true,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,} &Mount{ContainerPath:/usr/libexec/kubernetes/kubelet-plugins/volume/exec,HostPath:/usr/libexec/kubernetes/kubelet-plugins/volume/exec,Readonly:false,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,} &Mount{ContainerPath:/etc/kubernetes/pki,HostPath:/etc/kubernetes/pki,Readonly:true,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,} &Mount{ContainerPath:/etc/kubernetes/controller-manager.conf,HostPath:/etc/kubernetes/controller-manager.conf,Readonly:true,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,} &Mount{ContainerPath:/usr/local/share/ca-certificates,HostPath:/usr/local/share/ca-certificates,Readonly:true,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,} &Mount{ContainerPath:/usr/share/ca-certificates,HostPath:/usr/share/ca-certificates,Readonly:true,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,} &Mount{ContainerPath:/etc/hosts,HostPath:/var/lib/kubelet/pods/baaddd3f0c443f69b3ee47446ada1e1a/etc-hosts,Readonly:false,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,} &Mount{ContainerPath:/dev/termination-log,HostPath:/var/lib/kubelet/pods/baaddd3f0c443f69b3ee47446ada1e1a/containers/kube-controller-manager/aa12ba52,Readonly:false,SelinuxRelabel:false,Propagation:PROPAGATION_PRIVATE,}]
2023-12-25T10:13:33.501Z    DEBUG   [a7c16473] create container from spec: kube-controller-manager-vm20-a54a/kube-controller-manager-1013
2023-12-25T10:13:33.521Z    DEBUG   [a7c16473] nil
2023-12-25T10:13:33.521Z    WARN    [a7c16473] main.(*PortoshimRuntimeMapper).CreateContainer: InvalidPath: Storage path does not exist
2023-12-25T10:13:33.521Z    INFO    [a7c16473] /runtime.v1.RuntimeService/CreateContainer time: 20 ms

~Looks like host path is not created?~

go build ./cmd/logshim
sudo cp logshim /usr/sbin/logshim
ernado commented 10 months ago
  Normal   Scheduled  5m43s                   default-scheduler  Successfully assigned kube-system/coredns-76f75df574-ftwkx to vm20
  Warning  Failed     3m31s (x12 over 5m43s)  kubelet            Error: main.(*PortoshimRuntimeMapper).CreateContainer: LayerNotFound: Layer not found /place/porto_docker/v1/layers/blobs/86/860aeecad37138b467a6c41d5278c647ed77350db18b56364c29a4770e271fca/content
  Normal   Pulled     38s (x25 over 5m43s)    kubelet            Container image "registry.k8s.io/coredns/coredns:v1.11.1" already present on machine

Make sure enable_docker_mode: true is set in the porto config.

If the error still occurs, try creating the directory manually:

mkdir -p /place/porto_docker/v1/layers/blobs
ernado commented 10 months ago
root@vm20:~# kubectl -n kube-system logs kube-proxy-vh2kf
I1225 15:40:17.290356       7 server_others.go:72] "Using iptables proxy"
I1225 15:40:17.296458       7 server.go:1050] "Successfully retrieved node IP(s)" IPs=["192.168.114.132"]
I1225 15:40:17.300562       7 conntrack.go:58] "Setting nf_conntrack_max" nfConntrackMax=262144
I1225 15:40:17.300727       7 conntrack.go:118] "Set sysctl" entry="net/netfilter/nf_conntrack_tcp_timeout_established" value=86400
E1225 15:40:17.300759       7 server.go:556] "Error running ProxyServer" err="open /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established: read-only file system"
E1225 15:40:17.300770       7 run.go:74] "command failed" err="open /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established: read-only file system"

Related to kube-proxy trying to set conntrack sysctls: https://github.com/kubernetes-sigs/kind/pull/2241

Currently fixed in the porto fork.

ernado commented 10 months ago

Node exporter not starting:

level=fatal msg="Couldn't create metrics handler: couldn't create collector: failed to open sysfs: could not read /host/sys: stat /host/sys: no such file or directory" source="node_exporter.go:55"

Manifest:

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: node-exporter
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
      labels:
        app: node-exporter
    spec:
      containers:
        - args:
            - --web.listen-address=0.0.0.0:9100
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
          image: quay.io/prometheus/node-exporter:v0.18.1
          imagePullPolicy: IfNotPresent
          name: node-exporter
          ports:
            - containerPort: 9100
              hostPort: 9100
              name: metrics
              protocol: TCP
          resources:
            limits:
              cpu: 200m
              memory: 50Mi
            requests:
              cpu: 100m
              memory: 30Mi
          volumeMounts:
            - mountPath: /host/proc
              name: proc
              readOnly: true
            - mountPath: /host/sys
              name: sys
              readOnly: true
      hostNetwork: true
      hostPID: true
      restartPolicy: Always
      tolerations:
        - effect: NoSchedule
          operator: Exists
        - effect: NoExecute
          operator: Exists
      volumes:
        - hostPath:
            path: /proc
            type: ""
          name: proc
        - hostPath:
            path: /sys
            type: ""
          name: sys
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: node-exporter
  name: node-exporter
  namespace: monitoring
spec:
  ports:
    - name: node-exporter
      port: 9100
      protocol: TCP
      targetPort: 9100
  selector:
    app: node-exporter
  sessionAffinity: None
  type: ClusterIP
ernado commented 10 months ago

Ref: https://github.com/ten-nancy/porto/issues/6

ernado commented 10 months ago

Trying to install openebs:

helm repo add openebs https://openebs.github.io/charts
helm repo update
helm install openebs --namespace openebs openebs/openebs --create-namespace

For a single-node cluster, also remove the control-plane taint:

kubectl taint nodes --all node-role.kubernetes.io/control-plane-

Checking:

$ kubectl get pods -n openebs
NAME                                           READY   STATUS                 RESTARTS      AGE
openebs-localpv-provisioner-56d6489bbc-ssct2   1/1     Running                1 (42s ago)   5m3s
openebs-ndm-operator-5d7944c94d-2p97m          1/1     Running                0             5m3s
openebs-ndm-wwxdk                              0/1     CreateContainerError   0             3m42s

The ndm pod failed to start:

$ kubectl -n openebs describe pod openebs-ndm-wwxdk
Name:             openebs-ndm-wwxdk
Namespace:        openebs
Priority:         0
Service Account:  openebs
Node:             vm20/192.168.114.132
Start Time:       Tue, 26 Dec 2023 18:09:38 +0000
Labels:           app=openebs
                  component=ndm
                  controller-revision-hash=564ff4c86d
                  name=openebs-ndm
                  openebs.io/component-name=ndm
                  openebs.io/version=3.10.0
                  pod-template-generation=1
                  release=openebs
Annotations:      <none>
Status:           Pending
IP:               192.168.114.132
IPs:
  IP:           192.168.114.132
Controlled By:  DaemonSet/openebs-ndm
Containers:
  openebs-ndm:
    Container ID:  
    Image:         openebs/node-disk-manager:2.1.0
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Args:
      -v=4
      --feature-gates=GPTBasedUUID
    State:          Waiting
      Reason:       CreateContainerError
    Ready:          False
    Restart Count:  0
    Liveness:       exec [pgrep ndm] delay=30s timeout=1s period=60s #success=1 #failure=3
    Environment:
      NAMESPACE:          openebs (v1:metadata.namespace)
      NODE_NAME:           (v1:spec.nodeName)
      SPARSE_FILE_DIR:    /var/openebs/sparse
      SPARSE_FILE_SIZE:   10737418240
      SPARSE_FILE_COUNT:  0
    Mounts:
      /dev from devmount (rw)
      /host/node-disk-manager.config from config (ro,path="node-disk-manager.config")
      /host/proc from procmount (ro)
      /run/udev from udev (rw)
      /var/openebs/ndm from basepath (rw)
      /var/openebs/sparse from sparsepath (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qgwx7 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      openebs-ndm-config
    Optional:  false
  udev:
    Type:          HostPath (bare host directory volume)
    Path:          /run/udev
    HostPathType:  Directory
  procmount:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:  Directory
  devmount:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:  Directory
  basepath:
    Type:          HostPath (bare host directory volume)
    Path:          /var/openebs/ndm
    HostPathType:  DirectoryOrCreate
  sparsepath:
    Type:          HostPath (bare host directory volume)
    Path:          /var/openebs/sparse
    HostPathType:  
  kube-api-access-qgwx7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  116s                default-scheduler  Successfully assigned openebs/openebs-ndm-wwxdk to vm20
  Normal   Pulling    116s                kubelet            Pulling image "openebs/node-disk-manager:2.1.0"
  Normal   Pulled     110s                kubelet            Successfully pulled image "openebs/node-disk-manager:2.1.0" in 5.653s (5.654s including waiting)
  Warning  Failed     8s (x10 over 110s)  kubelet            Error: main.(*PortoshimRuntimeMapper).CreateContainer: InvalidPath: Storage path does not exist
  Normal   Pulled     8s (x9 over 109s)   kubelet            Container image "openebs/node-disk-manager:2.1.0" already present on machine

Hitting Error: main.(*PortoshimRuntimeMapper).CreateContainer: InvalidPath: Storage path does not exist again.

UPD(tdakkota): Doing

sudo mkdir -p /var/openebs/sparse

fixes the problem

UPD2 (tdakkota):

Patch https://github.com/go-faster/porto/commit/4b167d932aef5415743b29415f51b5b856334169 should resolve the problem; porto now creates the bind-mounted directory automatically.

ernado commented 10 months ago

@tdakkota the next step is to set up the ytsaurus cluster

ernado commented 10 months ago
  Warning  Failed     6m11s (x4 over 7m36s)   kubelet            Failed to pull image "busybox": main.(*PortoshimImageMapper).PullImage: Docker: Unknown manifest mediaType: application/vnd.oci.image.index.v1+json
  Warning  Failed     6m11s (x4 over 7m36s)   kubelet            Error: ErrImagePull
  Warning  Failed     5m45s (x6 over 7m35s)   kubelet            Error: ImagePullBackOff
  Normal   BackOff    2m33s (x20 over 7m35s)  kubelet            Back-off pulling image "busybox"

Porto does not support OCI image indexes?

ernado commented 10 months ago
kubectl apply -f https://raw.githubusercontent.com/ytsaurus/yt-k8s-operator/main/config/samples/0.4.0/cluster_v1_minikube.yaml
$ kubectl get pods
NAME                                              READY   STATUS    RESTARTS        AGE
ca-0                                              1/1     Running   0               21m
dnd-0                                             1/1     Running   0               21m
dnd-1                                             1/1     Running   0               21m
dnd-2                                             1/1     Running   0               21m
ds-0                                              1/1     Running   0               23m
end-0                                             1/1     Running   0               21m
hp-0                                              1/1     Running   0               21m
hp-control-0                                      1/1     Running   0               21m
ms-0                                              1/1     Running   0               23m
rp-0                                              1/1     Running   0               21m
rp-heavy-0                                        1/1     Running   0               21m
sch-0                                             1/1     Running   0               21m
yt-strawberry-controller-init-job-cluster-nz2fz   1/1     Running   4 (3m18s ago)   20m
ytsaurus-ui-deployment-78f7d86897-nl4ph           1/1     Running   0               20m

Now we need to enable the porto runtime and pass the porto socket into yt.

ernado commented 10 months ago

Now we need to make cilium work.

ernado commented 10 months ago

cilium failure (cilium-agent container):

level=fatal msg="clang version: NOT OK" error="fork/exec /usr/local/bin/clang: no such file or directory" subsys=linux-datapath
dpoluyanov commented 10 months ago

cilium failure (cilium-agent container):

level=fatal msg="clang version: NOT OK" error="fork/exec /usr/local/bin/clang: no such file or directory" subsys=linux-datapath

Related to https://github.com/ten-nancy/porto/issues/6

ernado commented 10 months ago
Error: main.(*PortoshimRuntimeMapper).StartContainer: Unknown: File exists: mkdir(run/lock, 01777)

on https://github.com/go-faster/porto/commit/87fd30aab9ce463a798059a759b9c7726fc810b5

ernado commented 10 months ago

Now we need to:

1) Setup job with layer specified
2) Check that job has network connectivity
3) Check that (2) is showing in cilium

ernado commented 9 months ago

Should be mostly done.

Deployment and auxiliary scripts are moving to https://github.com/go-faster/ytst.

Issues extracted to #2, #3, #4.