AmitKumarDas / fun-with-programming

ABC - Always Be Coding

0004 #82

Closed AmitKumarDas closed 2 years ago

AmitKumarDas commented 3 years ago
// tags: aws, network, eni, cni
//
// https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/cni-proposal.md
// https://github.com/aws/amazon-vpc-cni-k8s
AmitKumarDas commented 3 years ago
// tags: anti patterns, k8s, deployment, terraform, kubectl, helm, workstation
//
// https://codefresh.io/kubernetes-tutorial/kubernetes-antipatterns-1/
AmitKumarDas commented 3 years ago
// tags: troubleshooting, runbook, dns, 5xx errors, coredns, policy, config, ndots
//
// [til]
// --Cache hit percentage: percentage of requests answered from the CoreDNS cache
// --DNS request latency
//   --CoreDNS: time taken by CoreDNS to process a DNS request
//   --Upstream server: time taken to process a DNS request forwarded to an upstream server
// --Number of requests forwarded to upstream servers
// --CoreDNS resource usage: resources consumed by the CoreDNS server, such as memory and CPU
//
// [til]
// Error codes for requests:
// --NXDomain: Non-Existent Domain
// --FormErr: Format Error in DNS request
// --ServFail: Server Failure
// --NoError: No Error, successfully processed request
//
// https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
// https://www.infracloud.io/blogs/using-coredns-effectively-kubernetes/
// https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/
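//
// [sketch] A minimal, illustrative Pod spec (names and values are mine, not from
// the links above) showing the ndots tuning mentioned in the tags: lowering ndots
// cuts down search-domain expansion, which reduces NXDomain responses and the
// number of requests forwarded upstream.
apiVersion: v1
kind: Pod
metadata:
  name: dns-tuned-app            # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                 # any image; nginx used only as a placeholder
  dnsConfig:
    options:
    - name: ndots
      value: "2"                 # cluster default is 5; lower means fewer search-domain lookups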
AmitKumarDas commented 3 years ago
// tags: kubelet garbage collection for container images
//
// https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/
// https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/
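//
// [sketch] Illustrative KubeletConfiguration fields (thresholds are mine, not
// from the docs above) that drive image garbage collection and node-pressure eviction:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageGCHighThresholdPercent: 85    # image GC kicks in once image-fs usage crosses this
imageGCLowThresholdPercent: 80     # GC deletes unused images until usage drops below this
evictionHard:
  memory.available: "100Mi"        # node-pressure eviction thresholds
  nodefs.available: "10%"
  imagefs.available: "15%"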
AmitKumarDas commented 3 years ago
// til: kube-proxy, istio, firewall, user space, kernel, cni, 
//
// https://www.stackrox.com/post/2020/01/kubernetes-networking-demystified/
AmitKumarDas commented 3 years ago
# tags: monitor kube-proxy
#
$ curl http://[kube_proxy_ip]:10249/metrics   # kube-proxy serves metrics over plain HTTP on port 10249
...
# HELP rest_client_request_duration_seconds Request latency in seconds. Broken down by verb and URL.
# TYPE rest_client_request_duration_seconds histogram
rest_client_request_duration_seconds_bucket{url="https://XXXX/%7Bprefix%7D",verb="GET",le="0.001"} 41
rest_client_request_duration_seconds_bucket{url="https://XXXX/%7Bprefix%7D",verb="GET",le="0.002"} 88
rest_client_request_duration_seconds_bucket{url="https://XXXX/%7Bprefix%7D",verb="GET",le="0.004"} 89
rest_client_request_duration_seconds_count{url="https://XXXX/%7Bprefix%7D",verb="POST"} 7
# HELP rest_client_request_latency_seconds (Deprecated) Request latency in seconds. Broken down by verb and URL.
# TYPE rest_client_request_latency_seconds histogram
rest_client_request_latency_seconds_bucket{url="https://XXXX/%7Bprefix%7D",verb="GET",le="0.001"} 41
...
rest_client_request_latency_seconds_sum{url="https://XXXX/%7Bprefix%7D",verb="POST"} 0.0122645
rest_client_request_latency_seconds_count{url="https://XXXX/%7Bprefix%7D",verb="POST"} 7
# HELP rest_client_requests_total Number of HTTP requests, partitioned by status code, method, and host.
# TYPE rest_client_requests_total counter
rest_client_requests_total{code="200",host="XXXX",method="GET"} 26495
rest_client_requests_total{code="201",host="XXXX",method="POST"} 1
rest_client_requests_total{code="<error>",host="XXXX",method="GET"} 91
...
# til: learn the golden signals
#
# https://sysdig.com/blog/monitor-kube-proxy/
# If you want to configure a Prometheus server to scrape kube-proxy,
# add the following job to your scrape configuration:
# Remember to customize it with your own labels and relabeling configuration as needed
- job_name: kube-proxy
  honor_labels: true
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_name
    separator: '/'
    regex: 'kube-system/kube-proxy.+'
  - source_labels:
    - __address__
    action: replace
    target_label: __address__
    regex: (.+?)(\:\d+)?
    replacement: $1:10249
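#
# [sketch] A hedged example (not from the blog) of turning one of the golden
# signals above into a Prometheus alerting rule; the job label, threshold and
# names are illustrative and assume the scrape job defined above.
groups:
- name: kube-proxy.rules                  # hypothetical rule group
  rules:
  - alert: KubeProxyRestClientErrors
    expr: |
      rate(rest_client_requests_total{job="kube-proxy",code=~"5..|<error>"}[5m]) > 0
    for: 10m
    annotations:
      summary: kube-proxy is getting errors talking to the API server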
AmitKumarDas commented 3 years ago
// til: auth, oidc, ca bundle, proxy, 
//
// https://kubernetes.io/docs/tasks/extend-kubernetes/configure-aggregation-layer/
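//
// [sketch] The kind of APIService object the aggregation layer works with, using
// metrics-server as a familiar example; the caBundle (the CA that signed the
// extension API server's serving cert) is elided here:
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  service:
    name: metrics-server
    namespace: kube-system
  # caBundle: <base64-encoded CA bundle used to verify the extension API server>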
AmitKumarDas commented 3 years ago
// til: monitoring, scraping, e2e, testing, 
// til: spdy, portforward, clientcmd
//
// https://github.com/kubernetes-sigs/metrics-server
// https://github.com/kubernetes-sigs/metrics-server/blob/master/test/e2e_test.go
AmitKumarDas commented 3 years ago
// til: kustomize, controller, flux v2
//
// https://blog.baeke.info/2020/11/10/an-introduction-to-flux-v2/
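//
// [sketch] The two objects the Flux v2 kustomize controller revolves around, with
// illustrative names and a hypothetical repo URL; apiVersions were v1beta1 around
// the time of the post and have since been promoted:
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/example/my-app        # hypothetical repository
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 10m
  path: ./deploy                                # kustomize overlay inside the repo
  prune: true                                   # delete objects removed from git
  sourceRef:
    kind: GitRepository
    name: my-app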
AmitKumarDas commented 3 years ago
// tags: fellow repos on k8s, watch out, future
//
// https://github.com/jetstack/
// https://github.com/zalando-incubator
// https://github.com/crossplane-contrib
// tags: fellow repos on terraform
//
// https://github.com/jetstack/terraform-google-gke-cluster
// https://github.com/kbst/terraform-kubestack
AmitKumarDas commented 3 years ago
// tags: aws iam on k8s
//
// https://github.com/uswitch/kiam
// https://github.com/jtblin/kube2iam
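//
// [sketch] Both projects intercept calls to the EC2 metadata endpoint and map a
// pod to an AWS IAM role via a pod annotation; pod name, image and role here are
// illustrative:
apiVersion: v1
kind: Pod
metadata:
  name: s3-reader                               # hypothetical
  annotations:
    iam.amazonaws.com/role: my-app-s3-read      # IAM role assumed for this pod's AWS calls
spec:
  containers:
  - name: app
    image: amazon/aws-cli                       # any image using the AWS SDK/CLI
    command: ["aws", "s3", "ls"]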
AmitKumarDas commented 3 years ago
// tags: linkerd proxy aka data plane
//
// What are these proxies? 
// --Layer 7-aware TCP proxies, just like haproxy and NGINX
// 
// What do these proxies do? 
// --proxy calls to and from the services. 
// --act as both “proxies” and “reverse proxies”, 
// --handle both incoming and outgoing calls. 
//
// Understand the traffic specifics between services:
// --traffic between services is what differentiates service mesh proxies
// --from, e.g., API gateways and ingress proxies,
// --both of which focus on traffic from the outside world into the cluster
// 
// https://buoyant.io/service-mesh-manifesto/
// tags: linkerd control plane is simpler
//
// provides whatever the data plane needs to act in a coordinated fashion,
// --including service discovery, 
// --TLS certificate issuing, 
// --metrics aggregation, and so on. 
//
// The data plane calls the control plane to inform its behavior; 
// the control plane in turn provides an API to allow the user to modify
// and inspect the behavior of the data plane as a whole.
AmitKumarDas commented 3 years ago
// why service mesh
//
// the operational cost of deploying these proxies can be greatly reduced, 
// thanks to some other changes that are happening in the ecosystem
//
// The more important answer is that this design is actually a great
// way to introduce additional logic into the system. That’s not only because
// there are a ton of features you can add right there, but also because 
// you can add them without changing the ecosystem. In fact, the entire 
// service mesh model is predicated on this very insight: that, in a 
// multi-service system, regardless of what individual services actually do, 
// the traffic between them is an ideal insertion point for functionality.
//
// For example, Linkerd, like most meshes, has a Layer 7 feature set focused
// primarily on HTTP calls, including HTTP/2 and gRPC. The feature set is
// broad, but can be divided into three classes:
//
// Reliability features:
// --Request retries, 
// --timeouts, 
// --canaries (traffic splitting/shifting), etc.
//
// Observability features:
// --Aggregation of success rates, latencies, and request volumes for each service
//   or individual routes;
// --drawing of service topology maps; etc.
//
// Security features:
// --Mutual TLS,
// --access control, etc.
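//
// [sketch] One way some of these reliability features surface in Linkerd config
// (a ServiceProfile; the service name, route and values are illustrative, not
// from the manifesto):
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: webapp.default.svc.cluster.local        # hypothetical service
  namespace: default
spec:
  routes:
  - name: GET /items
    condition:
      method: GET
      pathRegex: /items
    isRetryable: true          # the proxy may retry this route when calls fail
    timeout: 300ms             # per-route timeout enforced by the proxy
  retryBudget:
    retryRatio: 0.2            # cap extra load generated by retries at 20%
    minRetriesPerSecond: 10
    ttl: 10s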
AmitKumarDas commented 3 years ago
// Many of these features operate at the request level (hence the “L7 proxy”).
// For example, if service Foo makes an HTTP call to service Bar, the 
// linkerd-proxy on Foo’s side can load balance that call intelligently across all
// the instances of Bar based on the observed latency of each one; it can retry
// the request if it fails and if it’s idempotent; it can record the response code 
// and latency; and so on. Similarly, the linkerd-proxy on Bar’s side can reject 
// the call if it’s not allowed, or is over the rate limit; it can record latency from 
// its perspective; and so on.
//
// The proxies can “do stuff” at the connection level too. For example, Foo’s 
// linkerd-proxy can initiate a TLS connection and Bar’s linkerd-proxy can 
// terminate it, and both sides can validate the other’s TLS certificate. This
// provides not just encryption between services, but a cryptographically 
// secure form of service identity—Foo and Bar can “prove” they are who 
// they say they are.
//
// Whether they’re at the request or at the connection level, one important 
// thing to note is that the features of the service mesh are all operational
// in nature. There isn’t anything in Linkerd about transforming the semantics
// of the request payload, e.g. adding fields to a JSON blob or transforming 
// a protobuf. This is an important distinction that we’ll touch on again when we
// talk about ESBs and middleware.
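//
// [sketch] The “reject the call if it’s not allowed” part maps to Linkerd’s later
// policy resources; names are illustrative and these CRDs postdate the manifesto:
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  namespace: default
  name: bar-http
spec:
  podSelector:
    matchLabels:
      app: bar                 # the pods serving "Bar"
  port: http
  proxyProtocol: HTTP/1
---
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  namespace: default
  name: bar-allow-foo
spec:
  server:
    name: bar-http
  client:
    meshTLS:
      serviceAccounts:
      - name: foo              # only mTLS-verified clients running as "foo" may call Bar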