Open cknowles opened 7 years ago
Hi @c-knowles,
Sorry for the confusion.
[TL;DR] We do not yet have full support for Kubernetes logs at the container level (i.e. a log associated with a specific Kubernetes container) on AWS. This is an active area of work though.
Now let me try to clarify it a bit.
[How Kubernetes metadata used to be handled in GKE and Kubernetes in GCE]
Kubernetes logs used to be annotated by the kubernetes_metadata_filter plugin. Each log entry was enriched with a kubernetes field.
Example:
"kubernetes": {
"host": "jimmi-redhat.localnet",
"pod_name":"fabric8-console-controller-98rqc",
"pod_id": "c76927af-f563-11e4-b32d-54ee7527188d",
"container_name": "fabric8-console-container",
"namespace_name": "default",
"namespace_id": "23437884-8e08-4d95-850b-e94378c9b2fd",
"labels": {
"component": "fabric8Console"
}
}
The Logging Agent (fluent-plugin-google-cloud) then extracts "namespace_id", "pod_id" and "container_name" into "resource.labels.*", and "namespace_name" and "pod_name" into "labels['compute.googleapis.com/resource_name']".
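To make the mapping above concrete, here is a rough sketch of the resulting Stackdriver LogEntry, reusing the field values from the example (the exact label keys on the "labels" side are taken from the comparison later in this thread; the precise shape may differ by agent version):

```
{
  "resource": {
    "type": "container",
    "labels": {
      "namespace_id": "23437884-8e08-4d95-850b-e94378c9b2fd",
      "pod_id": "c76927af-f563-11e4-b32d-54ee7527188d",
      "container_name": "fabric8-console-container"
    }
  },
  "labels": {
    "container.googleapis.com/namespace_name": "default",
    "container.googleapis.com/pod_name": "fabric8-console-controller-98rqc"
  }
}
```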
[How Kubernetes metadata is being handled as of the latest version today for GKE and Kubernetes in GCE]
The latest Kubernetes in GKE no longer relies on the kubernetes_metadata_filter plugin. Instead, logs from Kubernetes are tagged with metadata, and the Logging Agent extracts the metadata from the tag. See details in this comment in the configmap. (The logic to parse the "kubernetes" field from log entries still exists in the Logging Agent to support older Kubernetes versions, but it is subject to deprecation.)
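For context, the tag-based approach works because the kubelet writes container logs to files whose names already encode the metadata, so no kube API lookup is needed. A rough sketch of the relevant fluentd source block (the path and tag shape are assumptions based on the Kubernetes fluentd addon, not taken from this thread):

```
<source>
  @type tail
  # Files under /var/log/containers/ are named
  #   <pod_name>_<namespace_name>_<container_name>-<container_id>.log
  # so the tag built from the file path carries pod, namespace and
  # container name, and the Logging Agent can parse them from the tag.
  path /var/log/containers/*.log
  tag kubernetes.*
</source>
```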
[Why you are seeing the discrepancy up there]
Right now the Logging Agent only extracts this metadata for GKE or Kubernetes on GCE, not yet for Kubernetes on AWS EC2 or other platforms. This is because we do not yet have a container-level resource type for Kubernetes on other platforms. (As you can see, the resource type on AWS is at the instance level (aws_ec2_instance) rather than the container level.) That is something we are actively working on though.
Let us know if you have any follow-up questions or any specific needs. Maybe we could brainstorm some short-term solution before we launch the formal support.
@qingling128 thank you for clarifying the situation, it makes sense. OK, so I guess short term it is best to continue to use kubernetes_metadata_filter on AWS in conjunction with this plugin until there is more complete support for container-level logs here.
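For reference, the both-plugins-together setup described here might look roughly like the following (a sketch only: the plugin names are the real ones, but the paths, format and match patterns are illustrative assumptions):

```
<source>
  @type tail
  path /var/log/containers/*.log
  tag kubernetes.*
  format json
</source>

# Enrich each record with a "kubernetes" field (pod, namespace, labels, ...)
<filter kubernetes.**>
  @type kubernetes_metadata
</filter>

# Ship to Stackdriver; on AWS the detected resource type is aws_ec2_instance
<match kubernetes.**>
  @type google_cloud
</match>
```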
One thing that kubernetes_metadata_filter provides but is missing here is pod labels (as of the latest stable version). Do you think this plugin will also start querying the kube API to get those? They are useful for filtering on logical sets of containers we've grouped via labels.
@c-knowles Pod labels in GKE will be ingested via other means and will eventually appear in the log entries on the viewer side without any explicit action on the user's part. I'll confirm the timelines and ping this issue.
@igorpeshansky, I see, ok. So Stackdriver will start to support enrichment directly; I suppose we could push labels for non-GKE clusters to the same API. Do you think this plugin will enable that for non-GKE clusters at some point, or will they not be available? I'm happy to use the other plugin if we need to; the main issue would then be unification.
@c-knowles kubernetes_metadata_filter is not officially part of the Kubernetes project, but people still rely on it, especially for Elasticsearch-based deployments. I wanted to clarify that the "deprecation" mentioned above is in the Stackdriver/GKE context.
@crassirostris ok, thanks for clarifying. Yes, I understood kubernetes_metadata_filter is not part of Kubernetes. We started using it to get the enrichment done on AWS clusters. One of our aims was to roughly align the AWS cluster logging with our GKE clusters, all piped into Stackdriver. I'm definitely happy to keep using it until there are more details about log enrichment, or the potential to ditch kubernetes_metadata_filter on non-GKE clusters, e.g. for the pod labels and resource.labels.region mentioned above.
Somehow I had hoped that #148 would help with this issue. But it seems that even now, after #148 is merged, all the k8s logic is enabled only if Platform::GCE is detected :(
It would be nice if there were an option to force the resource_type...
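Something like the following is what's being asked for here. To be clear, this is purely hypothetical: no such resource_type override exists in the plugin as of this thread, and the option name is invented for illustration:

```
<match kubernetes.**>
  @type google_cloud
  # hypothetical option: force the monitored resource type instead of
  # enabling k8s logic only when Platform::GCE is detected
  resource_type container
</match>
```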
@igorpeshansky What's the status on this? It would be great to have either the kubernetes metadata plugin integration or a direct integration of non-GCE deployments via the Kubernetes API.
I was comparing the metadata enrichment results of this plugin on a Kubernetes cluster on AWS versus GKE, since we're running Kubernetes on both. Please could someone help clarify what is expected with the current plugin and best practice?
Currently on AWS we're using fluent-plugin-kubernetes_metadata_filter to add metadata to logs coming from some k8s clusters. The logs go via fluentd over to Stackdriver Premium. We are using both plugins together since there was a lack of useful metadata about the pod, namespace and container in the log indexing. We've since started to run some clusters on GKE as well, but we'll still keep the AWS ones, so we're interested in at least some unification. On GKE we currently go with the built-in logging, which presumably uses this plugin.
I happened upon some comments from @crassirostris mentioning that this plugin also performs metadata enrichment and that it's not greatly compatible with kubernetes_metadata_filter. Since then, it seems this recent PR has added some code which helps with kubernetes_metadata_filter compatibility. From what I can gather based on what we are running, the output of this plugin with regard to metadata enrichment is:
On GKE (resource.type: container):
- resource.labels.pod_id
- resource.labels.namespace_id
- resource.labels.container_name
- resource.labels.cluster_name
- resource.labels.instance_id
- resource.labels.project_id
- resource.labels.zone
- labels['compute.googleapis.com/resource_name']
- labels['container.googleapis.com/namespace_name'] (alongside resource.labels.namespace_id)
- labels['container.googleapis.com/pod_name'] (alongside resource.labels.pod_id)
- labels['container.googleapis.com/stream']

On AWS (resource.type: aws_ec2_instance):
- resource.labels.region
- resource.labels.aws_account
- resource.labels.zone
- labels['ec2.amazonaws.com/resource_name']
Adding kubernetes_metadata_filter gives a bunch of data inside jsonPayload.kubernetes, which covers most of the metadata unavailable in the AWS column, but obviously under a different path. There is a slight potential that I've misconfigured it, since most of the k8s logging documentation at the time of setup related to how it works on GCP/GKE. It's based primarily off the config from the k8s addons.
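Concretely, with that setup an AWS log entry ends up shaped roughly like the sketch below (the kubernetes values reuse the example earlier in the thread; the resource label values are elided placeholders, and exact field names may differ):

```
{
  "resource": {
    "type": "aws_ec2_instance",
    "labels": {
      "region": "...",
      "aws_account": "..."
    }
  },
  "jsonPayload": {
    "log": "...",
    "kubernetes": {
      "pod_name": "fabric8-console-controller-98rqc",
      "namespace_name": "default",
      "container_name": "fabric8-console-container",
      "labels": {
        "component": "fabric8Console"
      }
    }
  }
}
```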