GoogleCloudPlatform / anthos-samples

Anthos code samples
https://cloud.google.com/anthos/docs
Apache License 2.0

Unrecoverable error sending samples to remote storage #579

Open nfsp3k opened 1 year ago

nfsp3k commented 1 year ago

Hello,

I registered a cluster running outside of GCP with GKE and deployed the monitoring stack by following the guide described here.

The logging stack works properly, but the monitoring part does not. The pod stackdriver-prometheus-k8s-0 reports the following error message, and no data is shown in Metrics Explorer in the Google Cloud console.

$ kubectl -n kube-system logs stackdriver-prometheus-k8s-0 -c stackdriver-prometheus-sidecar
...
level=info ts=2023-02-27T13:51:21.072Z caller=manager.go:153 component="Prometheus reader" msg="Starting Prometheus reader..."
level=info ts=2023-02-27T13:51:21.196Z caller=manager.go:215 component="Prometheus reader" msg="reached first record after start offset" start_offset=0 skipped_records=0
level=warn ts=2023-02-27T13:51:23.885Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized region or location.: k8s_node{node_name:worker02,location:global/memberships/on-prem-cluster-1,cluster_name:on-prem-cluster-1} timeSeries[94-199]: kubernetes.io/anthos/container_file_descriptors{id:/}; Unrecognized region or location.: k8s_container{location:global/memberships/on-prem-cluster-1,namespace_name:kube-system,container_name:stackdriver-log-aggregator,pod_name:stackdriver-log-aggregator-1,cluster_name:on-prem-cluster-1} timeSeries[0-31]: kubernetes.io/anthos/fluentd_output_status_buffer_queue_length{plugin_id:google_cloud,type:google_cloud,worker_id:2}; Unrecognized region or location.: k8s_container{container_name:stackdriver-log-forwarder,pod_name:stackdriver-log-forwarder-cn58l,namespace_name:kube-system,cluster_name:on-prem-cluster-1,location:global/memberships/on-prem-cluster-1} timeSeries[32-37]: kubernetes.io/anthos/process_start_time_seconds{}; Unrecognized region or location.: k8s_pod{cluster_name:on-prem-cluster-1,pod_name:gke-connect-agent-20230210-00-00-d758684d5-z24bv,namespace_name:gke-connect,location:global/memberships/on-prem-cluster-1} timeSeries[38-93]: kubernetes.io/anthos/gkeconnect_dialer_connections{}"
level=warn ts=2023-02-27T13:51:24.143Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized region or location.: k8s_node{location:global/memberships/on-prem-cluster-1,cluster_name:on-prem-cluster-1,node_name:worker02} timeSeries[0-199]: kubernetes.io/anthos/container_file_descriptors{id:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podefff03d2_8c8e_4574_93ed_66af31833056.slice/cri-containerd-127ed814f320486cb121a35061313a4a447cfa99d840219a44415c54fc4ae0f2.scope,name:127ed814f320486cb121a35061313a4a447cfa99d840219a44415c54fc4ae0f2,pod:stackdriver-log-aggregator-1,container:stackdriver-log-aggregator,namespace:kube-system,image:gcr.io/stackdriver-agents/stackdriver-logging-agent:1.8.4}"
level=warn ts=2023-02-27T13:51:24.475Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized region or location.: k8s_node{cluster_name:on-prem-cluster-1,node_name:worker02,location:global/memberships/on-prem-cluster-1} timeSeries[0-199]: kubernetes.io/anthos/container_fs_usage_bytes{device:overlay_0-243,id:/}"
level=warn ts=2023-02-27T13:51:24.806Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized region or location.: k8s_node{location:global/memberships/on-prem-cluster-1,node_name:worker02,cluster_name:on-prem-cluster-1} timeSeries[0-199]: kubernetes.io/anthos/container_memory_usage_bytes{id:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podd1c615e2_9d3e_4de5_a45d_96589b9a3047.slice/cri-containerd-9b5e84f12f9ba31e3bb64f6371ef1a9e50a0ac35e4a98b6469e6202f0f9581ec.scope,pod:engine-image-ei-fc06c6fb-nq2nx,container:engine-image-ei-fc06c6fb,image:sha256:51eb4bbbe4cd40adb3ea920378e32c122922701c7126cd8ab255852e9de604a7,name:9b5e84f12f9ba31e3bb64f6371ef1a9e50a0ac35e4a98b6469e6202f0f9581ec,namespace:longhorn-system}"
level=warn ts=2023-02-27T13:51:25.081Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized region or location.: k8s_container{cluster_name:on-prem-cluster-1,container_name:stackdriver-log-aggregator,namespace_name:kube-system,pod_name:stackdriver-log-aggregator-0,location:global/memberships/on-prem-cluster-1} timeSeries[46-77]: kubernetes.io/anthos/fluentd_output_status_buffer_queue_length{type:google_cloud,worker_id:6,plugin_id:google_cloud}; Unrecognized region or location.: k8s_node{cluster_name:on-prem-cluster-1,location:global/memberships/on-prem-cluster-1,node_name:worker02} timeSeries[0-45]: kubernetes.io/anthos/container_processes{id:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfa719fc6_9f66_4488_8bb9_44124dcd40d0.slice,pod:calico-kube-controllers-8575b76f66-n5lpm,namespace:kube-system}; Unrecognized region or location.: k8s_container{cluster_name:on-prem-cluster-1,location:global/memberships/on-prem-cluster-1,pod_name:stackdriver-log-aggregator-1,namespace_name:kube-system,container_name:stackdriver-log-aggregator} timeSeries[78-137]: kubernetes.io/anthos/fluentd_output_status_buffer_queue_length{worker_id:5,plugin_id:google_cloud,type:google_cloud}"
level=warn ts=2023-02-27T13:51:25.276Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized region or location.: k8s_node{cluster_name:on-prem-cluster-1,node_name:master,location:global/memberships/on-prem-cluster-1} timeSeries[15-199]: kubernetes.io/anthos/container_file_descriptors{id:/}; Unrecognized region or location.: k8s_container{namespace_name:kube-system,cluster_name:on-prem-cluster-1,container_name:stackdriver-log-aggregator,pod_name:stackdriver-log-aggregator-1,location:global/memberships/on-prem-cluster-1} timeSeries[0-3]: kubernetes.io/anthos/up{}; Unrecognized region or location.: k8s_node{cluster_name:on-prem-cluster-1,node_name:worker01,location:global/memberships/on-prem-cluster-1} timeSeries[4-14]: kubernetes.io/anthos/kubelet_volume_stats_available_bytes{namespace:kube-system,persistentvolumeclaim:stackdriver-log-aggregator-persistent-volume-claim-stackdriver-log-aggregator-0}"
level=warn ts=2023-02-27T13:51:25.467Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized region or location.: k8s_node{location:global/memberships/on-prem-cluster-1,node_name:master,cluster_name:on-prem-cluster-1} timeSeries[0-140,177]: kubernetes.io/anthos/container_fs_usage_bytes{id:/,device:overlay_0-98}; Unrecognized region or location.: k8s_container{cluster_name:on-prem-cluster-1,container_name:stackdriver-log-aggregator,location:global/memberships/on-prem-cluster-1,namespace_name:kube-system,pod_name:stackdriver-log-aggregator-1} timeSeries[145-176]: kubernetes.io/anthos/fluentd_output_status_buffer_queue_length{type:google_cloud,worker_id:1,plugin_id:google_cloud}; Unrecognized region or location.: k8s_node{location:global/memberships/on-prem-cluster-1,node_name:worker02,cluster_name:on-prem-cluster-1} timeSeries[141-144]: kubernetes.io/anthos/up{}"
level=warn ts=2023-02-27T13:51:25.592Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized region or location.: k8s_container{cluster_name:on-prem-cluster-1,pod_name:stackdriver-log-aggregator-0,location:global/memberships/on-prem-cluster-1,container_name:stackdriver-log-aggregator,namespace_name:kube-system} timeSeries[4-63]: kubernetes.io/anthos/fluentd_output_status_buffer_queue_length{worker_id:0,plugin_id:google_cloud,type:google_cloud}; Unrecognized region or location.: k8s_node{location:global/memberships/on-prem-cluster-1,cluster_name:on-prem-cluster-1,node_name:master} timeSeries[0-3]: kubernetes.io/anthos/up{}"
level=warn ts=2023-02-27T13:51:25.700Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized region or location.: k8s_container{pod_name:stackdriver-log-aggregator-0,namespace_name:kube-system,cluster_name:on-prem-cluster-1,location:global/memberships/on-prem-cluster-1,container_name:stackdriver-log-aggregator} timeSeries[0-31]: kubernetes.io/anthos/up{}"
level=warn ts=2023-02-27T13:51:25.902Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized region or location.: k8s_container{namespace_name:kube-system,container_name:stackdriver-log-aggregator,cluster_name:on-prem-cluster-1,location:global/memberships/on-prem-cluster-1,pod_name:stackdriver-log-aggregator-1} timeSeries[4-35]: kubernetes.io/anthos/fluentd_output_status_buffer_queue_length{plugin_id:google_cloud,worker_id:4,type:google_cloud}; Unrecognized region or location.: k8s_container{pod_name:stackdriver-prometheus-k8s-0,location:global/memberships/on-prem-cluster-1,namespace_name:kube-system,container_name:prometheus-server,cluster_name:on-prem-cluster-1} timeSeries[36-199]: kubernetes.io/anthos/go_gc_duration_seconds{quantile:0}; Unrecognized region or location.: k8s_container{pod_name:stackdriver-log-aggregator-0,namespace_name:kube-system,location:global/memberships/on-prem-cluster-1,cluster_name:on-prem-cluster-1,container_name:stackdriver-log-aggregator} timeSeries[0-3]: kubernetes.io/anthos/up{}"
...
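
The repeated warnings above can be condensed to see which resource types and location values are being rejected. The following is a minimal Python sketch, assuming the sidecar log has been captured as text; the function name `summarize_rejections` is hypothetical, the regular expression is based only on the line format shown above, and the sample is an abbreviated copy of the first warning.

```python
import re
from collections import Counter

# Matches a monitored resource such as k8s_node{...} and captures its type
# and its raw label list (labels may appear in any order).
RESOURCE_RE = re.compile(r"(k8s_\w+)\{([^}]*)\}")


def summarize_rejections(log_text: str) -> Counter:
    """Count (resource_type, location) pairs in 'Unrecognized region or location' errors."""
    counts = Counter()
    for line in log_text.splitlines():
        if "Unrecognized region or location" not in line:
            continue
        for resource_type, labels in RESOURCE_RE.findall(line):
            location = None
            for label in labels.split(","):
                key, _, value = label.partition(":")
                if key == "location":
                    location = value
            counts[(resource_type, location)] += 1
    return counts


# Abbreviated copy of the first warning line from the log above.
sample = (
    'level=warn msg="Unrecoverable error sending samples to remote storage" '
    'err="Unrecognized region or location.: '
    "k8s_node{node_name:worker02,location:global/memberships/on-prem-cluster-1,"
    "cluster_name:on-prem-cluster-1} timeSeries[94-199]: "
    'kubernetes.io/anthos/container_file_descriptors{id:/}"'
)

for (resource_type, location), n in summarize_rejections(sample).items():
    print(resource_type, location, n)
```

In the log above, every rejected resource carries the same location label, global/memberships/on-prem-cluster-1, so the error does not appear to depend on a particular node, pod, or metric.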

I found the location from the output of the following command.

$ gcloud container fleet memberships describe on-prem-cluster-1 | grep name
name: projects/flowing-radio-378705/locations/global/memberships/on-prem-cluster-1
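
For reference, the membership name printed above follows the pattern projects/<project>/locations/<location>/memberships/<membership>. The sketch below splits it into its parts; the helper `parse_membership_name` is hypothetical and only illustrates the path structure. It shows that the location segment on its own is `global`, while `global/memberships/on-prem-cluster-1` spans three segments. Whether `global` is a value the monitoring API accepts for these resources is not established in this thread.

```python
def parse_membership_name(name: str) -> dict:
    """Split projects/<p>/locations/<l>/memberships/<m> into its parts."""
    parts = name.split("/")
    # Even positions hold the fixed collection names, odd positions the IDs.
    if len(parts) != 6 or parts[0::2] != ["projects", "locations", "memberships"]:
        raise ValueError(f"unexpected membership name: {name!r}")
    return {"project": parts[1], "location": parts[3], "membership": parts[5]}


info = parse_membership_name(
    "projects/flowing-radio-378705/locations/global/memberships/on-prem-cluster-1"
)
print(info["location"])
```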

The content of the file prometheus.yaml is as follows:

$ cat ./anthos-samples-0.14.0/attached-logging-monitoring/monitoring/prometheus.yaml
...
        - "--stackdriver.project-id=flowing-radio-378705"
        - "--stackdriver.kubernetes.location=global/memberships/on-prem-cluster-1"
        - "--stackdriver.generic.location=global/memberships/on-prem-cluster-1"
        - "--stackdriver.kubernetes.cluster-name=on-prem-cluster-1"
...
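
The flags above can also be checked mechanically. This is a rough Python sketch, not part of the sample repository: `parse_flags` is a hypothetical helper, and the rule "a location value should not contain a slash" is an assumption based on Google Cloud region and zone names such as us-central1 having no slash in them.

```python
def parse_flags(args: list[str]) -> dict[str, str]:
    """Turn ['--key=value', ...] into {'key': 'value', ...}."""
    flags = {}
    for arg in args:
        if arg.startswith("--") and "=" in arg:
            key, _, value = arg[2:].partition("=")
            flags[key] = value
    return flags


# Sidecar arguments copied from the prometheus.yaml excerpt above.
args = [
    "--stackdriver.project-id=flowing-radio-378705",
    "--stackdriver.kubernetes.location=global/memberships/on-prem-cluster-1",
    "--stackdriver.generic.location=global/memberships/on-prem-cluster-1",
    "--stackdriver.kubernetes.cluster-name=on-prem-cluster-1",
]

flags = parse_flags(args)
# Assumed heuristic: a value with "/" is a resource path, not a single location.
suspicious = {k: v for k, v in flags.items() if k.endswith(".location") and "/" in v}
for key, value in suspicious.items():
    print(f"{key} looks like a resource path, not a single location: {value}")
```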

I think I have lost my way. Can anyone help me resolve this?

bourgeoisor commented 1 year ago

@minherz or @arbrown is that something you can help answer?

Shabirmean commented 1 year ago

@GoogleCloudPlatform/onyx-gke-observability

minherz commented 1 year ago

@nfsp3k I apologize for the unreasonably long delay in responding. If the issue is still relevant, can you please provide me with details about where your cluster is deployed?

The documentation you reference describes a method that applies to clusters deployed on AKS and EKS and "added to your fleet using the previous generation of our attached clusters feature". I'd like to make sure that the error you are seeing is not caused by incompatible hosting of the cluster or by your using the latest version of the feature.