elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Metricbeat] No distinction for `gcp.pubsub` #30912

Open endorama opened 2 years ago

endorama commented 2 years ago

Stemming from https://github.com/elastic/beats/issues/29815 and https://github.com/elastic/beats/issues/30911

With gcp.pubsub, we collect data for both subscriptions and topics under the same pubsub metricset.name, so it is not possible to distinguish them except by inspecting the data.

This would require some sort of "sub dataset" information (e.g. pubsub/subscriptions and pubsub/topics). I'm not sure if package-spec allows for this granularity.

kaiyan-sheng commented 2 years ago

How about metrics from pubsub/snapshots?

endorama commented 2 years ago

I had a look today at the snapshot feature. Snapshots are used to seek a subscription so that messages can be replayed (acknowledged messages are otherwise inaccessible to subscribers of a given subscription). Creating a snapshot allows you to seek the subscription, changing the ack status of its messages from acked to unacked so they can be pulled again. (Further reading: https://cloud.google.com/pubsub/docs/replay-overview https://cloud.google.com/pubsub/docs/replay-message)

I don't think the message carries any indication that it is being replayed (at least not in the JSON data included in the message).

@kaiyan-sheng are you aware of any indicators that a message comes from a subscription after a snapshot seek?

gpop63 commented 2 years ago

What if 2 data streams would be created in the gcp integration: pubsub_topic having event.dataset gcp.pubsub_topic and pubsub_subscription having event.dataset gcp.pubsub_subscription?
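To make the proposed split concrete, here is a minimal Python sketch of the mapping only, not of how the integration would implement it. It assumes the Cloud Monitoring metric type (for example `pubsub.googleapis.com/topic/send_request_count`) is available when the event is built; the function name `dataset_for_metric_type` and the fallback to `gcp.pubsub` for anything that is neither a topic nor a subscription metric are assumptions made for illustration.

```python
# Hypothetical helper: pick an event.dataset value from a Pub/Sub metric type.
PUBSUB_PREFIX = "pubsub.googleapis.com/"

# Dataset names as proposed in this comment.
DATASETS = {
    "subscription": "gcp.pubsub_subscription",
    "topic": "gcp.pubsub_topic",
}

# Assumption: everything else keeps the current, undifferentiated dataset.
FALLBACK_DATASET = "gcp.pubsub"


def dataset_for_metric_type(metric_type: str) -> str:
    """Return the event.dataset for a Pub/Sub metric type string."""
    if not metric_type.startswith(PUBSUB_PREFIX):
        raise ValueError(f"not a Pub/Sub metric type: {metric_type!r}")
    # The first path segment after the prefix names the resource kind.
    kind = metric_type[len(PUBSUB_PREFIX):].split("/", 1)[0]
    return DATASETS.get(kind, FALLBACK_DATASET)
```

With this mapping, `pubsub.googleapis.com/subscription/num_undelivered_messages` would land in `gcp.pubsub_subscription`, while snapshot metrics would stay in `gcp.pubsub` unless a third entry is added.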

kaiyan-sheng commented 2 years ago

I was wondering whether we will have a snapshot data stream, because I see that in Beats (https://github.com/elastic/beats/blob/main/x-pack/metricbeat/module/gcp/pubsub/manifest.yml#L9-L15) we are collecting metrics not only from subscriptions and topics but also from snapshots.

endorama commented 2 years ago

I see 2 ways to distinguish them:

  1. is to have multiple data streams;
  2. is to use the ingest pipeline to add some dedicated metadata.

@kaiyan-sheng what do you think?

A note: as we will not implement this feature in Beats, this will be migrated to integrations later on. Note 2: I think I favour (1): we can aggregate them in the same integration for easier usage by customers, while having multiple data streams to account for the differences.
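For comparison, option (2) could look roughly like the following Python sketch, which mimics what an ingest pipeline step might do rather than showing real pipeline configuration. It assumes events nest their metrics under `gcp.pubsub.subscription`, `gcp.pubsub.topic` or `gcp.pubsub.snapshot`; the function name `tag_resource_kind` and the target field `labels.pubsub_resource_kind` are hypothetical and chosen only for illustration.

```python
import copy
from typing import Any, Dict

# Resource kinds the pubsub metricset is said to collect in this thread.
KNOWN_KINDS = ("subscription", "topic", "snapshot")


def tag_resource_kind(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event with a hypothetical resource-kind label.

    The label is added only when exactly one known kind is present under
    gcp.pubsub, so ambiguous or unrelated events pass through unchanged.
    """
    tagged = copy.deepcopy(event)
    pubsub = tagged.get("gcp", {}).get("pubsub", {})
    present = [kind for kind in KNOWN_KINDS if kind in pubsub]
    if len(present) == 1:
        # Hypothetical field name, not an existing field of the integration.
        tagged.setdefault("labels", {})["pubsub_resource_kind"] = present[0]
    return tagged
```

This keeps a single data stream and makes the distinction a query-time filter on one field, at the cost of still mixing differently shaped documents in the same index.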

kaiyan-sheng commented 2 years ago

@endorama

  1. is to have multiple data streams;

So for this implementation, we are only talking about changing it in integrations, right? Will that become a breaking change? Unless we keep the existing pubsub data stream and add 3 new ones: pubsub_snapshot, pubsub_subscription and pubsub_topic.

botelastic[bot] commented 3 weeks ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!