elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Metricbeat] No distinction for `gcp.pubsub` #30912

Open endorama opened 2 years ago

endorama commented 2 years ago

Stemming from https://github.com/elastic/beats/issues/29815 and https://github.com/elastic/beats/issues/30911

With gcp.pubsub, we collect data for subscriptions and topics under the same pubsub metricset.name, and it is not possible to distinguish them except by inspecting the data.

This would require some sort of "sub dataset" information (e.g. pubsub/subscriptions and pubsub/topics). I'm not sure if package-spec allows for this granularity.

kaiyan-sheng commented 2 years ago

How about metrics from pubsub/snapshots?

endorama commented 2 years ago

I had a look today at the snapshot feature. Snapshots are used to seek a subscription so that messages can be replayed (acknowledged messages are otherwise inaccessible to subscribers of a given subscription). Creating a snapshot allows seeking the subscription, which changes the ack status of its messages from acked to unacked so they can be pulled again. (Further reading: https://cloud.google.com/pubsub/docs/replay-overview https://cloud.google.com/pubsub/docs/replay-message)

I don't think a message carries any indication that it is being replayed (at least not in the JSON data included in the message).

@kaiyan-sheng are you aware of any indicators that a message comes from a subscription after a snapshot seek?

gpop63 commented 2 years ago

What if 2 data streams were created in the gcp integration: pubsub_topic with event.dataset gcp.pubsub_topic, and pubsub_subscription with event.dataset gcp.pubsub_subscription?

kaiyan-sheng commented 2 years ago

I was wondering whether we will have a snapshot data stream, because I see that in Beats https://github.com/elastic/beats/blob/main/x-pack/metricbeat/module/gcp/pubsub/manifest.yml#L9-L15 we are collecting metrics not only from subscriptions and topics but also from snapshots.

endorama commented 1 year ago

I see 2 ways to distinguish them:

  1. is to have multiple data streams;
  2. is to use the ingest pipeline to add some dedicated metadata.

@kaiyan-sheng what do you think?

A note: as we will not implement this feature in Beats, this will be migrated to integrations later on. Note 2: I think I favour (1). We can keep them in the same integration for easier usage by customers, while having multiple data streams to account for the differences.
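
A minimal sketch of how either option could tell the three resource kinds apart, assuming each event still carries the Cloud Monitoring metric type (for example `pubsub.googleapis.com/topic/send_request_count`). The function name `datasetFor` is hypothetical, and the dataset names follow the naming suggested earlier in this thread rather than anything that exists in Beats or the integration today.

```go
package main

import (
	"fmt"
	"strings"
)

// metricPrefix is the common prefix of Pub/Sub metric types in Cloud Monitoring.
const metricPrefix = "pubsub.googleapis.com/"

// datasetFor is a hypothetical helper: it maps a metric type to a dataset name
// based on the first path segment after the service prefix. Anything it does
// not recognise falls back to the existing, undifferentiated dataset.
func datasetFor(metricType string) string {
	const fallback = "gcp.pubsub"
	if !strings.HasPrefix(metricType, metricPrefix) {
		return fallback
	}
	parts := strings.SplitN(strings.TrimPrefix(metricType, metricPrefix), "/", 2)
	switch parts[0] {
	case "snapshot", "subscription", "topic":
		// Assumed naming, e.g. gcp.pubsub_topic.
		return fallback + "_" + parts[0]
	default:
		return fallback
	}
}

func main() {
	for _, m := range []string{
		"pubsub.googleapis.com/subscription/num_undelivered_messages",
		"pubsub.googleapis.com/topic/send_request_count",
		"pubsub.googleapis.com/snapshot/oldest_message_age",
	} {
		fmt.Println(m, "->", datasetFor(m))
	}
}
```

With multiple data streams (1) this value would become each stream's event.dataset; with the ingest pipeline approach (2) the same prefix check would instead set a dedicated metadata field.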

kaiyan-sheng commented 1 year ago

@endorama

  1. is to have multiple data streams;

So for this implementation, we are only talking about changing it in integrations, right? Would that be a breaking change? Unless we keep the existing pubsub data stream and add 3 new ones: pubsub_snapshot, pubsub_subscription and pubsub_topic.