[Meta] elastic-agent-shipper journey to GA

leehinman commented 1 year ago

Fill in the checklists below with issues.

Checklist to achieve `experimental` status

[ ] end to end acknowledgment / at least once delivery guarantee with memory queue (elastic/beats#35266)
[ ] #309
[ ] verify that events ingested through the shipper are the same as events ingested without going through the shipper (#290); relates to #257
[ ] elastic-agent can start filebeat as shipper (elastic/elastic-agent#2521)
[ ] shipper monitoring (elastic/elastic-agent#2580) and (elastic/beats#35267)
[ ] performance tests for filebeat reading from file and sending to shipper using memory queue and writing to elasticsearch have been performed (related #266)

Checklist to achieve `beta` status

[ ] support global processors
[ ] Elasticsearch V2 output
[ ] diskqueue is beta (see #118)
[ ] performance use cases are finalized
[ ] performance is at least ~90% of that of Beats under agent with their own output
[ ] tests exist for memory and IO usage, and comparisons to Beats under agent with their own output
[ ] can be selected as output in fleet UI
[ ] startup, shutdown, input, output & performance issues can all be debugged with only the data from the elastic-agent diagnostics command.
[ ] handle policy updates & queued events

Checklist to achieve `ga` status

[ ] elastic-agent-shipper is default output in fleet UI
[ ] performance is as good as current Beats under agent for all performance use cases
[ ] disk queue is ga (see #118)
[ ] output automatic tuning finalized

[ ] disk queue is experimental (see #118)
[x] #10
[x] #185
[ ] support for selectors for index / data_streams
[x] performance testing framework exists
[ ] performance tests for filebeat reading from file and sending to shipper using memory queue and writing to elasticsearch have been performed
[x] #257
[x] #244
[x] #253
[x] #255
[x] #256

Checklist to achieve `beta` status

[ ] disk queue is beta (see #118)
[ ] #116
[x] #129
[ ] beat processors are available
[ ] at least once delivery guarantee
[ ] handle policy updates & queued events
[ ] start output automatic tuning
[ ] global metrics
[ ] input metrics
[ ] processor metrics
[ ] output metrics
[ ] startup, shutdown, input, output & performance issues can all be debugged with only the data from the elastic-agent diagnostics command.
[ ] performance use cases are finalized
[ ] performance is at least ~90% of that of Beats under agent with their own output
[ ] tests exist for memory and IO usage, and comparisons to Beats under agent with their own output
[ ] https://github.com/elastic/elastic-agent-shipper/issues/213
[ ] can be selected as output in fleet UI
[ ] tests to ensure that payloads are equivalent to those from beats output
[ ] Ensure that the story around retries is solid and consistent

Checklist to achieve `ga` status

[ ] disk queue is ga (see #118)
[ ] output automatic tuning finalized
[ ] support for global processors
[ ] elastic-agent-shipper is default output in fleet UI
[ ] performance is as good as current Beats under agent for all performance use cases

cmacknz commented 1 year ago

In either the experimental or beta criteria we need an item to track that the shipper is debuggable using only the information collected by the agent diagnostics command.

Checklist to achieve beta status

[ ] performance testing framework exists

[ ] performance use cases are finalized

Maybe this is implicit in the two items above, but I think we really want to know how the performance of the agent with the shipper compares to the performance of the agent without shipper before we can recommend anyone use it as a beta.

I think I would rather see "performance is as good as current Beats under agent for all performance use cases" as a Beta requirement to set expectations properly for ourselves; we don't want to pursue this only at the end. If there is some unexpected challenge here, we can defer it from the Beta criteria later, but ideally the shipper can be a performance improvement.

Checklist to achieve ga status

[ ] support for global processors

I don't think we need global processors to be GA, because this is a completely new feature. This could be done at any time.

[ ] Output automatic tuning finalized

What does "finalized" mean here? We may want to be cautious about coupling the shipper GA criteria to a GA-able implementation of automatic output tuning. Ideally we can include this though, it is likely necessary to avoid annoying configuration migrations (for existing workers and bulk_max_size configurations).

leehinman commented 1 year ago

In either the experimental or beta criteria we need an item to track that the shipper is debuggable using only the information collected by the agent diagnostics command.

Added.

leehinman commented 1 year ago

Maybe this is implicit in the two items above, but I think we really want to know how the performance of the agent with the shipper compares to the performance of the agent without shipper before we can recommend anyone use it as a beta.

Moved "performance is as good as current Beats under agent" up to beta.

leehinman commented 1 year ago

I don't think we need global processors to be GA, because this is a completely new feature. This could be done at any time.

I'm in favor of moving this post-GA. The reason it's on the list is that we don't seem to have a list of features for the MVP, so I was working from the assumption that all of the features listed in the design doc would be needed for GA.

leehinman commented 1 year ago

[ ] Output automatic tuning finalized

What does "finalized" mean here? We may want to be cautious about coupling the shipper GA criteria to a GA-able implementation of automatic output tuning. Ideally we can include this though, it is likely necessary to avoid annoying configuration migrations (for existing workers and bulk_max_size configurations).

I was thinking "finalized" would refer to the user-facing portion: if we need to change the configuration parameters, we can do so up to this point, but after this we have to worry about configuration migration. Maybe it would be better to rename this to something like "finalize configuration options for GA"?

cmacknz commented 1 year ago

Maybe it would be better to rename this to something like "finalize configuration options for GA"?

Agreed, let's make that change to clarify this.

faec commented 1 year ago

support for selectors for index / data_streams

This is listed in beta but to me it seems like we might want it for experimental, or at least some partial solution -- today the shipper can only target a single hardcoded Elasticsearch index. We could easily make that single index configurable, but it would still be a single fixed index. Targeting multiple indices with a single shipper would likely require updates to the support library (I've created an issue for the main technical dependency here).

I'm not sure how we expect people to use the experimental releases, but to me it seems like sending all inputs from all sources to a single fixed index would rule out an awful lot of use cases, even for testing.

Overall the question of index / data stream selection could use a lot more clarity... I gather that at some point the output data streams will all be managed through agent, but I'm not sure we have a definite plan how that will happen. Maybe "Event index / datastream can be derived from the agent policy" should be its own item on the checklist, since getting that information from upstream is a separate process than just supporting selectors internally?

leehinman commented 1 year ago

Overall the question of index / data stream selection could use a lot more clarity... I gather that at some point the output data streams will all be managed through agent, but I'm not sure we have a definite plan how that will happen. Maybe "Event index / datastream can be derived from the agent policy" should be its own item on the checklist, since getting that information from upstream is a separate process than just supporting selectors internally?

The gRPC Event message has a datastream field: https://github.com/elastic/elastic-agent-shipper-client/blob/a7eedbe6bd6c711eac7ee1b2f7d7cf6ea03155be/api/messages/publish.proto#L56-L63. Is that sufficient for index / data stream selection?
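As a minimal sketch of how per-event selection could work, assuming the usual Elastic data stream naming convention `{type}-{dataset}-{namespace}`: the `DataStream` struct below is a hypothetical local mirror of the type/dataset/namespace fields, not the generated protobuf type, and `indexForEvent` and the `"shipper-fallback"` index name are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// DataStream is a hypothetical local mirror of the data stream fields
// carried on the shipper's gRPC Event; it is NOT the generated protobuf type.
type DataStream struct {
	Type      string
	Dataset   string
	Namespace string
}

// indexForEvent derives the target index using the assumed
// "{type}-{dataset}-{namespace}" naming convention. If the event carries
// no data stream, or any field is empty, it returns the single configured
// fallback index, i.e. today's fixed-index behaviour.
func indexForEvent(ds *DataStream, fallback string) string {
	if ds == nil || ds.Type == "" || ds.Dataset == "" || ds.Namespace == "" {
		return fallback
	}
	return strings.Join([]string{ds.Type, ds.Dataset, ds.Namespace}, "-")
}

func main() {
	events := []*DataStream{
		{Type: "logs", Dataset: "nginx.access", Namespace: "production"},
		{Type: "metrics", Dataset: "system.cpu", Namespace: "default"},
		nil, // event without data stream information
	}
	for _, ds := range events {
		// Prints one target index per event.
		fmt.Println(indexForEvent(ds, "shipper-fallback"))
	}
}
```

Keeping a fallback means events without data stream information still go to the single configured index, so per-event routing could be introduced incrementally.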

leehinman commented 1 year ago

This is listed in beta but to me it seems like we might want it for experimental, or at least some partial solution -- today the shipper can only target a single hardcoded Elasticsearch index. We could easily make that single index configurable, but it would still be a single fixed index. Targeting multiple indices with a single shipper would likely require updates to the support library (I've created an issue for the main technical dependency here).

Moved it to experimental. From the comments on #202 it looks like targeting multiple indexes should work.
