lucabelluccini opened 2 months ago
It seems like the scaling model is generally tied to the inputs (Beats) in use by a given integration, and not so much the integration itself.
For instance, the S3 input is mentioned as not being horizontally scalable when the `queue_url` parameter is provided. Could we make an assumption that any input of type `aws/s3` with a non-null `queue_url` variable should result in a warning about the scalability model being displayed?
I'm sure there are other cases of this with "pull"-based inputs like `httpjson`, but it's not necessarily my area of expertise.
> We manually specify in the manifest what is the scaling model of the integration. We expose the scaling model in the docs and, if possible, in the Fleet Integrations UI.
While I think this makes sense as a path forward, getting broad adoption across integrations in order to source data like this is usually a long-lived challenge. It would require each integration maintainer to provide this information, produce a new version of their integration including an updated `format_version` to use the new `package-spec` fields, and then for users to upgrade to the new version of the integration in order to be presented with the new scalability data.
This seems like a lot of churn to get something like this done, so I wonder if we should consider a less involved approach, such as adding specific detection in the Fleet codebase where we detect these scalability concerns based on a "hardcoded" mapping of metadata around specific input types or variables.
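To make the "hardcoded mapping" idea concrete, here is a minimal sketch of what such detection could look like. This is purely illustrative: Fleet is written in TypeScript, and the `VERTICAL_ONLY` table and `scalability_warning` function are hypothetical names, not existing Fleet code. The mapping assumes, per the per-input list later in this thread, that `aws-s3` without `queue_url` (polling mode) is the vertical-only case, and treats `httpjson` as vertical-only based on the earlier comment about "pull"-based inputs.

```python
from typing import Callable, Dict, Optional

# Hypothetical hardcoded mapping: input type -> predicate over the input's
# variables that is True when the configuration only scales vertically.
# (aws-s3 polling mode = no queue_url; httpjson assumed cursor-based/pull.)
VERTICAL_ONLY: Dict[str, Callable[[dict], bool]] = {
    "aws-s3": lambda variables: variables.get("queue_url") is None,
    "httpjson": lambda variables: True,
}


def scalability_warning(input_type: str, variables: dict) -> Optional[str]:
    """Return a warning string if this input configuration scales vertically only."""
    predicate = VERTICAL_ONLY.get(input_type)
    if predicate is not None and predicate(variables):
        return f"input {input_type!r} is not horizontally scalable with this configuration"
    return None
```

The point of keeping this as a table plus predicate is that adding a new problematic input/variable combination is a one-line change, without touching the package spec.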
cc @nimarezainia I am going to assign this to you as it's in "Needs PM Prio" as well.
> It seems like the scaling model is generally tied to the inputs (Beats) in use by a given integration, and not so much the integration itself.
I agree. It would be simplest if we could identify the scaling model based solely on the input (without other caveats or special cases).
I think the configuration options we present to users, the agent handlebar config templates, and identifying the scaling model would all be easier if we could treat the two aws-s3 input use cases as independent inputs. Perhaps we add two alias names to the aws-s3 input in the spec, like `aws-s3-polling` and `aws-s3-sqs`. This would then make it possible for a package developer to have separate inputs for each S3 use case. This addresses a few issues we have:
Does the package spec need to be modified at all? There are only a handful of integrations/inputs that we would need to consider here: mainly pub/sub ones, where we are faced with a conduit that feeds us the events, and/or ones that read directly via polling.
@lucabelluccini Could we not just document the scaling model for the majority of these integrations?
I think separating `aws-s3-polling` and `aws-s3-sqs` is a warranted separate issue to deal with.
Hello @nimarezainia. A first step might be documenting the scaling model; it would already be of great help. The problem is that docs often go stale, and currently integration docs would need a dedicated section for such a topic.
My manifest proposal was more towards taking a declarative approach from integration developers.
For declaring the scalability at input level or integration level, I am ok with both options. The important thing is to solve the problem of knowing the scalability model.
My suggestion of doing it at the integration/data stream level was to "hide" the implementation detail (for example, the input used by an integration/data stream might change in the future), since the final user rarely knows which input each one uses.
If we're able to expose the scaling model based on the input used, then it is fine for me.
Discussed with @nimarezainia yesterday:

- (baseline) Introduce scalability documentation for all inputs:
  - `aws-cloudwatch` -> vertical scaling (local cursor stored in registry)
  - `aws-s3` (polling) -> vertical scaling (local cursor stored in registry, no way to sync concurrent consumers)
  - `aws-s3` (SQS) -> horizontal scaling (notification based, consumers ack the consumed events back to AWS)
  - `azure-eventhub` -> horizontal scaling (storage account + storage account container to store the consumers' state/cursor)
  - Azure Blob Storage input -> vertical scaling (workers)
  - `gcp-pubsub` -> vertical scaling (`num_goroutines`) + horizontal scaling (subscription)
  - Google Cloud Storage input -> vertical scaling (workers)
  - `salesforce` -> vertical scaling (local cursor stored in registry)
- (baseline) Expose to the user the documentation of the inputs used at the integration/data stream level within the Fleet UI.
- (enhancement) Introduce a non-mandatory scalability model attribute for inputs, e.g. `scalability_model`, so that it can be programmatically used by Fleet. It could contain tags such as `vertical`, `horizontal`, `vertical (num_goroutines)` or `horizontal (storage_account)`.
- (enhancement) Make use of the scalability model attribute to warn users that deploying a policy containing an integration/data stream that uses a vertically-scalable input to N > 1 agents is likely going to waste resources.
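As a sketch of the last enhancement: assuming a hypothetical `scalability_model` attribute holding tags like those listed above (the attribute does not exist yet, and `deployment_warnings` is an illustrative name, not Fleet code), the warning logic could look like this.

```python
from typing import List


def deployment_warnings(policy_inputs: List[dict], agent_count: int) -> List[str]:
    """Warn when a policy containing vertically-scalable inputs targets N > 1 agents."""
    warnings: List[str] = []
    if agent_count <= 1:
        return warnings
    for inp in policy_inputs:
        tags = inp.get("scalability_model", [])
        # Tags look like "vertical (num_goroutines)" or "horizontal (storage_account)";
        # an input with no "horizontal" tag cannot share work across agents.
        if tags and not any(tag.startswith("horizontal") for tag in tags):
            warnings.append(
                f"input {inp['type']!r} scales vertically only; deploying this policy "
                f"to {agent_count} agents is likely going to waste resources"
            )
    return warnings
```

Because the attribute is non-mandatory, inputs without it simply produce no warning, so existing packages keep working unchanged.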
As this topic is related to integrations, I'm also putting @daniela-elastic in the loop for the O11y-owned inputs.
I think we should try to lean into automation so that these classifications for each integration don't require much work to maintain. I would like to see attributes like horizontal/vertical scaling, stateful/stateless, and e2e acknowledgement support being tracked as metadata about the inputs we have (and kept near the input source). Then the reference docs for the inputs (e.g. Filebeat docs) and the integrations docs can derive from this metadata.
As an example, the simple tags that Vector adds to their input docs convey a lot of useful information.
> gcp-pubsub -> vertical scaling (num_goroutines) + horizontal scaling (subscription)

gcp-pubsub has the same scaling characteristics as `aws-s3` (SQS) (horizontal). So whatever we list for s3 should be the same for pubsub.
As a starter let's modify the package spec to allow for this information to be set by the package owner. And for it to be included in the auto-generated integrations docs/integrations plugin.
Problem
Users are interested in knowing the scaling model of integrations / data streams. Examples:
Possible proposal (mitigation)
We manually specify in the manifest what is the scaling model of the integration. We expose the scaling model in the docs and, if possible, in the Fleet Integrations UI.
Possible proposal (long term)
Each input should have a metadata/spec where it claims its scaling model.
The integration package manifest checks the inputs in order to automatically generate the scaling model of the integration / data stream.
Example:
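A minimal sketch of the long-term proposal, with each input claiming its scaling model in its own spec and the data stream's model derived automatically from the inputs it uses. All names here are illustrative assumptions (`INPUT_SPECS`, `data_stream_scaling_model`, and the `aws-s3-sqs` alias from the earlier comment), not existing package-spec fields.

```python
from typing import Dict, List

# Hypothetical per-input metadata, as each input would declare it in its
# own spec; values follow the per-input list earlier in this thread.
INPUT_SPECS: Dict[str, List[str]] = {
    "aws-s3-sqs": ["horizontal"],
    "aws-cloudwatch": ["vertical"],
    "gcp-pubsub": ["vertical (num_goroutines)", "horizontal (subscription)"],
}


def data_stream_scaling_model(input_types: List[str]) -> str:
    """Derive a data stream's scaling model from the inputs it uses: it can
    only be advertised as horizontally scalable if every input supports it."""
    models = [INPUT_SPECS[t] for t in input_types]
    if all(any(tag.startswith("horizontal") for tag in tags) for tags in models):
        return "horizontal"
    return "vertical"
```

This keeps the declaration close to the input implementation, while the aggregation step "hides" the implementation detail from the end user, as discussed above.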
FYI @lalit-satapathy @jsoriano @zmoog