elastic / elastic-integration-corpus-generator-tool

Command line tool used for generating events corpus dynamically given a specific integration

Support generating counter values #124

Open aliabbas-elastic opened 8 months ago

aliabbas-elastic commented 8 months ago

As of now, the corpus-generator-tool produces different gauge values with the use of ranges, enum and fuzziness as per the config documentation.

However, there could be a config option that defines a particular field as a counter, so that it always produces increasing values. Many packages that have been migrated to TSDB use such counter fields. It would also make it easier to implement the Rally benchmarking templates.

Counters can be reset, so the config should offer:

  1. the possibility to enable/disable totally random occurrences of the reset
  2. the possibility to define the probability that the counter is reset every time a new value is generated
  3. the possibility to specify the number of events after which the counter must be reset

Proposal for a counter_reset configuration.

1. 
counter_reset: 
  strategy: random
2.
counter_reset: 
  strategy: probability
  probability: 0-100 # %
3.
counter_reset: 
  strategy: events_amount
  events_amount: N
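The three proposed strategies could be evaluated roughly as follows. This is only a sketch, not the tool's implementation: `should_reset`, its arguments, and the constant `HYPOTHETICAL_RANDOM_RESET_PROBABILITY` are assumptions made for illustration, and `probability` is read as a percentage (0-100) as in the proposal above.

```python
import random

# Assumption: the "random" strategy uses some hardcoded probability.
# The value below is purely illustrative.
HYPOTHETICAL_RANDOM_RESET_PROBABILITY = 0.0001


def should_reset(counter_reset, events_since_reset, rng):
    """Decide whether a counter should be reset before the next value.

    counter_reset: dict mirroring the proposed counter_reset config.
    events_since_reset: events generated since the last reset.
    rng: a random.Random instance.
    """
    strategy = counter_reset["strategy"]
    if strategy == "random":
        return rng.random() < HYPOTHETICAL_RANDOM_RESET_PROBABILITY
    if strategy == "probability":
        # probability is expressed as a percentage, 0-100
        return rng.random() * 100 < counter_reset["probability"]
    if strategy == "events_amount":
        return events_since_reset >= counter_reset["events_amount"]
    raise ValueError(f"unknown counter_reset strategy: {strategy!r}")


rng = random.Random(42)
print(should_reset({"strategy": "events_amount", "events_amount": 3}, 3, rng))
```

With this shape, strategies 1 and 2 differ only in where the probability comes from, while strategy 3 is fully deterministic.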

cc: @aspacca

ruflin commented 8 months ago

++ on supporting counters. @aspacca How much effort would it be to add this?

aspacca commented 8 months ago

@ruflin code-wise the effort is minimal; I still have to come up with some ideas on how to configure the feature in the config file :)

ruflin commented 7 months ago

@aspacca Can you share a bit more on your thoughts and put them down here for further discussion?

aspacca commented 7 months ago

@ruflin

On the code side it is pretty straightforward. The current behavior is the following: assuming the previously generated value is 20, with a fuzziness of 0.6, the next value X generated by applying fuzziness is in the range 20 * (1 - 0.6) < X < 20 * (1 + 0.6). range.min and range.max, if present, are applied as the lowest and highest bounds of the range.

This works well for gauges.

For a counter, X must instead be generated in the range 20 < X < 20 * (1 + 0.6), so that it never decreases.
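The difference between the two windows can be sketched as below. The function names and signatures are assumptions made for illustration, not the tool's actual code, and `random.uniform` may return the endpoints, so the bounds here are inclusive rather than strict.

```python
import random


def next_gauge(prev, fuzziness, rng, range_min=None, range_max=None):
    """Next gauge value: within +/- fuzziness of the previous value."""
    low = prev * (1 - fuzziness)
    high = prev * (1 + fuzziness)
    # range.min and range.max, if present, bound the window
    if range_min is not None:
        low = max(low, range_min)
    if range_max is not None:
        high = min(high, range_max)
    return rng.uniform(low, high)


def next_counter(prev, fuzziness, rng):
    """Next counter value: never below the previous value."""
    return rng.uniform(prev, prev * (1 + fuzziness))


rng = random.Random(7)
print(next_gauge(20, 0.6, rng))    # somewhere between 8 and 32
print(next_counter(20, 0.6, rng))  # somewhere between 20 and 32
```

One caveat of this sketch: because the window is multiplicative, a counter whose previous value is 0 would stay at 0, so a real implementation would need a non-zero starting value or an additive step.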

On the config side, I'm keen to exclude the possibility of having a field with both range and is_counter (or something like that). The reason is that if we have a range, we'll hit range.max once enough events have been generated.

I'm also not quite convinced yet that the config property should be a flag like is_counter: true (meaning that is_counter: false is the same as not defining the property at all), but I cannot figure out a more generic property, something like more_generic_property: counter.

any suggestion?

ruflin commented 7 months ago

on the config side, I'm keen to exclude the possibility to have a field with both range and is_counter

Agree, I don't think mixing counters and range makes sense. What about just using counter: true and it would disable / error if other settings are set?

One follow-up challenge I see with counters is that there is a potential dependency. What I mean by that is that for each agent.id there is a separate counter, but I wonder if we overcomplicate things with this and can just use a global counter?
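To make the dependency concrete, here is a small sketch of what "one counter per agent.id" versus "one global counter" could look like. `CounterState`, `next_value`, and the `key` argument are hypothetical names; nothing here reflects how the tool links fields today.

```python
import random


class CounterState:
    """Keeps the last counter value per key; key=None acts as a global counter."""

    def __init__(self, start=1.0, fuzziness=0.6, seed=None):
        self._start = start
        self._fuzziness = fuzziness
        self._rng = random.Random(seed)
        self._values = {}

    def next_value(self, key=None):
        prev = self._values.get(key, self._start)
        value = self._rng.uniform(prev, prev * (1 + self._fuzziness))
        self._values[key] = value
        return value


state = CounterState(seed=1)
# One independent, non-decreasing series per agent.id value
print(state.next_value("agent-a"), state.next_value("agent-b"), state.next_value("agent-a"))
# A single shared series when no key is given
print(state.next_value(), state.next_value())
```

The keyed variant needs the generator to know which field identifies the counter's owner, which is exactly the cross-field dependency in question; the global variant avoids that at the cost of realism.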

aspacca commented 7 months ago

Agree, I don't think mixing counters and range makes sense. What about just using counter: true and it would disable / error if other settings are set?

agree on that
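A minimal sketch of the "error if other settings are set" behavior might look like this. The field layout (a dict with `name`, `counter`, `range`, `fuzziness`) and the choice of which settings conflict are assumptions; only the range conflict is stated in this thread.

```python
def validate_field(field, conflicting=("range",)):
    """Raise ValueError if a counter field also sets a conflicting option.

    conflicting: option names treated as incompatible with counter: true.
    Only "range" is taken from the discussion; extend as needed.
    """
    if not field.get("counter", False):
        return
    clashes = [option for option in conflicting if option in field]
    if clashes:
        name = field.get("name", "<unnamed>")
        raise ValueError(
            f"field {name!r}: counter: true cannot be combined with {', '.join(clashes)}"
        )


# Accepted: counter with fuzziness only
validate_field({"name": "bytes_total", "counter": True, "fuzziness": 0.6})

# Rejected: counter combined with range
try:
    validate_field({"name": "bytes_total", "counter": True, "range": {"min": 0, "max": 10}})
except ValueError as err:
    print(err)
```

Failing loudly rather than silently ignoring the range keeps a misconfigured template from producing values the author did not expect.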

One follow-up challenge I see with counters is that there is a potential dependency. What I mean by that is that for each agent.id there is a separate counter, but I wonder if we overcomplicate things with this and can just use a global counter?

Let's discuss in a separate issue the current limitation we have when using cardinality to link different fields together. For this specific case, for example, cardinality is not a viable solution.

cc @tommyers-elastic

aspacca commented 7 months ago

@ruflin , @aliabbas-elastic, please check the update in the issue description about the counter_reset config.

If we agree that it's worth introducing, let's prioritize among the three cases.

cc other stakeholders: @gizas , @tommyers-elastic

ruflin commented 7 months ago

As a counter reset with a shipper normally means the shipper was restarted or a new host/service came online, I like a certain randomness to it, which 1 and 2 provide. I would argue that a counter reset is rare and that not all counters in a template should be reset at the same time. This again speaks for randomness. My concern around 1 is that resets would likely happen too often. With 2, I have the flexibility, but the configuration option suggests configuring something like 1%, which could mean a reset happens every 100 events, which I think is too often. The config should encourage larger values like 10k, 10000k. My preference is on 2.

I like the idea of counter reset but it is not clear to me if it is a top priority. Is there currently an issue we have if no reset happens?

aspacca commented 7 months ago

My concern around 1 is that resets would likely happen too often. With 2, I have the flexibility, but the configuration option suggests configuring something like 1%, which could mean a reset happens every 100 events, which I think is too often. The config should encourage larger values like 10k, 10000k. My preference is on 2.

Looking closer, the only difference between 1 and 2 is that 2 requires defining a probability, while 1 has a hardcoded one. So yes, let's start with 2, and if we collect feedback that requiring a probability hinders UX, let's also add 1.

As for the probability range: it's just a matter of the magnitude used to express it. See how fuzziness works:

value must be between 0.0 and 1.0, where 0 is 0% and 1 is 100%. When not specified there is no constraint on the generated values, boundaries will be defined by the underlying field type

within the 0-1 range, 0.00001 is equivalent to 10000k. Maybe it's not "encouraging large values", and we should rather guide the devs through the docs, but I'm afraid that if we set an explicitly "large range", like 1-10000, we encourage that exact "largeness".
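To relate a 0-1 probability to "one reset every N events", one can assume each generated event resets the counter independently with the same probability. Under that assumption, which the thread does not state explicitly, the mean number of events per reset is simply the reciprocal of the probability. The helper name below is hypothetical.

```python
def expected_events_between_resets(probability):
    """Mean events per reset when each event resets independently."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return 1 / probability


for p in (0.01, 0.0001, 0.00001, 0.0000001):
    events = expected_events_between_resets(p)
    print(f"probability={p:g} -> about {events:,.0f} events per reset")
```

Under this model, 0.01 (1%) gives about 100 events per reset, 0.00001 gives about 100,000, and a value near 0.0000001 is needed to reach the 10000k order of magnitude.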

I like the idea of counter reset but it is not clear to me if it is a top priority. Is there currently an issue we have if no reset happens?

I think there is no issue on its own if no reset happens, but it might be relevant to the use case. I expect Kibana developers want to see a counter reset.

ruflin commented 7 months ago

within the 0-1 range, 0.00001 is equivalent to 10000k.

OK, let's start with this. We can still improve it later on if needed or solve it with docs.

Kibana developers want to see a counter reset.

Let's wait until this request comes along ;-)