Open colinsurprenant opened 7 years ago
I'd like to postpone any changes into the api around this (strict or lenient) to 6.0. or do you expect deduping to belong in 5.x even though it is not happening today? does anyone ever care there are two of the same tag in the list? usually people just check existence of at least one.
My 2 cents:
We should not de-duplicate tags, they are not a Set.
For example, if a user has add_tag
options in all of the filters, by allowing repeated tags the user can see which "path" the event took through the config. So if the config filters are 'grok', 'date', 'date', 'mutate', 'mutate' and the user add_tag with the same strings - we should get
'grok','date','date','mutate','mutate' instead of 'grok','date','mutate'
On the flip side is there any utility in have 2 or more _grok_parse_failure tags (i.e. tags added in plugin code)?
Also when a user does remove_tag
and there is dups of that tag - should the code only remove the last of the dups?
The docs for add_tag
are silent about duplicating tags.
How about we decide (and document) that if a user specifies add_tag
multiple times in the config with the same value then we honour that but a plugin dev will strive to not add duplicates (via event.tag
) unless there is a need to do so?
I'm in favor of de-duping.
I think the find the path functionality is better served by the user specifying a specific tag per grok filter, otherwise there's no telling which one it came from.
I think that de-clutters things and is more in-line with what a tag is in most applications, that is to say a member of a set.
I just checked with Jordan as to whether there was any historical reason for duplicates. There are none - in fact he said tags were intended to behave as a Set.
I am OK with removing duplicates as there is no historical precedent for them.
As for:
Is it to simply define a new @tags field with a clear API
I have no input on this.
Does that mean that an operation like event.set("tags", ["hey", "hey"])
should silently result in event.get("tags") == "hey"
? How far should we protect the event.tag("hey")
when someone uses event.set ?
There is also a subtle problem, not currently coded for, if an input or codec builds a Hash before creating the event from it and that hash has a tags field with a Hash value. Boom if a filter tries to add a tag.
The tagging concept and the
tags
field usage would benefit some thoughts and a properly defined API for the future (6.0?).Some problems have been identified in #6142 and are being solved in #6177.
There is one inconsistency issue I'd like to discuss now to see if this is something we should fix in a minor/bugfix release.
Event#tag
/Event.tag()
methods deduplicates thetags
field list, meaning that it will only insert the given tag in thetags
field list only if it is not already in that list.but
add_tag
directive does not do deduplication. The same tag can be inserted in thetags
field multiple times.Should we also do deduplication in the config
add_tag
? if so, are there chances this break current configurations?Otherwise, the larger problem here is that the
tags
field has never been properly considered a "reserved" field with a "defined" API, like@timestamp
for example. so it is totally possible for anyone to place arbitrary values in thetags
field but on the other hand, a lot of plugins rely on the existence and "validity" of thetags
field.I'd like open up the discussion on the best way to address this moving forward. Is it to simply define a new
@tags
field with a clear API? this would require changes is configurations? code sweep across plugins would also be necessary.