Open joegallo opened 1 year ago
Pinging @elastic/es-data-management (Team:Data Management)
Based on discussion today, the next step here is to benchmark the non-breaking change version of this where the params are deeply copied and not directly mutable. (Or, perhaps, copy on write?)
When we get around to fixing this bug, we need to remember that parameters can be deeply nested. One of the parameters could be a list of maps of lists of maps, etc.
In an ideal world, none of these ever would have been mutable to begin with. Since that ship has sailed, however, I imagine the simplest version to implement is to always build a new parameters object-tree on each invocation, which isn't necessarily super performant.
And of course a deeply nested copy on write object tree is easy enough to speculate about on paper, but devilishly hard to implement...
https://github.com/elastic/elasticsearch/issues/53679 seems to be related to this, and I agree with https://github.com/elastic/elasticsearch/issues/53679#issuecomment-773550143 about the most ideal solution -- the question in my mind is how we get there from here (what with it being a breaking change to some people, at least).
Consider a pipeline like the following:
And then repeatedly invoke the
_simulate
API against that pipeline:Each invocation will result in growing a new '10' because the script is leaking a reference to the underlying mutable 'list' parameter:
Fixing this in the calling code is relatively easy, the pipeline should be rewritten as:
Now the
list
parameter is defensively copied and so we're no longer banging on an unintentionally static object.Note: the params object itself is unmodifiable, you can't invent/inject a new parameter at runtime via script, however it's only shallowly immutable -- the tree of maps and lists that it represents is made of mutable maps and lists.
IMHO we should update ScriptProcessor to either return a shared deeply immutable param tree, or we should duplicate the params tree on each invocation in which case it's still mutable but no longer shared.
The current behavior is harmless enough when ingest pipelines merely leak the mutable reference, but it can cause rare and hard to track down race-condition bugs when the pipelines actually do mutate the shared mutable reference.