Open criamico opened 11 months ago
Pinging @elastic/fleet (Team:Fleet)
Could we do something like this?

> Add a new mapping to the agent index, e.g. `agent.version_2` (a better name would be great), with the proper `version` mapping type for `agent.version`.

Or, we could update the mapping type on the existing field and run the same kind of migration script to update any existing documents in the `.fleet-agents` index during a stack upgrade.
> Add a new mapping to the agent index, e.g. `agent.version_2` (a better name would be great), with the proper `version` mapping type

This would be a good solution as it's not a breaking change. The only downside is that we already have some duplication: `agent.version` and `local_metadata.elastic.agent.version` hold the same info. We could copy the content of either of those two fields into the new one and keep only the new one searchable. When https://github.com/elastic/kibana/issues/171425 is completed, we'll be able to control the searchable mappings more easily.

Also, we could simply call the new field `version`, so that users searching for it don't have to look through nested objects.

> Or, we could update the mapping type on the existing field and run the same kind of migration script to update any existing documents in the `.fleet-agents` index during a stack upgrade.

Not sure if this option would make things slower, as was initially mentioned.
Hi @MakoWish, https://github.com/elastic/kibana/issues/167139 listed two different bugs. The first one (400 when searching for `agent.version: "8.10.0"`) got fixed with https://github.com/elastic/kibana/pull/168329 and the fix has been available since 8.10.4.
The second bug (related to incorrect version ordering) will be fixed with this ticket, but unfortunately we still haven't got to it.
@jen-huang do you think this could be added to one of the next sprints?
Any plain-text searching fails, not just an `agent.version` search. We used to be able to search for "Computer1" to find a specific device, but that has now been failing for several months. We are on 8.11.4, and I would have expected that to be fixed by now. I have a support ticket open for this as well.
I was hoping I'd be able to test the impact of changing the mapping type for this property on the fly, but it doesn't look like that's possible, e.g.
```
PUT /.fleet-agents/_mapping
{
  "properties": {
    "agent": {
      "version": {
        "type": "version"
      }
    }
  }
}

# Response
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_state_exception",
        "reason": """Cannot update mappings in [.fleet-agents-7]: system indices can only use mappings from their descriptors, but the mappings in the request [{"properties":{"agent":{"version":{"type":"version"}}}}] did not match those in the descriptor(s)"""
      }
    ],
    "type": "illegal_state_exception",
    "reason": """Cannot update mappings in [.fleet-agents-7]: system indices can only use mappings from their descriptors, but the mappings in the request [{"properties":{"agent":{"version":{"type":"version"}}}}] did not match those in the descriptor(s)"""
  },
  "status": 500
}
```
So, to test this we'd need to boot up both ES and Kibana from source, which is a bit time-consuming. The `agent.version` property is defined here in ES:
We need to test what happens with integration dashboards that rely on this property when its type is changed, as well as verify that KQL searches on this property behave as expected with respect to version comparisons.
I don't have the bandwidth to continue testing this right now, so I'll bring this along into our next sprint and have one of the engineers on the UI & Integrations team take another look.
`agent.version` mapped to `keyword` type

This affects searching and sorting, as the version values are compared alphabetically. For instance, given multiple agents on versions 8.8.0, 8.9.0, 8.11.0 and 8.14.0:

- `agent.version < 8.14.0` will only return the agent on version 8.11.0
- `agent.version > 8.14.0` will return the agents on version 8.8.0 and 8.9.0
- `agent.version > 8.8.0` will only return the agent on version 8.9.0

The behaviour is identical for `local_metadata.elastic.agent.version`.
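This lexicographic behaviour is easy to reproduce outside Elasticsearch. A minimal Python sketch (not Fleet code, purely an illustration of keyword vs. semantic ordering):

```python
# keyword fields compare strings lexicographically, so "8.8.0" sorts
# after "8.14.0" (at the third character, '8' > '1').
versions = ["8.8.0", "8.9.0", "8.11.0", "8.14.0"]

# Ordering as a keyword field sees it (plain string sort):
print(sorted(versions))
# ['8.11.0', '8.14.0', '8.8.0', '8.9.0']

# Ordering as a version-typed field sees it (numeric components):
print(sorted(versions, key=lambda v: [int(p) for p in v.split(".")]))
# ['8.8.0', '8.9.0', '8.11.0', '8.14.0']

# Hence agent.version < "8.14.0" on the keyword field matches only "8.11.0":
print([v for v in versions if v < "8.14.0"])
# ['8.11.0']
```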
Change the `agent.version` mapping type from `keyword` to `version`

This seems to be a fairly involved process. As a reminder, mapping types for existing fields can't be changed in place, so assessing the impact is not as straightforward as updating the mapping definition. One possible solution could be reindexing the data into a new index with the updated mapping, which at a high level would likely involve bumping the current index version, introducing the new mapping with the changes, and then reindexing the data. This is a risky procedure that should be assessed in more detail. Note that @jfreden is working on a similar change and has kindly offered to answer questions (Slack thread). Edit: that work actually populates a new field based on an existing field, so it is a different scenario.
FWIW, I played a bit in the dev tools with reindexing agent data into a new index with a mapping identical to `.fleet-agents`, except I changed the `agent.version` type to `version`. The reindexing happened with no issue and the data ordering is correct.
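For reference, this kind of test can be sketched in dev tools along these lines. The target index name here is hypothetical, and the mapping is reduced to the relevant field (the real `.fleet-agents` mapping has many more fields):

```
PUT /fleet-agents-version-test
{
  "mappings": {
    "properties": {
      "agent": {
        "properties": {
          "version": { "type": "version" }
        }
      }
    }
  }
}

POST /_reindex
{
  "source": { "index": ".fleet-agents" },
  "dest": { "index": "fleet-agents-version-test" }
}
```

Range queries such as `agent.version < "8.14.0"` against the new index should then order versions semantically.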
Add a new `version` field with the `version` type and backfill it

Suggested in https://github.com/elastic/kibana/issues/168604#issuecomment-1831986672. I imagine this would still require reindexing, since we would need to populate it for existing documents, so the effort is likely similar to the previous approach. While it would have the advantage of safety, it might lead to confusing behaviour, where `version` orders differently than `agent.version` and `local_metadata.elastic.agent.version`.
Add a runtime `version` field with the `version` type

Suggested in https://github.com/elastic/kibana/issues/167139#issuecomment-1757403193. Probably easier to implement and low risk, but performance could be an issue, and it would suffer from the same potentially confusing behaviour as the second approach. Edit: runtime fields don't support the `version` type, so this would require converting the `keyword` value into a type with the desired sorting, e.g. a `long`.
@jillguyonnet I'm not sure I understood the conditions below: why does the same condition return different versions?

> agent.version < 8.14.0 will only return the agent on version 8.11.0
> agent.version < 8.14.0 will return the agents on version 8.8.0 and 8.9.0
Thanks @jlind23 - that was a typo, I corrected it. 🙂 It should read:

> `agent.version > 8.14.0` will return the agents on version 8.8.0 and 8.9.0
I've also tested adding a sub-field to the existing `agent.version` and `local_metadata.elastic.agent.version` fields. While this is far easier and non-breaking, I'm not sure we'd want a new `agent.version.version` field, which would order differently than `agent.version` and `local_metadata.elastic.agent.version`.
Starting state:

Searching by `agent.version > 8.8.1`: wrong filtering, because `agent.version` has type `keyword`:

Searching by `agent.version.version > 8.8.1`: correct filtering, because `agent.version.version` has type `version`:
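For context, a sub-field like this would be declared as a multi-field on the existing keyword mapping, roughly as follows. This is a sketch of the relevant mapping fragment only, not the actual system index descriptor:

```
"agent": {
  "properties": {
    "version": {
      "type": "keyword",
      "fields": {
        "version": { "type": "version" }
      }
    }
  }
}
```

With this, `agent.version` keeps its current keyword semantics while `agent.version.version` compares semantically, which is exactly the source of the UX concern above.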
Approach 1 seems to be the cleanest one, but I understand that it is also the most complex. In that sense, approach 2 is probably the best one, even if it implies adding a new field. We could document that there is a change in behaviour, and I hope this would avoid issues and SDHs being created.
@nimarezainia @kpollich any thoughts on this?
Approach number 1 would be my preference here, as I think the searching UX will be confusing regardless of how well we document this. Having the search "work" for both `agent.version` and `agent.version.version` with different results will not be immediately obvious within the product itself. If the risks/complexity with option 1 are mitigable, I think it's worth pursuing, as it feels like the best root-cause fix here.
> Approach number 1 would be my preference here as I think the searching UX will be confusing regardless of how well we document this. Having the search "work" for both agent.version and agent.version.version with different results will not be immediately obvious within the product itself. If the risks/complexity with option 1 are mitigable I think it's worth pursuing as it feels like the best root cause fix here.
I agree 👍 Ideally, I would like to get a better idea of the Elasticsearch work required to migrate to a new index with the correct mapping (which I haven't succeeded in doing in my local setup so far), in order to better estimate the effort and properly assess the impact on Fleet and integration assets. FWIW, I think ES work would be required for backfilling anyway (see e.g. https://github.com/elastic/elasticsearch/pull/108318).
An update to approach 3: I realised that runtime fields don't support the `version` type (I tested this locally). The suggested workaround involves a `long`-typed runtime field that converts the version keyword to a sortable number.
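For illustration, the conversion such a runtime field would need can be sketched in Python. The fixed component width is an assumption on my part; a Painless runtime field would apply equivalent logic:

```python
# Hypothetical sketch of the keyword -> long conversion a runtime field
# would need, since runtime fields don't support the version type.
# Assumes plain x.y.z versions with each component below 10000;
# pre-release suffixes (e.g. "-SNAPSHOT") would need extra handling.
def version_to_long(version: str) -> int:
    major, minor, patch = (int(p) for p in version.split("."))
    return (major * 10000 + minor) * 10000 + patch

versions = ["8.8.0", "8.9.0", "8.11.0", "8.14.0"]
print(sorted(versions, key=version_to_long))
# ['8.8.0', '8.9.0', '8.11.0', '8.14.0']
```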
Snapshot summary of the approaches discussed so far:

1. Change the mapping type from `keyword` to `version` for `agent.version` and `local_metadata.elastic.agent.version`
2. Add a new `version` field with `version` type
   - Downside: would require deprecating the existing `agent.version` and `local_metadata.elastic.agent.version` fields, or at least hiding them in the Kibana search box.
3. Add a runtime `version` field
   - Downsides: runtime fields don't support the `version` type, so it would require converting to a `long`; performance concerns; same confusing UX concerns as 2.
4. Add a sub-field with `version` type to the existing `agent.version` and `local_metadata.elastic.agent.version` fields
   - Downside: a new `agent.version.version` field (which orders differently than `agent.version`) is terribly confusing UX.

@jillguyonnet @kpollich now that you have worked on the investigation, what should be the next step?
Adding here the latest information I got from discussing potential Elasticsearch work with @jfreden, who is working on a framework for facilitating system index migrations in https://github.com/elastic/elasticsearch/pull/109093.
https://github.com/elastic/elasticsearch/pull/109106 provides an example of how this framework could be used once merged. The guidelines are:
> @jillguyonnet @kpollich now that you worked on the investigation, what should be the next step then?
Ideally, I would have liked to properly test the impact of changing the type of the existing fields. With @jfreden's change mentioned above, this could definitely become possible, but it's not guaranteed to be quick.
Workaround options involving a new field actually sound worse to me than doing nothing, since they wouldn't fix the bug and would introduce confusing UX.
I would be in favour of considering the field type change approach, but perhaps delaying the next step until https://github.com/elastic/elasticsearch/pull/109093 is merged. It might make sense to break down this follow-up work as follows:

- A second investigation focusing on performing the type change and reindexing locally, and testing the impact on Fleet and integration assets
- Assuming the investigation is satisfactory, doing the actual system index migration work
- Performing a QA check in the Fleet UI and integration assets, and verifying that the sorting bug is resolved
+1 on delaying follow-up work here until https://github.com/elastic/elasticsearch/pull/109093 is done.
> It might make sense to break down this followup work as follows:
> - A second investigation focusing on performing the type change and reindex locally and test the impact on Fleet and integration assets
> - Assuming the investigation is satisfactory, do the actual system index migration work
> - Perform a QA check in Fleet UI and integration assets and verify that the sorting bug is resolved
This makes sense to me as our path forward. I'm going to take this off your plate, @jillguyonnet and move this into a future sprint while the ES work is still pending. Thanks for your thorough investigation here!
Updated the description of the issue since the investigation is done. This feels like the best way to maintain visibility on the investigation and path forward, rather than creating another issue for the implementation work 🙂
Just to follow up from the ES side, the https://github.com/elastic/elasticsearch/pull/109093 PR was closed and we won't work on it now. See full context here. Happy to discuss more if you have any concerns or questions. Thanks!
Thanks for the context, @jfreden.
> SystemIndexMigrationExecutor is very similar, but is handled through an API and a kibana UI when upgrading between major version. As opposed to this approach, the intention of the new framework proposed in this PR, is to keep it completely automated.
Is this something we could use to change this mapping type between major versions, or would that still not be supported? Wondering if we could add this to our list of 9.x breaking changes and punt on it until the next major.
I think that you could do that. Both the security and watcher indices have done that in the past (6->7, 7->8). Not sure what the upgrade from 8->9 will look like, though, and I don't have experience with that.
Blocked by https://github.com/elastic/elasticsearch/pull/109093
In Kibana versions >= 8.10.x, a bug was found in the response of range queries when the mappings are of `keyword` type. Original ticket: https://github.com/elastic/kibana/issues/167139

The reason seems to be that version numbers work correctly as `keyword` until the numbers have multiple digits in them.

Suggested solutions

Tasklist

- Determine how updating this mapping would impact dashboards, visualizations, etc. when `agent.version` may be mapped as two different types in query results. This would be a breaking change and we should assess the possible impacts across the whole UI before proceeding with a fix.