elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Importing saved objects creates duplicate index pattern when the index pattern with the same name already exists in Kibana #13649

Open bhavyarm opened 6 years ago

bhavyarm commented 6 years ago

Kibana version: 5.6.0 latest snapshot

Elasticsearch version: 5.6.0 latest snapshot

Server OS version: darwin_X86_64

Browser version: chrome latest

Browser OS version: OS X

Original install method (e.g. download page, yum, from source, etc.): snapshot build

Description of the problem including expected versus actual behavior: I manually created a metricbeat index pattern, checked the data on Discover and then loaded the Beats dashboards. This created a duplicate metricbeat-* index pattern in Kibana.

Steps to reproduce:

  1. Start ES/Kibana/Metricbeat
  2. Create the metricbeat index pattern manually
  3. Load dashboards using ./scripts/import_dashboards - Kibana creates another metricbeat index pattern, metric_beat
bhavyarm commented 6 years ago

Trying to get this going in 5.4.0 to see what the behaviour is.

bhavyarm commented 6 years ago

In 5.4.0: (you have to set ulimit -n 2048 to import dashboards) - I did this:

  1. Started ES/Kibana/metricbeat
  2. Created the metricbeat index pattern manually
  3. Changed the popularity of the field: system.process.name to 15 and changed the field format of system.process.cpu.start_time to string
  4. Created 2 saved objects - 1 visualization and 1 dashboard
  5. Imported dashboards using ./scripts/import_dashboards
  6. Kibana didn't create a duplicate index pattern. There is only one metricbeat index pattern in management
  7. But it overrode the field settings for system.process.name and system.process.cpu.start_time back to 0 and date
  8. My saved objects - 1 viz and 1 dashboard are fine
tylersmalley commented 6 years ago

We could handle this on the dashboard import API, and when an index pattern doesn't exist we can see if something else with the same pattern exists. The one caveat is you can't search for it if it contains a dash. We might have to bring down all of them and search the array.
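The fallback described here (fetch all index patterns and scan the array client-side) could be sketched as follows. This is a hedged sketch, not Kibana's implementation: the object shape mirrors a saved-objects find response, but treat the exact field names as assumptions.

```python
# Sketch: client-side title matching for index patterns, since the
# saved-objects search cannot reliably match a title containing a dash.
# The caller would first page through the saved-objects find API and
# then apply this filter locally to the accumulated array.

def find_patterns_by_title(saved_objects, title):
    """Return every index-pattern object whose title matches exactly."""
    return [
        obj for obj in saved_objects
        if obj.get("type") == "index-pattern"
        and obj.get("attributes", {}).get("title") == title
    ]
```

Exact string comparison in the client sidesteps the dash problem entirely, at the cost of pulling down every index pattern first.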

ghost commented 5 years ago

Not sure if it falls under this bug, but I'm seeing the same pattern duplication with 6.3 when POSTing to /api/saved_objects/index-pattern/. If I run the POST multiple times, I'll get multiple copies of the same index pattern.

Example CURL:

curl -X "POST" "http://localhost:5601/api/saved_objects/index-pattern/" \
     -H 'Content-type: application/json' \
     -H 'kbn-xsrf: true' \
     -d $'{
  "attributes": {
    "fields": "[{\\"name\\":\\"jenkins_timestamp\\",\\"type\\":\\"string\\",\\"count\\":0,\\"scripted\\":false,\\"searchable\\":true,\\"aggregatable\\":true,\\"readFromDocValues\\":true}]",
    "timeFieldName": "@timestamp",
    "title": "jenkins-master-*",
    "fieldFormatMap": "{\\"@timestamp\\":{\\"id\\":\\"date\\"}}"
  }
}'

Will create a new copy of jenkins-master-* each run. Maybe my content is malformed?
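The content isn't malformed: POSTing to the collection endpoint without an id makes Kibana mint a new random id on every call, so each run creates a fresh object. Supplying an explicit id in the path (plus overwrite=true) makes the call idempotent. A sketch of building such a request follows; the helper only constructs the request object and does not send it, and the base URL and id are illustrative.

```python
import json
import urllib.request

def build_create_request(base_url, so_type, so_id, attributes):
    """Build an idempotent saved-objects create request.

    Supplying an explicit id plus overwrite=true makes repeated runs
    update the same object instead of minting a new random id each time.
    """
    url = f"{base_url}/api/saved_objects/{so_type}/{so_id}?overwrite=true"
    body = json.dumps({"attributes": attributes}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "kbn-xsrf": "true"},
    )
```

Passing the request to urllib.request.urlopen would then perform the actual create/overwrite against a running Kibana.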

epixa commented 5 years ago

@elastic/kibana-platform

BobBlank12 commented 4 years ago

Has there been a workaround identified for when this happens? How do we know which index pattern to delete? This is 7.4.0

[screenshot]
arkel-s commented 4 years ago

I have an ugly workaround for this, which I used in a space with many duplicated dashboards:

  1. To avoid issues, take a backup first: in Kibana / Saved objects, export all your objects as JSON. [screenshot]
  2. Find the IDs of the duplicated patterns by looking at the URL when clicking on them in Kibana / Index patterns. [screenshot]
  3. Note down the ID of the index pattern you want to keep, and the IDs of the ones you want to delete. [screenshot]
  4. Delete all the duplicated index patterns. [screenshot]
  5. Export all your objects as JSON again, like in step 1. This time, the duplicated index patterns won't be part of the JSON.
  6. Do a search and replace in your JSON file so that only one index pattern ID remains. [screenshot]
  7. Import your modified JSON. [screenshot]

Worked for me on Kibana 7.6.2. If you have a better solution, I'd be happy to know :-) Cheers

afharo commented 3 years ago

I just tested it on kibana@master and metricbeats@7.13-SNAPSHOT, and the error is still occurring.

Digging through Beats code, it looks like the index pattern is created with an enforced ID, matching the index template's name (i.e.: id: metricbeat-*). This means that Beats will never create duplicated index-patterns themselves when importing multiple times/across versions because they are enforcing the ID.

Then, it calls Kibana's API POST /api/kibana/dashboards/import, which basically calls soClient.bulkCreate.

I wonder what's the best way to fix this issue:

  1. Have the import process search for an existing index pattern whose title matches the metricbeat-* index template name and...
    1. Skip its creation?
      • We might be missing fields if the existing index pattern is not up to date.
      • Is a refresh of the fields enough?
      • Any other possible side effects?
    2. Update the existing index pattern? => Is it safe to do so? Any potential data loss?
  2. Fix manually created index patterns to have IDs matching their titles. We have a limit in Kibana that doesn't allow users to create 2 index patterns with the same name, so it might make sense. However, it could affect Copy through spaces.

What do you think? @elastic/kibana-core?

pgayvallet commented 3 years ago

What do you think?

Now that we added upsert to update, can't we just leverage that?

Or why not just create with override:true? Unless I'm missing something.

bhavyarm commented 3 years ago

I totally forgot that I logged this but happy to see this getting attention. cc @LeeDr

afharo commented 3 years ago

What do you think?

Now that we added upsert to update, can't we just leverage that?

Or why not just create with override:true? Unless I'm missing something.

@pgayvallet AFAIK, that API is already using create with override. The problem in this issue is that the user creates the index pattern manually (which gets a random ID), and then Beats imports the SO with the fixed ID, so neither create nor upsert will touch the manually created index pattern (simply because they have different IDs).

I hope it makes sense.
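The point above can be illustrated with a toy model of the saved-objects store. This is only a sketch of the semantics described in this thread, not Kibana code: objects are keyed by (type, id), never by title, so two objects with the same title but different ids both survive an overwrite import.

```python
# Toy saved-objects store: a dict keyed by (type, id).
# overwrite only helps when the ids collide; same-title objects with
# different ids are simply two distinct documents.

def bulk_create(store, objects, overwrite=True):
    for obj in objects:
        key = (obj["type"], obj["id"])
        if overwrite or key not in store:
            store[key] = obj
    return store

store = {}
# The user creates the pattern manually; Kibana assigns a random id.
bulk_create(store, [{"type": "index-pattern", "id": "d7c1f3a0",
                     "attributes": {"title": "metricbeat-*"}}])
# Beats then imports the "same" pattern with its enforced id.
bulk_create(store, [{"type": "index-pattern", "id": "metricbeat-*",
                     "attributes": {"title": "metricbeat-*"}}])

titles = [o["attributes"]["title"] for o in store.values()]
# Both objects remain: a duplicate from the user's point of view.
```

Running this leaves two entries in the store with identical titles, which is exactly the duplication users see in the index pattern list.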

afharo commented 2 years ago

cc @mattkime as he's the expert in Index Patterns and was looking at this issue.

elasticmachine commented 2 years ago

Pinging @elastic/kibana-app-services (Team:AppServices)

mattkime commented 2 years ago

Allowing duplicate index patterns to be created can be treated as a bug. Placing this on our roadmap.

michaelhyatt commented 2 years ago

Are there any plans to fix it? It would be good to have some sort of resolution.

mattkime commented 2 years ago

In reading back over this issue (and in contradiction with my earlier statement) this strikes me as an inherent flaw of importing saved objects. This doesn't strike me as a data view problem so much as a saved object problem. That said, maybe I don't have an accurate definition of the problem and its potential resolutions.

At the very least, this issue has been around long enough that the same problem is being referenced across different APIs.

@michaelhyatt I know you're working with customers - what kind of resolution would they want to see?

michaelhyatt commented 2 years ago

@mattkime just came off a meeting with a customer that was asking for a fix for this. At the moment, they need to go and delete the objects before migrating their artifacts from other environments, and this process is quite error-prone and flaky. So their ideal solution would be a way to identify duplicated saved objects and overwrite them during the import.

mattkime commented 2 years ago

@michaelhyatt Do you know which method they're using to import saved objects?

michaelhyatt commented 2 years ago

@mattkime I am still trying to get a hold of them to confirm, but from memory they were using import saved objects option in Kibana.

pavelkostyurin commented 2 years ago

Any news on this issue?

rudolf commented 2 years ago

@mattkime It sounds to me like what users are intuitively expecting is that there is a unique constraint on the data view title/"index pattern" field. So that if they have a data view {"_id": "1", "title": "apache-server-logs*"} and then import another data view {"_id": "2", "title": "apache-server-logs*"}, Kibana would say "You already have an existing apache-server-logs* data view. Would you like to override this one or cancel the import?"

From my own experience it's frustrating to see two data views with the same title because there's no way to distinguish between them in the UI. We have this on some of our internal clusters.

We could simulate a unique data view constraint using the import hooks https://github.com/elastic/kibana/blob/doclinks-max-open-shards/src/core/server/saved_objects/types.ts#L468 which could show a warning message and give users a UI to resolve duplicate data view titles. If users choose to delete one of the data views, then all existing dashboards would have to switch to the other data view.
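The title-uniqueness check such an import hook could run might look like the following. This is a sketch only: the hook wiring is omitted, and the object shape (id plus attributes.title) is an assumption about what the hook would receive.

```python
def find_title_conflicts(existing, incoming):
    """Map each incoming data view id to existing ids sharing its title."""
    by_title = {}
    for obj in existing:
        by_title.setdefault(obj["attributes"]["title"], []).append(obj["id"])

    conflicts = {}
    for obj in incoming:
        clashes = [i for i in by_title.get(obj["attributes"]["title"], [])
                   if i != obj["id"]]
        if clashes:
            conflicts[obj["id"]] = clashes
    return conflicts
```

A UI layer could then present each conflict to the user as an override-or-cancel choice rather than silently creating a duplicate.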

Another angle that doesn't solve the core problem but might help would be to have a description field shown in the data view selector widget. That way a user could perhaps distinguish between "apache-server-logs (filebeat default data view)" and "apache-server-logs"

With sharing to multiple spaces, this problem is probably going to get worse: when a user shares a dashboard to all other spaces, they would have to share the data view too. If filebeat creates data views that are already shared across all spaces, that would help somewhat, but when users create their own data views they wouldn't by default create them across all spaces.

Saved objects are working as expected, it's a domain agnostic database that doesn't understand how users expect their data views to act. I believe the onImport hook would be a sufficient solution to address this, but let us know if you think you need anything more from Core.

mattkime commented 2 years ago

@rudolf The import hook might work - I guess my concern is the following

There's an existing data view with a given title and SOs that reference it. Another data view is imported with the same title but a different id and other SOs are being imported that reference it. Will this import hook resolve that situation in a coherent manner? My main concern is that the user has a relatively complete understanding of the problem and similar completeness in resolving it.

rudolf commented 2 years ago

@mattkime I don't think we could automatically resolve the duplicates. In some cases it might be possible to merge the fields but there could be different formatting / runtime fields. So the user would have to manually choose, and potentially fix the dashboards that are now using a slightly different data view than what they used to.

afharo commented 2 years ago

@mattkime Q: would it help if the title were an actual descriptive piece of text instead of the Data View's matching index pattern?

IMO, it'll help in 2 ways:

  1. As a user, I can manually create multiple Data Views matching the same index pattern with different configurations (formatting, selecting different time fields). Right now, if you want to do that in the UI, you need to add a trailing * to your index pattern before Kibana will allow you to save.
  2. When finding duplicates, we can append some info to the duplicate (either a simple counter, like when you download files with the same name from the browser, or more info like " - Imported/Copied from X on YYYY-MM-DD")

Given the many reasons why we cannot automagically merge them and ensure there's only 1, I think this approach could at least help users uniquely identify the duplicates in the UI.
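The counter-style renaming from point 2 above could work like browser download naming. A minimal sketch, assuming titles live in a plain set of existing titles:

```python
def dedupe_title(title, existing_titles):
    """Append ' (n)' until the title no longer collides, mirroring how
    browsers rename duplicate downloads."""
    if title not in existing_titles:
        return title
    n = 1
    while f"{title} ({n})" in existing_titles:
        n += 1
    return f"{title} ({n})"
```

The same helper could append richer provenance text instead of a counter; only the suffix format would change.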

What do you think?

N.B.: I think https://github.com/elastic/kibana/pull/124191 goes in that line

linyaru commented 2 years ago

When finding out duplicates, we can append some info to the duplicate

This is a good suggestion, especially the concept of adding "time created" to each data view. Quite often, the first and original data view is the one that most users will want to use for their data exploration and visualization needs, and seeing the "imported/copied from" information can also help users avoid creating saved object dependencies on these data views.


Additionally (not sure if this is the right place), I'd like to provide some perspective on the process of managing saved object dependencies and de-duplicating data views as a Kibana administrator. 3 out of our ~20 data views (our cluster is still on 7.17) have duplicates that were created by importing dashboards; these are identical views with the same time fields. Five are all named kpi-infra-allocators, which makes for an extremely confusing user experience when choosing a data source:

[screenshot]

IMO the process of sorting this out (which is possible via the Saved Objects API) is convoluted and worthy of documentation for enablement purposes:

  1. Using the Find Objects API, determine the number of saved objects that have dependencies on each data view with the same name:

    [screenshot]
  2. Determine which data view will be the "source of truth" whose ID will replace the IDs of all other identically titled data views - for us, since they are all identical, this was simply the one that had more existing object dependencies.

  3. For each data view that will be replaced, use the Export Objects API to export all of the following artifacts: dashboards, visualizations, Lens objects, searches, maps.

  4. In exported objects NDJSON, replace duplicate data view IDs with target ID identified in 2.

  5. Using the Import Objects API, import the modified objects, at first with overwrite=false to check whether any objects have errors (e.g. sometimes exported objects contain reference errors that need to be fixed before import is possible).

  6. Import the modified objects with overwrite=true

  7. Check to see that duplicate data views no longer have saved objects dependencies (this can be done via API or in Saved Objects UI in Kibana) and then delete data views.

P.S. I used to do this all in the Saved Objects UI alone but @afharo inspired me to find a more efficient (!) way.
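Step 4 of the procedure above (pointing duplicate data view ids at the canonical one inside the exported NDJSON) can be scripted. A sketch, assuming the standard export format where each line is one JSON saved object carrying a references array:

```python
import json

def rewrite_data_view_ids(ndjson_lines, duplicate_ids, target_id):
    """Point every reference to a duplicate data view at the kept one."""
    out = []
    for line in ndjson_lines:
        obj = json.loads(line)
        for ref in obj.get("references", []):
            if ref.get("type") == "index-pattern" and ref.get("id") in duplicate_ids:
                ref["id"] = target_id
        out.append(json.dumps(obj))
    return out
```

Run over the exported file line by line, this produces an NDJSON ready for re-import in step 5; the duplicate data views then have no remaining dependents and can be deleted.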

elasticmachine commented 1 year ago

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

devkSerge commented 2 months ago

We are having the same issue when importing saved objects: [screenshot]