bhavyarm opened this issue 6 years ago
Trying to get this going in 5.4.0 to see what the behaviour is.
In 5.4.0 (you have to set ulimit -n 2048 to import dashboards), I did this:
We could handle this in the dashboard import API: when an index pattern doesn't exist, we can check whether something else with the same pattern exists. The one caveat is that you can't search for it if it contains a dash; we might have to bring down all of them and search the array.
Not sure if it falls under this bug, but I'm seeing the same pattern duplication with 6.3 when writing to /api/saved_objects/index-pattern/. If I run the POST multiple times, I get multiple copies of the same index pattern.
Example CURL:
curl -X "POST" "http://localhost:5601/api/saved_objects/index-pattern/" \
-H 'Content-type: application/json' \
-H 'kbn-xsrf: true' \
-d $'{
"attributes": {
"fields": "[{\\"name\\":\\"jenkins_timestamp\\",\\"type\\":\\"string\\",\\"count\\":0,\\"scripted\\":false,\\"searchable\\":true,\\"aggregatable\\":true,\\"readFromDocValues\\":true}]",
"timeFieldName": "@timestamp",
"title": "jenkins-master-*",
"fieldFormatMap": "{\\"@timestamp\\":{\\"id\\":\\"date\\"}}"
}
}'
Each run will create a new copy of jenkins-master-*. Maybe my content is malformed?
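One way to sidestep the duplication (a sketch under assumptions, not a confirmed fix from this thread): the saved objects API also accepts an explicit ID in the path (POST /api/saved_objects/index-pattern/{id}), so reusing a stable ID, such as the title, makes re-runs hit the same document instead of minting a random-ID copy each time. The host, the title-as-ID choice, and the helper name below are illustrative:

```python
# Hedged sketch: make the create idempotent by targeting an explicit, stable
# object ID instead of letting Kibana generate a random one per POST.
KIBANA = "http://localhost:5601"  # assumed local Kibana

def index_pattern_request(title: str, attributes: dict) -> tuple[str, dict]:
    """Build the URL and body for a create that reuses the title as the object
    ID (as Beats does), so repeated runs replace one document rather than
    creating a new copy; ?overwrite=true replaces an existing object."""
    url = f"{KIBANA}/api/saved_objects/index-pattern/{title}?overwrite=true"
    body = {"attributes": {"title": title, **attributes}}
    return url, body

url, body = index_pattern_request("jenkins-master-*", {"timeFieldName": "@timestamp"})
# An actual call would look like:
# requests.post(url, json=body, headers={"kbn-xsrf": "true"})
```

Whether overwriting is acceptable depends on whether anything else may have edited the object in the meantime.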
@elastic/kibana-platform
Has there been a workaround identified for when this happens? How do we know which index pattern to delete? This is on 7.4.0.
I have an ugly workaround that I used in a space with many duplicated dashboards:
Worked for me on Kibana 7.6.2. If you have a better solution, I'd be happy to know :-) Cheers
I just tested it on kibana@master and metricbeats@7.13-SNAPSHOT, and the error is still occurring.
Digging through the Beats code, it looks like the index pattern is created with an enforced ID matching the index template's name (i.e. id: metricbeat-*). This means Beats will never create duplicated index patterns themselves when importing multiple times/across versions, because they enforce the ID.
Then it calls Kibana's API POST /api/kibana/dashboards/import, which basically calls soClient.bulkCreate.
I wonder what's the best way to fix this issue:
metricbeat-*
index template name and...
What do you think, @elastic/kibana-core?
What do you think? Now that we added upsert to update, can't we just leverage that? Or why not just create with override:true? Unless I'm missing something.
I totally forgot that I logged this but happy to see this getting attention. cc @LeeDr
> What do you think? Now that we added upsert to update, can't we just leverage that? Or why not just create with override:true? Unless I'm missing something.
@pgayvallet AFAIK, that API is already using create with override. The problem in this issue is that the user creates the index pattern manually (which gets a random ID), and then Beats imports the SO with the fixed ID, so neither create nor upsert will attempt to modify the manually created index pattern (simply because they have different IDs).
I hope it makes sense.
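To make the point above concrete, here is a tiny in-memory model of a saved-objects store (an illustration, not Kibana code): overwrite keys on the object ID, so two objects with the same title but different IDs both survive.

```python
# Minimal in-memory model of a saved-objects store, keyed by ID.
# Illustration only; not Kibana's actual implementation.
store = {}

def create(obj_id, attributes, overwrite=False):
    """Create a saved object; overwrite only replaces an object with the SAME id."""
    if obj_id in store and not overwrite:
        raise ValueError("conflict")
    store[obj_id] = attributes

# 1. User creates the index pattern manually -> Kibana assigns a random ID.
create("3f1c2a9e-random-uuid", {"title": "metricbeat-*"})

# 2. Beats imports its own copy under an enforced ID, even with overwrite=True.
create("metricbeat-*", {"title": "metricbeat-*"}, overwrite=True)

# Both objects survive: overwrite matched nothing, so the title is duplicated.
titles = [a["title"] for a in store.values()]
```

This is why neither create-with-overwrite nor upsert can deduplicate by title on their own.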
cc @mattkime as he's the expert in Index Patterns and was looking at this issue.
Pinging @elastic/kibana-app-services (Team:AppServices)
Allowing duplicate index patterns to be created can be treated as a bug. Placing this on our roadmap.
Are there any plans to fix it? It would be good to have some sort of resolution.
In reading back over this issue (and in contradiction with my earlier statement) this strikes me as an inherent flaw of importing saved objects. This doesn't strike me as a data view problem so much as a saved object problem. That said, maybe I don't have an accurate definition of the problem and its potential resolutions.
At very least this issue has been around long enough that the same problem is being referenced across different APIs.
@michaelhyatt I know you're working with customers - what kind of resolution would they want to see?
@mattkime just came off a meeting with a customer that was asking for a fix for this. At the moment, they need to go and delete the objects before migrating their artifacts from other environments, and this process is quite error-prone and flaky. So their ideal solution would be a way to identify duplicated saved objects and overwrite them during the import.
@michaelhyatt Do you know which method they're using to import saved objects?
@mattkime I am still trying to get a hold of them to confirm, but from memory they were using the import saved objects option in Kibana.
Any news on this issue?
@mattkime It sounds to me like what users intuitively expect is a unique constraint on the data view title/"index pattern" field. So if they have a data view {"_id": "1", "title": "apache-server-logs*"} and then import another data view {"_id": "2", "title": "apache-server-logs*"}, Kibana would say "You already have an existing apache-server-logs* data view. Would you like to override this one or cancel the import?"
From my own experience it's frustrating to see two data views with the same title because there's no way to distinguish between them in the UI. We have this on some of our internal clusters.
We could simulate a unique data view constraint using the import hooks https://github.com/elastic/kibana/blob/doclinks-max-open-shards/src/core/server/saved_objects/types.ts#L468 which could show a warning message and give users a UI to resolve duplicate data view titles. If users choose to delete one of the data views, then all existing dashboards would have to switch to the other data view.
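A hedged sketch of what such a check could do: the function below is hypothetical (modeled loosely on the idea of an import hook that inspects incoming objects, not on the actual hook signature in types.ts), and detection is plain title matching.

```python
def duplicate_title_warnings(incoming, existing):
    """Given incoming and existing data views as {'id': ..., 'title': ...}
    dicts, return one warning per incoming object whose title already exists
    under a different ID -- the kind of message an import hook could surface."""
    existing_by_title = {}
    for obj in existing:
        existing_by_title.setdefault(obj["title"], []).append(obj["id"])
    warnings = []
    for obj in incoming:
        clashes = [i for i in existing_by_title.get(obj["title"], []) if i != obj["id"]]
        if clashes:
            warnings.append(
                f"Data view '{obj['title']}' (id={obj['id']}) "
                f"duplicates existing id(s): {', '.join(clashes)}"
            )
    return warnings

existing = [{"id": "1", "title": "apache-server-logs*"}]
incoming = [{"id": "2", "title": "apache-server-logs*"}]
print(duplicate_title_warnings(incoming, existing))
```

The hard part, as noted below, is not detection but resolution: rewriting references if the user picks one data view over the other.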
Another angle that doesn't solve the core problem but might help would be to have a description field shown in the data view selector widget. That way a user could perhaps distinguish between "apache-server-logs (filebeat default data view)" and "apache-server-logs"
With sharing to multiple spaces this problem is probably going to get worse: when a user shares a dashboard to all other spaces, they would have to share the data view too. If filebeat creates data views that are already shared across all spaces, that would help somewhat, but when users create their own data views they wouldn't by default create them across all spaces.
Saved objects are working as expected, it's a domain agnostic database that doesn't understand how users expect their data views to act. I believe the onImport hook would be a sufficient solution to address this, but let us know if you think you need anything more from Core.
@rudolf The import hook might work - I guess my concern is the following
There's an existing data view with a given title and SOs that reference it. Another data view is imported with the same title but a different id and other SOs are being imported that reference it. Will this import hook resolve that situation in a coherent manner? My main concern is that the user has a relatively complete understanding of the problem and similar completeness in resolving it.
@mattkime I don't think we could automatically resolve the duplicates. In some cases it might be possible to merge the fields but there could be different formatting / runtime fields. So the user would have to manually choose, and potentially fix the dashboards that are now using a slightly different data view than what they used to.
@mattkime Q: would it help if the title were an actual descriptive piece of text instead of matching the Data View's index pattern?
IMO, it'll help in 2 ways:
* …to your index pattern so it would allow you to save.
Given that there are many reasons pointing out that we cannot automagically merge them and ensure there's only one, I think this approach could, at least, help users uniquely identify the duplicates in the UI.
What do you think?
N.B.: I think https://github.com/elastic/kibana/pull/124191 goes in that line
When we detect duplicates, we can append some info to the duplicate's title.
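A small sketch of that renaming idea (the function name and the exact suffix format are assumptions): leave the first occurrence of a title untouched and append a disambiguating suffix, such as creation time, to the rest.

```python
def disambiguate_titles(data_views):
    """For data views sharing a title, keep the first as-is and append
    '(duplicate N, created <timestamp>)' to the others so users can tell
    them apart in the UI. Input: list of {'id', 'title', 'created_at'} dicts."""
    seen = {}
    out = []
    for dv in data_views:
        n = seen.get(dv["title"], 0)
        seen[dv["title"]] = n + 1
        title = dv["title"] if n == 0 else (
            f"{dv['title']} (duplicate {n}, created {dv['created_at']})"
        )
        out.append({**dv, "title": title})
    return out

views = [
    {"id": "a", "title": "kpi-infra-allocators", "created_at": "2021-01-01"},
    {"id": "b", "title": "kpi-infra-allocators", "created_at": "2021-06-01"},
]
renamed = disambiguate_titles(views)
print(renamed[1]["title"])
# -> kpi-infra-allocators (duplicate 1, created 2021-06-01)
```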
This is a good suggestion, especially the concept of adding "time created" to each data view. Quite often, the first and original data view is the one that most users will want to use for their data exploration and visualization needs, and seeing the "import/copied from" information can also help users avoid creating saved object dependencies on these data views.
Additionally (not sure if this is the right place), I'd like to provide some perspective on the process of managing saved object dependencies and de-duplicating data views as a Kibana administrator. 3 out of ~20 data views (our cluster is still on 7.17) have duplicates that were created while importing dashboards; these are identical views with the same time fields. There are five all named kpi-infra-allocators, and this makes for an extremely confusing user experience when choosing a data source:
IMO the process of sorting this out (which is possible via the Saved Objects API) is convoluted and worthy of documentation for enablement purposes:
Using the Find Objects API, determine the number of saved objects that have dependencies on each data view with the same name:
Determine which data view will be the "source of truth" whose ID will replace the IDs of all other identically titled data views - for us, since they are all identical, this was simply the one that had more existing object dependencies.
For each data view that will be replaced, use the Export Objects API to export all of the following artifacts: dashboards, visualizations, Lens visualizations, searches, maps.
In exported objects NDJSON, replace duplicate data view IDs with target ID identified in 2.
Using the Import Objects API, import the modified objects, at first with overwrite=false to check whether any objects have errors (e.g. sometimes exported objects contain reference errors that need to be fixed before import is possible).
Import the modified objects with overwrite=true
Check that the duplicate data views no longer have saved object dependencies (this can be done via API or in the Saved Objects UI in Kibana) and then delete the duplicate data views.
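Step 4 above (rewriting IDs in the exported NDJSON) can be sketched as a small rewrite pass; the `references` array with `type`/`id`/`name` entries follows the usual exported saved-object shape, but treat this as a starting point rather than a turnkey tool, and the sample IDs are made up.

```python
import json

def rewrite_data_view_refs(ndjson_lines, duplicate_ids, target_id):
    """Replace references to duplicate index-pattern IDs with the chosen
    'source of truth' ID in exported saved-object NDJSON lines."""
    out = []
    for line in ndjson_lines:
        if not line.strip():
            continue  # skip blank lines between NDJSON records
        obj = json.loads(line)
        for ref in obj.get("references", []):
            if ref.get("type") == "index-pattern" and ref.get("id") in duplicate_ids:
                ref["id"] = target_id
        out.append(json.dumps(obj))
    return out

# Hypothetical exported dashboard referencing a duplicate data view:
exported = [json.dumps({
    "id": "dash-1", "type": "dashboard",
    "references": [{"type": "index-pattern", "id": "dup-xyz", "name": "panel_0"}],
})]
fixed = rewrite_data_view_refs(exported, {"dup-xyz"}, "canonical-id")
print(json.loads(fixed[0])["references"][0]["id"])  # -> canonical-id
```

After writing the rewritten lines back to an .ndjson file, steps 5–7 proceed via the Import Objects API as described.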
P.S. I used to do this all in the Saved Objects UI alone but @afharo inspired me to find a more efficient (!) way.
Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)
We are having the same issue when importing saved objects:
Kibana version: 5.6.0 latest snapshot
Elasticsearch version: 5.6.0 latest snapshot
Server OS version: darwin_X86_64
Browser version: chrome latest
Browser OS version: OS X
Original install method (e.g. download page, yum, from source, etc.): snapshot build
Description of the problem including expected versus actual behavior: I manually created a metricbeat index pattern, checked the data on Discover, and then loaded the Beats dashboards. This created a duplicate metricbeat-* index pattern in Kibana.
Steps to reproduce: