amundsen-io / amundsen

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
https://www.amundsen.io/amundsen/
Apache License 2.0
4.37k stars 953 forks source link

Fix - Handle ES missing index edge case (bug fix) #2193

Closed mikaalanwar closed 11 months ago

mikaalanwar commented 11 months ago

Description

This pull request handles an edge case with basic search. If more than one resource_types is configured for amundsen (e.g Tables and Dashboards) and one of them have missing indices (i.e. no data got ingested for either of them in an initial run), it will crash the basic search from working for all resource_types. In a clean deployment (blank neo4j and elasticsearch), it is possible to not have data for all configured resource types at once. In such a case the search should not return any results for the resource_type with missing data instead of crashing basic search altogether. This PR addresses this problem and ensures that we trigger an elasticsearch "search" against only those resource_types which have a corresponding index/alias present in the elasticsearch database.

Motivation and Context

For a fresh deployment of amundsen, in case any of the configured resource types is missing indices in elasticsearch, it will prevent the search from running altogether. Also, the error statement tends to be a bit confusing:

[ERROR] es_proxy_v2_1.execute_multisearch_query:349 (1:Thread-48) - Failed to execute ES search queries. TransportError(N/A, 'index_not_found_exception')

As it does not mention which particular index (or alias) was missing. In my case, dashboards were missing but I was under the impression that maybe the search service was not looking in the right place because there were some indices in my elasticsearch instance already, which were searchable using other debugging tools (such as elasticvue). I came across the aforementioned issue and wanted to push a fix for this. There wasn't a corresponding open issue against this already, but there is a reference to a similar issue here.

How Has This Been Tested?

I prepared a clean environment locally by deleting all neo4j and elasticsearch data. Next, I ran an ingestion run for only one resource type (without knowing that I'd have to run it for all resource_types to actually get it working). In my case, I ingested tables only, without ingesting dashboards. And then I noticed that the advanced search (sidebar) was working but the main, basic search bar wasn't. Upon investigation, I narrowed it down and implemented this fix. Now, the main/basic search works fine even if I don't ingest dashboards.

Documentation

No updates to the documentation were needed.

CheckList