FINRAOS / herd

Herd is a managed data lake for the cloud. The Herd unified data catalog helps separate storage from compute in the cloud. Manage petabytes of data and make it accessible for data processing and analytical purposes by any cloud compute platform.
http://finraos.github.io/herd/
Apache License 2.0

Herd-UI Search Issue #371

Open tinshuksingh opened 6 years ago

tinshuksingh commented 6 years ago

Hello Team,

I created the business object definition 'test_bus_def_1' through Swagger, and I am now trying to search for it from the Herd-UI home page. The search fails with a NullPointerException.

image

Below are the logs -

nateiam commented 6 years ago

Tinshuk -

I said it before and will say it again. I love how you dig right in and get things going!

Here's the story on our search. You probably know from your work with CloudFormation and the install that we embed Elasticsearch behind the Herd search endpoints. We have this nicely integrated in our environments, but we have not yet put all of the required settings and configurations into the open source release.

Unlike #370 with Uploader, this requires multiple configs to get it to work. I am mentioning @DavidBalash who is an expert at the Elasticsearch integration in Herd.

Here is an overview for your information -- David will share more details and good default configs and/or detailed instructions for each area:

It's worth mentioning that we have another team working on including all this config and default settings in a CloudFormation so it's easy for open source. We're just not quite there yet -- but I am sure you and David can get your instance up and running!

DavidBalash commented 6 years ago

Hi Tinshuk,

I have added the following Elasticsearch setup page to the Herd wiki on GitHub:

https://github.com/FINRAOS/herd/wiki/Elasticsearch-setup

I have also added all of the Elasticsearch configuration values to the existing configuration values page on the wiki:

https://github.com/FINRAOS/herd/wiki/configuration-values

Please let us know if this helps to solve your search issue on the Herd-UI.

zubair-nbx commented 6 years ago

Hi,

I am facing the same issue. After activating the indexes, I can see 4 documents in an index (see image below): herd-indexes

Here is the curl request which I copied from the "Dev Tools" in chrome:

curl -X POST \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  -d '{"searchTerm":"data_object_a","facetFields":["Tag","ResultType"],"indexSearchFilters":null,"enableHitHighlighting":true}' \
  'http://localhost:8080/herd-app/rest//indexSearch?fields=displayName,shortDescription&match'

I am using the same version of ES as in the pom file, which is 5.1.1.

Here is the screenshot of the Herd-UI app: herd-search-error

Please suggest how to fix this. @tinshuksingh Did search work for you?

nateiam commented 6 years ago

Hi @zubair-nbx, can you please send the contents of your catalina.out from this time period? This should help us troubleshoot the issue.

Thank you!

zubair-nbx commented 6 years ago

Hi @nateiam, thanks for your response. Please see my findings below:

script.inline: true
script.stored: true
script.file: true

The query generated by the libraries in Herd uses an inline Groovy script. Here is the portion of the query that uses the script:

"script_score" : {
  "script" : {
    "inline" : "_score * (doc['_index'].value == 'bdef_1522162310906' ? doc['tagSearchScoreMultiplier']: 1)",
    "lang" : "groovy"
  }
}
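As a rough illustration of what that script computes, here is the same arithmetic written as plain Java outside Elasticsearch. This is only a sketch: the class name, method name, and the "tag_1" index name are hypothetical and are not part of Herd or Elasticsearch. It assumes the multiplier is a single numeric value per document.

```java
/** Hypothetical illustration only; this is not Herd or Elasticsearch code. */
final class ScriptScoreSketch {

    /**
     * Mirrors the inline script:
     * _score * (index of the hit == bdef index ? tagSearchScoreMultiplier : 1)
     */
    static double adjustedScore(double score, String hitIndex, String bdefIndex,
                                double tagSearchScoreMultiplier) {
        // Only hits from the business object definition index are multiplied.
        double factor = hitIndex.equals(bdefIndex) ? tagSearchScoreMultiplier : 1.0;
        return score * factor;
    }

    public static void main(String[] args) {
        String bdefIndex = "bdef_1522162310906";
        // A hit from the bdef index is scaled by the multiplier.
        System.out.println(adjustedScore(2.0, bdefIndex, bdefIndex, 1.5));
        // A hit from any other index (name assumed here) keeps its score.
        System.out.println(adjustedScore(2.0, "tag_1", bdefIndex, 1.5));
    }
}
```

In other words, the script only changes the ranking of hits that come from the bdef index, which is why the query cannot run at all when inline scripting is disabled.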

With these script settings, the global search with the 'All' filter works fine, but the 'Column' filter still does not work. Details below:

- Issue with the ‘Column’ filter on the Herd-UI homepage global search: when the ‘Column’ filter is enabled, the generated query has 3 “multi_match” subqueries that each include an empty “fields”: [ ] list, and ES is not able to execute that query:

"multi_match" : {
  "query" : "Price",
  "fields" : [],
  "type" : "phrase_prefix",
  "operator" : "OR",
  "slop" : 0,
  "prefix_length" : 0,
  "max_expansions" : 50,
  "lenient" : false,
  "zero_terms_query" : "NONE",
  "boost" : 1.0
}
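One possible guard against sending such a query is sketched below in plain Java. It assumes the candidate field lists are known before the query is built and simply drops the empty ones. The class, the method, and the "columns.name" field name are hypothetical and do not come from Herd. This would only avoid the invalid query; it does not explain why the lists are empty in the first place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; this is not Herd code. */
final class MultiMatchGuardSketch {

    /** Returns only the field lists that are non-null and non-empty. */
    static List<List<String>> nonEmptyFieldLists(List<List<String>> candidates) {
        List<List<String>> usable = new ArrayList<>();
        for (List<String> fields : candidates) {
            // A multi_match subquery with no fields would be skipped.
            if (fields != null && !fields.isEmpty()) {
                usable.add(fields);
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        List<List<String>> candidates = Arrays.asList(
            Collections.<String>emptyList(),          // like "fields" : []
            Arrays.asList("columns.name"));           // assumed field name
        // Only the second list survives the guard.
        System.out.println(nonEmptyFieldLists(candidates).size());
    }
}
```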

Also note that the highlight fields setting is read from the class “ConfigurationValue.java”:

'ELASTICSEARCH_HIGHLIGHT_FIELDS("elasticsearch.highlight.fields", "{\"fields\": [\"*\"]}")'

This value also causes an exception while creating the JSON object. The fasterXml / JSON library is not able to parse the value {\"fields\":[\"*\"]} into the target object, and an exception is thrown in the method:

private HighlightBuilder buildHighlightQuery(String preTag, String postTag, Set<String> match) in the file:

herd-dao/src/main/java/org/finra/herd/dao/impl/IndexSearchDaoImpl.java

Code snippet below:

...
try
{
    @SuppressWarnings("unchecked")
    // Code crashes here
    IndexSearchHighlightFields highlightFieldsConfig = jsonHelper.unmarshallJsonToObject(IndexSearchHighlightFields.class, highlightFieldsValue);
    highlightFieldsConfig.getHighlightFields().forEach(highlightFieldConfig -> {
    .....

Hope it helps. Please comment if more details are needed. Thank you :)