Hi @sidharthramesh,
Our public instance currently runs the Snowstorm application on an 8G, 2-core Linux machine, with a cluster of two 8G nodes on AWS Elasticsearch. On the public instance the average fetch time for a concept in the minimal format is around 140ms: https://browser.ihtsdotools.org/snowstorm/snomed-ct/MAIN/2020-03-09/concepts/80631005 For a full concept with all descriptions, axioms and relationships it is around 200ms: https://browser.ihtsdotools.org/snowstorm/snomed-ct/browser/MAIN/2020-03-09/concepts/80631005
We are using Elasticsearch on AWS because of the automated backups and easy management for our DevOps team.
I usually find that hosting a single Elasticsearch node on the same machine as Snowstorm makes the API requests 1.5-2 times faster. The main thing is to ensure that Elasticsearch has enough memory. If you have a single machine with 8G I would give Elasticsearch 3G and Snowstorm 2G. Leaving 3G of RAM free on the machine is recommended by the Elasticsearch team, because it will be used by OS-level disk caching, which gives the best performance from Elasticsearch.
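As a rough sketch of that split (the file paths and the Snowstorm jar name below are illustrative; adjust them for your own install):

```
# Elasticsearch heap, set in config/jvm.options:
-Xms3g
-Xmx3g

# Snowstorm heap, set when launching the jar (jar name illustrative):
java -Xms2g -Xmx2g -jar snowstorm.jar
```

Setting -Xms and -Xmx to the same value avoids heap resizing pauses, which the Elasticsearch documentation also recommends.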
Related reading:
I hope that helps. Kind regards, Kai
I'm trying to build an autosuggest feature that prompts the user with related concepts as they type. The GET /{branch}/concepts endpoint was the closest I found to that. The performance of my instance has improved considerably after disabling swap and setting the memory as you suggested.
However, searching for terms (without relations or descriptions), even on the public instance, still takes a long time: https://browser.ihtsdotools.org/snowstorm/snomed-ct/MAIN/concepts?term=cat&offset=0&limit=50
This one takes more than a second; the TTFB alone is about 1.17s.
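For reference, this is roughly how I'm measuring it (a quick sketch with the Python requests library; the parameters match the URL above):

```python
import time

import requests

url = "https://browser.ihtsdotools.org/snowstorm/snomed-ct/MAIN/concepts"
params = {"term": "cat", "offset": 0, "limit": 50}

start = time.time()
response = requests.get(url, params=params)
elapsed = time.time() - start

# The response is a page object; "items" holds the matched concepts.
print(f"{response.status_code} in {elapsed:.2f}s, "
      f"{len(response.json()['items'])} items")
```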
So I tried to debug this and used a reverse proxy to see all the requests being made to Elasticsearch on my local machine. The same search above took about 9.6 seconds (although it's my laptop running Elasticsearch in Docker, so I'm not surprised):
I found that for each terminology search, multiple requests are made to Elasticsearch, some of them really costly. I have included all the slow requests at the end. My question is: can this be improved? Is there a better API for building something like autosuggest?
{
"from": 0,
"query": {
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"filter": [
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"must": [
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"should": [
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"filter": [
{
"simple_query_string": {
"analyze_wildcard": false,
"boost": 1.0,
"default_operator": "and",
"fields": [
"termFolded^1.0"
],
"flags": -1,
"query": "cat*"
}
}
]
}
},
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"filter": [
{
"simple_query_string": {
"analyze_wildcard": false,
"boost": 1.0,
"default_operator": "and",
"fields": [
"termFolded^1.0"
],
"flags": -1,
"query": "cat*"
}
}
],
"must": [
{
"term": {
"languageCode": {
"boost": 1.0,
"value": "no"
}
}
}
]
}
},
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"filter": [
{
"simple_query_string": {
"analyze_wildcard": false,
"boost": 1.0,
"default_operator": "and",
"fields": [
"termFolded^1.0"
],
"flags": -1,
"query": "cat*"
}
}
],
"must": [
{
"term": {
"languageCode": {
"boost": 1.0,
"value": "fi"
}
}
}
]
}
},
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"filter": [
{
"simple_query_string": {
"analyze_wildcard": false,
"boost": 1.0,
"default_operator": "and",
"fields": [
"termFolded^1.0"
],
"flags": -1,
"query": "cat*"
}
}
],
"must": [
{
"term": {
"languageCode": {
"boost": 1.0,
"value": "sv"
}
}
}
]
}
},
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"filter": [
{
"simple_query_string": {
"analyze_wildcard": false,
"boost": 1.0,
"default_operator": "and",
"fields": [
"termFolded^1.0"
],
"flags": -1,
"query": "cat*"
}
}
],
"must": [
{
"term": {
"languageCode": {
"boost": 1.0,
"value": "fr"
}
}
}
]
}
},
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"filter": [
{
"simple_query_string": {
"analyze_wildcard": false,
"boost": 1.0,
"default_operator": "and",
"fields": [
"termFolded^1.0"
],
"flags": -1,
"query": "cat*"
}
}
],
"must": [
{
"term": {
"languageCode": {
"boost": 1.0,
"value": "da"
}
}
}
]
}
},
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"filter": [
{
"simple_query_string": {
"analyze_wildcard": false,
"boost": 1.0,
"default_operator": "and",
"fields": [
"termFolded^1.0"
],
"flags": -1,
"query": "cat*"
}
}
],
"must": [
{
"term": {
"languageCode": {
"boost": 1.0,
"value": "es"
}
}
}
]
}
}
]
}
}
]
}
}
],
"must": [
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"must": [
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"must": [
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"should": [
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"must": [
{
"term": {
"path": {
"boost": 1.0,
"value": "MAIN"
}
}
},
{
"range": {
"start": {
"boost": 1.0,
"from": null,
"include_lower": true,
"include_upper": true,
"to": 1584685935588
}
}
}
],
"must_not": [
{
"exists": {
"boost": 1.0,
"field": "end"
}
}
]
}
}
]
}
}
]
}
}
]
}
}
]
}
},
"size": 10000,
"sort": [
{
"termLen": {
"order": "asc"
}
},
{
"_score": {
"order": "asc"
}
}
],
"stored_fields": [
"descriptionId",
"conceptId"
]
}
{
"from": 0,
"query": {
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"must": [
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"must": [
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"must": [
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"should": [
{
"bool": {
"adjust_pure_negative": true,
"boost": 1.0,
"must": [
{
"term": {
"path": {
"boost": 1.0,
"value": "MAIN"
}
}
},
{
"range": {
"start": {
"boost": 1.0,
"from": null,
"include_lower": true,
"include_upper": true,
"to": 1584685935588
}
}
}
],
"must_not": [
{
"exists": {
"boost": 1.0,
"field": "end"
}
}
]
}
}
]
}
}
]
}
}
]
}
},
{
"terms": {
"additionalFields.acceptabilityId": [
"900000000000548007",
"900000000000549004"
],
"boost": 1.0
}
},
{
"terms": {
"boost": 1.0,
"conceptId": [
"60231008",
"422860002",
"782515007",
"726762008",
"62795009",
"217701002",
"253253009",
"46540009",
"423247009",
"37473008",
"86714001",
"128306009",
"90268004",
"19923001",
"33384004",
"17738004",
"396747005",
"63129006",
"54988005",
"388623001",
"100141008",
"14060003",
"85491003",
"386051003",
"77477000",
"282673009",
"61698003",
"96257008",
"256425001",
"257528009",
"425154009",
"79058000",
"157937004",
"63852007",
"275281000",
"24275002",
"388618001",
"423717008",
"266383007",
"193570009",
"41932008",
"409920005",
"204259006",
"31046007",
"30623001",
"388626009",
"23826000",
"227043000",
"155126003",
"155521003"
]
}
}
]
}
},
"size": 10000
}
{
"_source": {
"excludes": [],
"includes": [
"conceptId"
]
},
"from": 0,
"post_filter": {
"terms": {
"boost": 1.0,
"conceptId": [
257528009,
388618001,
33384004,
388623001,
388626009,
23826000,
30623001,
46540009,
63852007,
90268004,
79058000,
31046007,
24275002,
266383007,
204259006,
256425001,
253253009,
155521003,
227043000,
275281000,
282673009,
425154009,
396747005,
63129006,
86714001,
100141008,
96257008,
17738004,
41932008,
54988005,
726762008,
782515007,
157937004,
155126003,
193570009,
217701002,
...too long to post
Docker is really killing your performance there.
In the first Elasticsearch query, which is against the description index, there is a clause for each language that is configured with special character folding. This allows the search to work as expected in multiple languages. If you are only interested in English (or any language which does not need character folding) you could try removing everything starting with 'search.language' from the 'Search International Character Handling' section of application.properties on your local instance, to see if that helps. This should simplify the first query, but it's unlikely to speed things up much.
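That section looks roughly like this (the property keys and values below are illustrative from memory; check the actual application.properties file for the exact names):

```
# Search International Character Handling
# One entry per language; removing all of the search.language.* lines
# disables the per-language folding clauses in the description query.
search.language.charactersNotFolded.da=æøå
search.language.charactersNotFolded.no=æøå
search.language.charactersNotFolded.sv=åäö
search.language.charactersNotFolded.fi=åäö
```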
The cost usually comes from the number of requests made to Elasticsearch rather than the cost of any single one. The design of the Snowstorm indices allows content to be branched and versioned (you can see the path and start/end clauses doing that filtering in the queries above).
For this reason the information for each concept is not denormalised into a single Elasticsearch document; it is spread over several indices, in a similar way to the RF2 distribution files of SNOMED CT. This means that to fulfil a search request several queries must be made: one to match descriptions, then more to gather all the information to be returned, as sketched below. The member index request is needed to work out which description is the FSN and which is the PT for each matched concept in the language you have requested.
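A minimal sketch of that sequence, using the elasticsearch Python client. This is not Snowstorm's actual code; the index names ("description", "member", "concept") and the simplified branch criteria are assumptions based on the query dumps above:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# 1. Match descriptions by folded term on the branch (the real query also
#    applies a start-time range per branch version).
descs = es.search(index="description", body={
    "query": {"bool": {
        "filter": [{"simple_query_string": {
            "query": "cat*", "fields": ["termFolded"],
            "default_operator": "and"}}],
        "must": [{"term": {"path": "MAIN"}}],
        "must_not": [{"exists": {"field": "end"}}],
    }},
    "size": 50,
})
concept_ids = [h["_source"]["conceptId"] for h in descs["hits"]["hits"]]

# 2. Fetch language reference set members to work out which description is
#    the FSN/PT for each matched concept (acceptability ids from the dump).
members = es.search(index="member", body={
    "query": {"bool": {"must": [
        {"terms": {"conceptId": concept_ids}},
        {"terms": {"additionalFields.acceptabilityId": [
            "900000000000548007", "900000000000549004"]}},
    ]}},
    "size": 1000,
})

# 3. Load the matched concepts themselves.
concepts = es.search(index="concept", body={
    "query": {"terms": {"conceptId": concept_ids}},
    "size": 50,
})
```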
You could also try the description endpoint: https://browser.ihtsdotools.org/snowstorm/snomed-ct/MAIN/descriptions?term=cat
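For an autosuggest use case this returns the matched terms directly, which is often all a dropdown needs, e.g. (Python requests again; the field names read from the response are assumptions, so check them against your own instance):

```python
import requests

url = "https://browser.ihtsdotools.org/snowstorm/snomed-ct/MAIN/descriptions"
response = requests.get(url, params={"term": "cat", "limit": 50})

# Each item is a matched description with its term and owning concept.
for item in response.json()["items"]:
    print(item["term"], item["conceptId"])
```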
If you need something super fast but very simple, in a single language, for a single release of SNOMED CT, I would consider using something which works with Lucene directly. There is a starter project which you may be interested in; it's something I created before Snowstorm and it's not really in active maintenance at the moment: SNOMED Query Service. It's not mentioned in the readme, but the REST API can accept a term parameter.
I hope that helps. Kind regards, Kai
Good luck with your project! 😄
Thank you! I am now working on using Elasticsearch directly to reduce the time taken per search. This is a great project, especially considering the ECL query implementation!
Each query, including looking up a concept by code, takes ~1 second or more. The minimum response time I was able to get with an 8GB RAM, 4-core dedicated Linux machine running Snowstorm averaged 600ms, which is still a lot. Is this the expected performance? Are there any best practices for optimising search speed?
Also, how exactly does caching work? I see Snowstorm print "Caches are hot" to the console, but the response times for repeated searches are unchanged.