IHTSDO / snowstorm-lite

Snowstorm Lite FHIR Terminology Server

$expand sorting is different (worse) than the full snowstorm version #4

Closed ivank closed 6 months ago

ivank commented 8 months ago

Example:

curl --silent 'https://snowstorm.ihtsdotools.org/fhir/ValueSet/$expand?url=http://snomed.info/sct?fhir_vs&filter=Breast+Cancer&count=5' | jq

responds with:

{
  "resourceType": "ValueSet",
  "id": "5fc7dd97-888a-4385-aa05-8c2fabce0fe1",
  "url": "http://snomed.info/sct?fhir_vs",
  "status": "active",
  "copyright": "This value set includes content from SNOMED CT, which is copyright © 2002+ International Health Terminology Standards Development Organisation (SNOMED International), and distributed by agreement between SNOMED International and HL7. Implementer use of SNOMED CT is not covered by this agreement.",
  "expansion": {
    "id": "7cc5cd7b-f6ca-4602-9e76-76d2499dd01b",
    "timestamp": "2024-01-15T14:43:46+00:00",
    "total": 45,
    "offset": 0,
    "parameter": [
      {
        "name": "version",
        "valueUri": "http://snomed.info/sct|http://snomed.info/sct/900000000000207008/version/20240101"
      },
      {
        "name": "displayLanguage",
        "valueString": "en"
      }
    ],
    "contains": [
      {
        "system": "http://snomed.info/sct",
        "code": "254837009",
        "display": "Malignant tumor of breast"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "372064008",
        "display": "Malignant neoplasm of female breast"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "724451007",
        "display": "Fear of breast cancer"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "134405005",
        "display": "Suspected breast cancer"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "268547008",
        "display": "Screening for malignant neoplasm of breast"
      }
    ]
  }
}

Whereas our local snowstorm-lite server, queried the same way:

curl -u admin:yourAdminPassword --silent 'http://localhost:8085/fhir/ValueSet/$expand?url=http://snomed.info/sct?fhir_vs&filter=Breast+Cancer&count=5' | jq

responds with:

{
  "resourceType": "ValueSet",
  "url": "http://snomed.info/sct?fhir_vs",
  "name": "SNOMED CT Implicit ValueSet of all concepts.",
  "status": "active",
  "copyright": "This value set includes content from SNOMED CT, which is copyright © 2002+ International Health Terminology Standards Development Organisation (SNOMED International), and distributed by agreement between SNOMED International and HL7. Implementer use of SNOMED CT is not covered by this agreement.",
  "expansion": {
    "identifier": "1e115cd2-d887-4b14-b399-4362974939b0",
    "timestamp": "2024-01-15T14:47:43+00:00",
    "total": 50,
    "parameter": [
      {
        "name": "version",
        "valueUri": "http://snomed.info/sct|http://snomed.info/sct/900000000000207008/version/20230131"
      }
    ],
    "contains": [
      {
        "system": "http://snomed.info/sct",
        "code": "717129004",
        "display": "Claus Model"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "724451007",
        "display": "Fear of breast cancer"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "134405005",
        "display": "Suspected breast cancer"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "254843006",
        "display": "Familial cancer of breast"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "254837009",
        "display": "Malignant tumor of breast"
      }
    ]
  }
}

Here "Fear of breast cancer" ranks higher than "Malignant tumor of breast", which is not the response we were expecting (the full version returned the expected order).

Do we need to configure or reindex something to make it behave the same as the full version?

Docker image version: snomedinternational/snowstorm:9.2.0

JonZammit commented 8 months ago

Hi @ivank, just looking at the "version" parameter in the response: does your instance of Snowstorm-lite have the January 2023 version loaded?

JonZammit commented 8 months ago

Today I ran Ivan's query against my instance of snowstorm-lite which has the January 2024 edition loaded. My results were similar - 50 concepts in the expansion of this value set.

I'm not sure how the concepts are ordered in the response, but I believe the difference in totals is explained by snowstorm-lite including inactive concepts in the value set. These can be filtered out because they are flagged as inactive, for example:

            {
                "system": "http://snomed.info/sct",
                "inactive": true,
                "code": "366980001",
                "display": "Suspected breast cancer"
            }
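
As a minimal sketch of that client-side filtering, assuming the expansion has already been parsed into a Python dict: the helper name `active_only` and the sample data are hypothetical, and the only field relied on is the `inactive` flag shown above.

```python
def active_only(value_set):
    """Return the expansion entries that are not flagged as inactive."""
    contains = value_set.get("expansion", {}).get("contains", [])
    # Entries without an "inactive" key are treated as active.
    return [entry for entry in contains if not entry.get("inactive", False)]


# Hypothetical, trimmed-down expansion built from entries quoted in this thread.
sample = {
    "resourceType": "ValueSet",
    "expansion": {
        "contains": [
            {"system": "http://snomed.info/sct", "code": "134405005",
             "display": "Suspected breast cancer"},
            {"system": "http://snomed.info/sct", "inactive": True,
             "code": "366980001", "display": "Suspected breast cancer"},
        ]
    },
}

for entry in active_only(sample):
    print(entry["code"], entry["display"])
```

Filtering after the fact does not change `expansion.total`, so paging logic that relies on the total would still count the inactive entries.
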

kaicode commented 8 months ago

@ivank Thanks for reaching out. Snowstorm Lite does not use the same search mechanism as Snowstorm. The lite search is much faster but the results ranking is not as good in some cases. This is because the results ranking sorts concepts on their average description length, rather than the length of the description that matched the search query.

The relevance of the results can be improved by searching the specific area of the hierarchy you are interested in using ECL. Examples:
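
One way such a request could be built is sketched below. It is an illustration rather than the examples given in the thread. It assumes the standard FHIR implicit value set form `http://snomed.info/sct?fhir_vs=ecl/<expression>` and uses `<< 404684003` (descendants of Clinical finding, including the concept itself) purely as an example constraint; the helper name `build_expand_url` is hypothetical, and the base URL is the local one from the original report.

```python
from urllib.parse import urlencode


def build_expand_url(base_url, ecl, text_filter, count=5):
    """Build a ValueSet/$expand URL restricted to an ECL expression."""
    # Implicit SNOMED CT value set defined by an ECL expression.
    value_set_url = f"http://snomed.info/sct?fhir_vs=ecl/{ecl}"
    # urlencode percent-encodes the nested URL so that its own "?" and "="
    # are not mistaken for part of the outer query string.
    query = urlencode({"url": value_set_url, "filter": text_filter, "count": count})
    return f"{base_url}/ValueSet/$expand?{query}"


print(build_expand_url("http://localhost:8085/fhir", "<< 404684003", "Breast Cancer"))
```

Encoding the `url` parameter matters here because the ECL expression contains spaces and `<` characters, which are not safe to send unescaped.
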

I hope that helps.

Long explanation: Snowstorm searches against individual descriptions, sorts them with the shortest matching description first, and then returns the unique concepts. Snowstorm Lite only has concept documents, so it finds concepts that have some matching description and sorts them by their average description length.
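
The difference between the two ranking rules can be illustrated with a toy sketch. This is not the code of either server: the description lists are made up for illustration (they are not taken from a SNOMED CT release), the matching rule is a simple "every query word appears in the description" check, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical description lists, for illustration only.
CONCEPTS = {
    "254837009": [
        "Breast cancer",
        "Malignant tumor of breast",
        "Malignant neoplasm of breast (disorder)",
    ],
    "724451007": ["Fear of breast cancer"],
}


def matches(description, query):
    """True if every word of the query occurs in the description."""
    text = description.lower()
    return all(word in text for word in query.lower().split())


def rank_by_shortest_match(concepts, query):
    """Rank concepts by the length of their shortest matching description."""
    scores = {}
    for code, descriptions in concepts.items():
        hits = [len(d) for d in descriptions if matches(d, query)]
        if hits:
            scores[code] = min(hits)
    return sorted(scores, key=scores.get)


def rank_by_average_length(concepts, query):
    """Rank matching concepts by the average length of all their descriptions."""
    scores = {
        code: mean(len(d) for d in descriptions)
        for code, descriptions in concepts.items()
        if any(matches(d, query) for d in descriptions)
    }
    return sorted(scores, key=scores.get)


print(rank_by_shortest_match(CONCEPTS, "breast cancer"))
print(rank_by_average_length(CONCEPTS, "breast cancer"))
```

With this toy data the first rule puts 254837009 first, because its short synonym matches the query directly, while the second rule puts 724451007 first, because the longer non-matching descriptions of 254837009 raise its average.
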

kaicode commented 7 months ago

There is a fix for this in the develop branch: it applies the same sorting as the main Snowstorm product within the first 100 results, without a loss of performance. Initial testing looks good!

kaicode commented 6 months ago

This is fixed in the latest version 1.3.0-beta.