IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

findConcepts API returns items in not-supported language #97

Closed desjob closed 4 years ago

desjob commented 4 years ago

It looks like the Accept-Language header is not applied correctly on the findConcepts API. I'm using Snowstorm version 4.5.1 with the Dutch dataset loaded.

Reproduction steps: (replace {HOSTNAME} with snowstorm hostname)

curl -X GET --header 'Accept: application/json' --header 'Accept-Language: nl-NL' '{HOSTNAME}/MAIN/concepts?activeFilter=true&term=Entire%20deltoid%20bursa%20(body%20structure)&offset=0&limit=50'

Expected result (no hits, since no Dutch terms match the search term)

{
  "items": [],
  "total": 0,
  "limit": 50,
  "offset": 0
}

Actual result

{
  "items": [
    {
      "conceptId": "368530006",
      "effectiveTime": "20020131",
      "moduleId": "900000000000207008",
      "active": true,
      "pt": {
        "term": "Entire deltoid bursa",
        "lang": "en"
      },
      "definitionStatus": "PRIMITIVE",
      "fsn": {
        "term": "Entire deltoid bursa (body structure)",
        "lang": "en"
      },
      "id": "368530006"
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0,
  "searchAfter": "WzM2ODUzMDAwNl0=",
  "searchAfterArray": [
    368530006
  ]
}

kaicode commented 4 years ago

Hi @desjob,

Thanks for reaching out. I can see why the response does not match what you requested. English is added automatically as an option if it is not given in the Accept-Language header values. This is documented on the descriptions endpoint, but that documentation has not been copied to all the places where it applies.

Implementation Notes

The Accept-Language header is used to specify the user's preferred language, 'en' is always added as a fallback if not already included in the list. Each language is used as an optional clause for matching and will include the correct character folding behaviour for that language. The Accept-Language header list is also used to choose the best translated FSN and PT values in the response.
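
As a rough illustration of the fallback described in these notes, the sketch below models how the list of languages used for matching could be derived from the header. It is not Snowstorm's code: `effective_languages` is a hypothetical helper name, and it assumes the header has already been reduced to bare, comma-separated language codes (so `nl` rather than `nl-NL`).

```shell
# Illustrative model of the documented fallback, not Snowstorm's implementation.
# effective_languages is a hypothetical helper; input is a comma-separated
# list of bare language codes such as "nl" or "en,nl".
effective_languages() {
  langs="$1"
  case ",${langs}," in
    *,en,*) printf '%s\n' "$langs" ;;      # 'en' was requested explicitly
    *)      printf '%s,en\n' "$langs" ;;   # 'en' is appended as the fallback
  esac
}

effective_languages "nl"      # prints nl,en
effective_languages "en,nl"   # prints en,nl
```

Under this model a request with only `nl` is still matched against English descriptions, which is why the English term in the reproduction steps returns a hit.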

If you would like to only match descriptions in the NL module you could use the description search endpoint which includes a module filter. That endpoint also allows you to explicitly only search the "nl" language using the language parameter. This is in contrast to the behaviour of the Accept-Language header method. I'm sorry if this is confusing, part of the reason is compatibility with a legacy system which Snowstorm replaced. We will consider additional documentation to make this clearer.

I've just added the language parameter to the concept search in the develop branch too.

I hope this helps.

Kind regards, Kai

desjob commented 4 years ago

Thanks for the quick reply. We are currently also applying an ECL as part of our call to the findConcepts API, so it looks like switching to the descriptions API is not an option at this time. I'll keep an eye out for the next release so we can use the language filter on the findConcepts API when it becomes available. Any rough ETA of the next release?

fadam-mm commented 4 years ago

"I've just added the language parameter to the concept search in the develop branch too."

Will this have any effect on the language reference set used for selecting the PT and FSN on the findConcepts API, or is https://github.com/IHTSDO/snowstorm/issues/72 still the issue to track for language reference set support?

Thanks!

danka74 commented 4 years ago

FYI, this is what is returned using the SE edition (with Accept-Language: sv):

curl -X GET --header 'Accept: application/json' --header 'Accept-Language: sv' 'http://nixv1te.sos.local:8080/MAIN%2FSNOMEDCT-SE/concepts?activeFilter=true&term=Entire%20deltoid%20bursa&offset=0&limit=50'
"items": [
    {
      "conceptId": "368530006",
      "effectiveTime": "20020131",
      "moduleId": "900000000000207008",
      "active": true,
      "pt": {
        "term": "bursa subdeltoidea, som helhet",
        "lang": "sv"
      },
      "definitionStatus": "PRIMITIVE",
      "fsn": {
        "term": "Entire deltoid bursa (body structure)",
        "lang": "en"
      },
      "id": "368530006"
    }
  ]

danka74 commented 4 years ago

This (or at least a similar) issue is also discussed on the SNOMED on FHIR calls: https://confluence.ihtsdotools.org/display/FHIR/Mechanisms+for+working+with+Languages

Happy to have more input from non-English language speakers!

kaicode commented 4 years ago

The search endpoints are still going through changes and testing as we prepare to use Snowstorm as the authoring terminology server for extension maintenance in the IHTSDO Managed Service. The next release will probably be in a couple of weeks. Cheers.

danka74 commented 4 years ago

If you need testers, just let me know when.

kaicode commented 4 years ago

This issue is solved in Snowstorm release 4.7.1.

The concept search endpoint GET /{branch}/concepts now includes a multi-value language parameter. Combined with the term parameter, it restricts the matched concepts to those with a matching description term in one of the specified languages.

Other selection criteria parameters have also been added, such as preferredIn, acceptableIn and preferredOrAcceptableIn. These can be used to require matched descriptions to be preferred or acceptable in a specific language reference set.

The Accept-Language request header can be used to control which language or language reference set is used to select the Preferred Term included in the response. This has no effect on which concepts are included in the results, just which FSN/PT to return.
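
For reference, here is a sketch of how the original request from this issue might look with the new parameter. It only assembles and prints the curl command rather than sending it. `{HOSTNAME}` is the same placeholder used in the reproduction steps, and passing a single value `nl` to `language` is an assumption about a typical use; check the API documentation of your Snowstorm version for the exact parameter format.

```shell
# Sketch only: builds and prints the request instead of sending it.
# {HOSTNAME} is the placeholder from the reproduction steps above.
BASE_URL='{HOSTNAME}/MAIN/concepts'

# Same query as the original report, plus the new `language` parameter
# (single value "nl" assumed here) to restrict term matching to Dutch.
QUERY='activeFilter=true&term=Entire%20deltoid%20bursa&language=nl&offset=0&limit=50'

# Accept-Language only selects which FSN/PT is returned;
# the `language` parameter is what filters the matched descriptions.
REQUEST="curl -X GET --header 'Accept: application/json' --header 'Accept-Language: nl' '${BASE_URL}?${QUERY}'"
echo "$REQUEST"
```

If the parameter behaves as described, an English-only term such as this one should no longer match, giving the empty result expected in the original report.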

Kind regards, Kai