SwanseaUniversityMedical / concept-library

Concept Library
https://conceptlibrary.saildatabank.com
GNU General Public License v3.0

API concept queries sometimes return codes with trailing spaces #1431

Closed alee-x closed 1 year ago

alee-x commented 1 year ago

As title.

To reproduce:

curl -X 'GET' \
  'https://conceptlibrary.saildatabank.com/api/v1/concepts/C2418/export/codes/' \
  -H 'accept: application/json' \
  -H 'X-CSRFToken: 1zdPTcw1wd1AobLc1ZOUR0WCrc2N2imHve0JGrfIh3SIcJcpDFs8cuxWEPrUcK4I'

The response body includes this entry:

{
    "concept_id": 2418,
    "concept_history_id": 6236,
    "concept_history_date": "2022-01-17T13:58:35.645789Z",
    "component_id": 2740,
    "component_history_id": 3774,
    "logical_type": 1,
    "codelist_id": 2740,
    "codelist_history_id": 3720,
    "id": 166343,
    "code": "Xac33 ",
    "description": "Asthma-chronic obstructive pulmonary disease overlap syndrom  ",
    "attributes": {}
  },

Note the trailing space at the end of the value of the "code" key.

Would it be possible for the API to do output sanitisation on returned codes? We only found this accidentally while investigating some undesirable behaviour from an application that calls the Concept Library API.

The impact is that any programmatically built SQL query that uses the codes returned by the API, and doesn't strip the trailing whitespace, will return incorrect results. The database can't match "Xac33" with "Xac33 ", because the whitespace makes them different strings.
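Until the API output is cleaned, a caller can work around this by stripping the codes itself. Below is a minimal sketch in Python, assuming the response has already been parsed into a list of dicts shaped like the entry above. The helper names `clean_code` and `clean_rows` are hypothetical, and the second sample row is made up for illustration.

```python
def clean_code(value: str) -> str:
    """Strip leading and trailing whitespace from a code."""
    # str.strip() with no arguments removes every character for which
    # str.isspace() is true, so it covers the ASCII space as well as
    # Unicode spaces such as U+00A0 (no-break space).
    return value.strip()


def clean_rows(rows):
    """Return copies of the rows with the "code" value stripped."""
    return [{**row, "code": clean_code(row["code"])} for row in rows]


# First row mirrors the entry from the API response; the second is a
# made-up example with a trailing no-break space.
rows = [
    {"id": 166343, "code": "Xac33 "},
    {"id": 1, "code": "H33..\u00a0"},
]
cleaned = clean_rows(rows)
```

Stripping only the ends leaves any whitespace inside a code untouched, so this does not alter codes that legitimately contain internal spaces.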

ieuans commented 1 year ago

Thanks for pointing this out. It looks like these codelists weren't cleaned before upload: they included Unicode space separators. I've since updated our validation for codes and cleaned the existing offending codes.

The change is now live.
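For anyone wanting to check their own codelists before upload, a check for Unicode space separators could look roughly like the sketch below. This is not the Concept Library's actual validation code, which isn't shown in this thread. The function names are hypothetical, and flagging only the first and last character is an assumption.

```python
import unicodedata


def has_edge_separator(code: str) -> bool:
    """True if the code starts or ends with whitespace or a Unicode separator."""
    if not code:
        return False

    def is_separator(ch: str) -> bool:
        # Unicode categories Zs, Zl and Zp are the space, line and
        # paragraph separators; isspace() also catches tabs and newlines.
        return ch.isspace() or unicodedata.category(ch).startswith("Z")

    return is_separator(code[0]) or is_separator(code[-1])


def find_offending(codes):
    """Return the codes that would need cleaning."""
    return [c for c in codes if has_edge_separator(c)]


# "Xac33 " is the code from this issue; the other variants are illustrative.
offending = find_offending(["Xac33", "Xac33 ", "Xac33\u2003", "\u00a0Xac33"])
```

Here `offending` holds the three codes with a leading or trailing separator, and the clean "Xac33" is left out.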