IHTSDO / snowstorm-x

Snowstorm-X, a beta version of Snowstorm with the latest implementation-focused features and enhancements. Please be aware these features have not yet been fully tested.
Other
4 stars 3 forks source link

NotImplementedException when retrieving offset for expanded ValueSets with more than 10k items #7

Closed rrapio closed 1 year ago

rrapio commented 1 year ago

While obtaining the entire SNOMED catalog with the Spanish extension through a ValueSet expansion based on an ECL expression, when I reach the page (offset) that contains the 10,000th element or posterior, I get the following error:

Caused by: org.apache.commons.lang3.NotImplementedException: searchAfter in ConceptSelector must be accompanied with a sort order
    at org.snomed.snowstorm.ecl.ConceptSelectorHelper.fetchIds(ConceptSelectorHelper.java:137)
    at org.snomed.snowstorm.ecl.ConceptSelectorHelper.select(ConceptSelectorHelper.java:63)
    at org.snomed.snowstorm.ecl.domain.expressionconstraint.SSubExpressionConstraint.select(SSubExpressionConstraint.java:53)
    at org.snomed.snowstorm.ecl.ECLQueryService.doSelectConceptIds(ECLQueryService.java:166)
    at org.snomed.snowstorm.ecl.ECLQueryService.selectConceptIds(ECLQueryService.java:89)
    at org.snomed.snowstorm.ecl.ECLQueryService.selectConceptIds(ECLQueryService.java:75)
    at org.snomed.snowstorm.core.data.services.QueryService.doEclSearchAndConceptPropertyFilters(QueryService.java:339)
    at org.snomed.snowstorm.core.data.services.QueryService.doSearchForIds(QueryService.java:203)
    at org.snomed.snowstorm.core.data.services.QueryService.searchForIds(QueryService.java:167)
    at org.snomed.snowstorm.core.data.services.QueryService.searchForIds(QueryService.java:163)
    at org.snomed.snowstorm.fhir.services.FHIRValueSetService.expand(FHIRValueSetService.java:243)
    at org.snomed.snowstorm.fhir.services.FHIRValueSetProvider.expandType(FHIRValueSetProvider.java:248)

The URL invoked in that case:

http://localhost:8095/fhir/ValueSet/$expand?includeDesignations=true&url=http%3A%2F%2Fsnomed.info%2Fsct%2F449081005%2Fversion%2F20221031%3Ffhir_vs%3Decl%2F%3C138875005&count=500&offset=10000

A small fix I did to continue my work was defining the default sort order during the instantiation of the largePageRequest object that is used for the search after strategy, implemented there to circumvent the Elasticsearch limitation of 10K items per search.

The change I made:

FhirValueSetService.java:238

largePageRequest = LARGE_PAGE.withSort(DEFAULT_SORT);

I don't know if that works for every scenario, but apparently worked for my use case.

kaicode commented 1 year ago

Thank you @rrapio for raising this bug and for suggesting a fix. Setting a default sort order is how I would have fixed this. I will get that added to the code for the next release. I am currently bringing SnowstormX up to date with the latest Snowstorm release and running through some test scenarios in preparation for the FHIR multiple code systems feature to be merged into the main Snowstorm project for the March release. This bug ticket has perfect timing!

rrapio commented 1 year ago

Great, glad to know this is the proper fix. Looking forward to the next release!

kaicode commented 1 year ago

Hi @rrapio, I have taken a closer look at this and the default sort order is not the best approach. The pages before the 10K offset use a different sort order which must be kept to prevent the possibility of seeing the same concepts again in the pages after 10K. I think something like this would be better:

largePageRequest = PageRequest.of(0, LARGE_PAGE.getPageSize(), pageRequest.getSort());

I'll test this solution and get the fix into the coming Snowstorm release.

rrapio commented 1 year ago

Thank you for the correction @kaicode! I will fix my version here too in the meantime.

rrapio commented 1 year ago

This bug also affects non-SNOMED terminologies (i.e. LOINC). Unfortunately, the fix we discussed here doesn't apply, since it is in the isSnomed part of the algorithm.

kaicode commented 1 year ago

This is fixed in SnowstormX release 8.3.0 for SNOMED CT and other code systems.