IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

Elasticsearch error: Limit of total fields [1000] in index [] has been exceeded #510

Open adarsh-M-agrawal opened 1 year ago

adarsh-M-agrawal commented 1 year ago

Hi, I am currently importing the International version of ICD-10 as well as hl7.terminology.r4@3.1.0. However, I have encountered an error from Elasticsearch. What should the total fields limit be set to?

2023-04-26 14:47:42.668  INFO 18034 --- [nio-8088-exec-5] o.s.s.fhir.services.FHIRConceptService   : Saving 11539 'hl7.org-fhir-sid-icd-10' fhir concepts. All properties: [parent, inclusion, coding-hint, note, modifierlink, exclusion, preferredLong, definition, text, footnote, introduction, child]
2023-04-26 14:47:45.617 ERROR 18034 --- [nio-8088-exec-5] c.u.f.r.s.i.ExceptionHandlingInterceptor : Failure during REST processing

ca.uhn.fhir.rest.server.exceptions.InternalErrorException: Failed to call access method: org.springframework.data.elasticsearch.BulkFailureException: Bulk operation has failures. Use ElasticsearchException.getFailedDocuments() for detailed messages [{oF7avIcBfSHfE_60tYPe=ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]], gV7avIcBfSHfE_60tYTi=ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]],
817avIcBfSHfE_60tYTj=ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]], K17avIcBfSHfE_60tYTf=ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]], LV7avIcBfSHfE_60tYTf=ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]],
S17avIcBfSHfE_60tYTg=ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]], Fl7avIcBfSHfE_60tYXk=ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]],
1l7avIcBfSHfE_60tYTj=ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]], E17

Snowstorm version: 8.1.0; Elasticsearch version: 7.10

kaicode commented 1 year ago

Sorry for the slow response, I have been away. I would like to reproduce the error that you are seeing to investigate further. How can I do that?

kaicode commented 1 year ago

It looks like you may be importing ICD-10 as a package from the FHIR package repository? https://registry.fhir.org/package/fhir.tx.support.r4%7C0.19.0?query=ICD_10_EN&clickedId=651245

Please import the ICD-10 ClaML file from WHO instead. It is available via the WHO classifications download page, which requires a login.

gauravvaishnav17 commented 1 year ago

I encountered a similar problem when importing the HL7 terminology package on CentOS 7. I resolved it by raising the total fields limit of the fhir-concept index to 2000.

adarsh-M-agrawal commented 1 year ago

Kai, thank you for your response. Despite using the recommended ICD package, I'm encountering the same problem. I was able to upload the SNOMED CT International RF2 and Loinc_2.72 packages successfully, but any subsequent attempt to upload either the ICD-10 or the HL7 terminology npm package fails with an Elasticsearch error stating that the total fields limit [1000] has been exceeded. It's worth mentioning that I'm working on CentOS 7.

java.lang.IllegalArgumentException: Limit of total fields [1000] has been exceeded
at org.elasticsearch.index.mapper.MappingLookup.checkFieldLimit(MappingLookup.java:170) ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.MappingLookup.checkLimits(MappingLookup.java:162) ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.DocumentMapper.validate(DocumentMapper.java:297) ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.MapperService.internalMerge(MapperService.java:476) ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.MapperService.internalMerge(MapperService.java:421) ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:361) ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:292) [elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:175) [elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:220) [elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:126) [elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:85) [elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:179) [elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) [elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.10.0.jar:7.10.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]

kaicode commented 1 year ago

@adarsh-M-agrawal could you provide the full stack trace in a gist please? The part you have provided doesn't tell me which methods within Snowstorm were running at the time.

adarsh-M-agrawal commented 1 year ago

Thanks for your response, Kai. Here are the stack traces from Snowstorm and Elasticsearch:
Snowstorm: https://gist.github.com/adarsh-M-agrawal/13310e75df2249340948b9005f2fe734/raw/83dc998321538e2a4e99c78c69acc7296a578820/gistfile1.txt
Elasticsearch: https://gist.github.com/adarsh-M-agrawal/6b55d9e041aa59afc55b359789521847/raw/7afc858d77c65167bd14db1fe1425525ea9e8067/gistfile1.txt

kaicode commented 1 year ago

I reproduced this issue and tested the workaround suggested by @gauravvaishnav17.

The workaround works; making this HTTP PUT request to Elasticsearch will increase the maximum fields limit to 2000:

curl -XPUT localhost:9200/fhir-concept/_settings -H 'Content-Type:application/json' -d '{ "index.mapping.total_fields.limit": 2000 }'

After applying the workaround the ICD-10 and HL7 Terminology (hl7.terminology.r4@3.1.0) packages can be imported without any errors.
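A minimal shell sketch of the same workaround, wrapped in functions so the limit in effect can be checked before and after the change. It assumes Elasticsearch is reachable at `$ES_URL` (default `http://localhost:9200`) without authentication and that the index is named `fhir-concept` as in the command above; the function names are hypothetical helpers, not part of Snowstorm.

```shell
#!/usr/bin/env bash
# Sketch only: helpers around the Elasticsearch index settings API used in the workaround.
# Assumes Elasticsearch at $ES_URL with no authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the JSON body for a total_fields.limit update (hypothetical helper).
field_limit_body() {
  printf '{ "index.mapping.total_fields.limit": %d }' "$1"
}

# Show the limit for an index; include_defaults=true also reports the
# default value when no explicit value has been set on the index.
show_field_limit() {
  curl -s "$ES_URL/$1/_settings?include_defaults=true&filter_path=**.total_fields.limit"
}

# Raise the limit on an existing index: the same PUT _settings request as the workaround.
set_field_limit() {
  curl -s -XPUT "$ES_URL/$1/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(field_limit_body "$2")"
}
```

For example, run `show_field_limit fhir-concept`, then `set_field_limit fhir-concept 2000`, then rerun the package import. The setting is stored on that one index, so if the index is ever deleted and recreated the request will presumably need to be repeated.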

However, I am aware that having more than 1000 fields in the Elasticsearch index is likely to be inefficient and probably unnecessary. It would be good to investigate a future Snowstorm enhancement that persists these FHIR concept properties without creating so many Elasticsearch fields, probably by combining the properties that do not need to be searchable into a single field for persistence.
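The idea of combining non-searchable properties into a single field can be illustrated with Elasticsearch's `enabled: false` mapping option: an object mapped this way is kept in `_source`, but its contents are not parsed or indexed, so its keys never become mapped fields and do not count toward the limit. This is a hypothetical sketch against a scratch index (`fhir-concept-sketch` and the field `extra_properties` are assumed names), not how Snowstorm actually maps FHIR concepts. It assumes Elasticsearch at `$ES_URL` (default `http://localhost:9200`) without authentication.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: searchable fields are mapped explicitly, while all other
# concept properties live in one object with "enabled": false. Elasticsearch keeps
# that object in _source but does not parse it, so its keys add no mapped fields.
ES_URL="${ES_URL:-http://localhost:9200}"

sketch_mapping() {
  cat <<'EOF'
{
  "mappings": {
    "properties": {
      "code":             { "type": "keyword" },
      "display":          { "type": "text" },
      "extra_properties": { "type": "object", "enabled": false }
    }
  }
}
EOF
}

# Create a scratch index with the mapping above (index name is an assumption).
create_sketch_index() {
  curl -s -XPUT "$ES_URL/${1:-fhir-concept-sketch}" \
    -H 'Content-Type: application/json' \
    -d "$(sketch_mapping)"
}
```

The trade-off is that nothing stored under `extra_properties` can be searched or filtered on; it can only be returned with the document.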

adarsh-M-agrawal commented 1 year ago

I implemented the suggested solution and raised the total fields limit for the Elasticsearch index. As a result, I was able to upload the ICD-10 and HL7 files successfully along with the SNOMED CT and LOINC files. However, when I attempted the lookup operation for certain LOINC codes, such as 'LL1162-8', '4544-3' and 'LP31755-9', an error occurred stating, "Code 'LL1162-8' not found for system 'http://loinc.org/'." I get the same response for other LOINC codes. This issue appeared after uploading the HL7 files; previously, the lookup operation worked correctly for all codes.

http://{hostname}:{port}/fhir/CodeSystem/$lookup?system=http://loinc.org&code=4544-3

{
  "resourceType": "OperationOutcome",
  "issue": [
    {
      "severity": "error",
      "code": "not-found",
      "diagnostics": "Code '4544-3' not found for system 'http://loinc.org/'."
    }
  ]
}
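When the same request is issued from a shell, the `$` in `$lookup` has to be protected from variable expansion, and letting curl build the query string avoids quoting mistakes in the system URL. A minimal sketch, assuming the Snowstorm FHIR endpoint is at `$FHIR_URL` (default `http://localhost:8080/fhir`, matching the commands later in this thread); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: call CodeSystem/$lookup from a shell (function names are hypothetical).
FHIR_URL="${FHIR_URL:-http://localhost:8080/fhir}"

# Single quotes stop the shell from expanding "$lookup" as an (empty) variable.
lookup_endpoint() {
  printf '%s/CodeSystem/%s' "$FHIR_URL" '$lookup'
}

# -G turns the --data-urlencode pairs into a URL-encoded query string on a GET.
lookup_code() {
  curl -s -G "$(lookup_endpoint)" \
    --data-urlencode "system=$1" \
    --data-urlencode "code=$2"
}
```

For example, `lookup_code http://loinc.org 4544-3` requests the same lookup as the URL above.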

kaicode commented 1 year ago

I notice there is a trailing slash on the LOINC URL in the error message you have pasted. That is not the URL for the LOINC code system, and I wonder if it is related to the issue you are experiencing. When you uploaded the LOINC code system, the parameters should have been:

hapi-fhir-cli upload-terminology -d Loinc_2.72.zip -v r4 -t http://localhost:8080/fhir -u http://loinc.org

and not with a trailing slash like this:

hapi-fhir-cli upload-terminology -d Loinc_2.72.zip -v r4 -t http://localhost:8080/fhir -u http://loinc.org/

Loading the ICD-10 and HL7 terminology files should not have affected the Loinc content.
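FHIR code system URLs are generally compared as exact strings, so `http://loinc.org` and `http://loinc.org/` name different code systems. A small guard like the following, a hypothetical helper rather than a hapi-fhir-cli feature, can catch a trailing slash before uploading.

```shell
#!/usr/bin/env bash
# Hypothetical guard: reject a code system URL that ends in "/".
check_system_url() {
  case "$1" in
    */)
      echo "warning: '$1' ends with '/'; did you mean '${1%/}'?" >&2
      return 1
      ;;
    *)
      return 0
      ;;
  esac
}

# Usage (upload command as in this thread):
#   SYSTEM=http://loinc.org
#   check_system_url "$SYSTEM" && ./hapi-fhir-cli upload-terminology \
#     -d Loinc_2.72.zip -v r4 -t http://localhost:8080/fhir -u "$SYSTEM"
```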

adarsh-M-agrawal commented 1 year ago

Hello Kai, I want to clarify that I am not using a trailing slash when uploading the LOINC files, yet even without it the lookup operation does not work correctly. Let me explain the scenario again. Initially, I uploaded the LOINC files using the following command:

./hapi-fhir-cli upload-terminology -d Loinc_2.72.zip -v r4 -t http://localhost:8080/fhir -u http://loinc.org

After the upload, I performed some lookup operations for LOINC codes, such as:

http://localhost:8080/fhir/CodeSystem/$lookup?system=http://loinc.org&code=21176-3

At this point, everything was working fine, and I could retrieve the desired information for every code. However, when I uploaded the HL7 file using the following command:

curl --form file=@hl7.terminology.r4-3.1.0.tgz --form resourceUrls="*" http://localhost:8080/fhir-admin/load-package

An error occurred during the upload process:

java.lang.IllegalArgumentException: Limit of total fields [1000] has been exceeded
at org.elasticsearch.index.mapper.MappingLookup.checkFieldLimit(MappingLookup.java:170) ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.MappingLookup.checkLimits(MappingLookup.java:162) ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.DocumentMapper.validate(DocumentMapper.java:297) ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.MapperService.internalMerge(MapperService.java:476) ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.MapperService.internalMerge(MapperService.java:421) ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:361) ~[elasticsearch-7.10.0.jar:7.10.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:292) [elasticsearch-7.10.0.jar:7.10.0]

Following your suggestion, I increased the maximum field limit to 2000 using the command:

curl -XPUT localhost:9200/fhir-concept/_settings -H 'Content-Type:application/json' -d '{ "index.mapping.total_fields.limit": 2000 }'

After making this adjustment, the HL7 files were successfully uploaded. However, when I performed the lookup operation again for the LOINC code:

http://localhost:8080/fhir/CodeSystem/$lookup?system=http://loinc.org&code=21176-3

I received the following response:

{
  "resourceType": "OperationOutcome",
  "issue": [
    {
      "severity": "error",
      "code": "not-found",
      "diagnostics": "Code '21176-3' not found for system 'http://loinc.org'."
    }
  ]
}

adarsh-M-agrawal commented 1 year ago

Hello Kai, This is a gentle reminder regarding the issue we reported. We are currently facing a blockage in our workflow, and any assistance you can provide would be greatly appreciated. If any additional information or clarification is needed from our end, please do not hesitate to let us know. Thank you for your attention, and we look forward to hearing from you soon.

kaicode commented 1 year ago

Hi @adarsh-M-agrawal, this sounds very odd. I will attempt to reproduce this issue to get a better understanding.

kaicode commented 1 year ago

@adarsh-M-agrawal

After reviewing this more closely I don't think your LOINC import ever completed cleanly. During import the FHIR resources are created in this order:

I suspect your import failed during the second stage. Do you have any LOINC ValueSets or ConceptMaps imported?

I have written an improvement to the LOINC import process that will allow the import to be run again to clean up a partial import; however, the fix did not make it into the early August release candidate. The improvement will be included in the Snowstorm release scheduled for the end of September, but I appreciate that is quite a few weeks away. Would you like me to create a fix release in the SnowstormX project with this improvement? That way you could use SnowstormX until the fix is merged into the main Snowstorm project.

adarsh-M-agrawal commented 1 year ago

Hello Kai, I successfully imported the LOINC package. With only the LOINC package imported, all APIs work properly; the problem arises only after importing the HL7 package, at which point the LOINC lookup API alone stops working. This indicates that the LOINC import completed successfully. The problem seems to be related to the HL7 import, perhaps because some CodeSystems in the HL7 package also contain LOINC codes.

Furthermore, I would greatly appreciate it if you could provide the improvements in the SnowstormX project. This will allow me to check if my issue has been resolved in that version.

someshwarMirge commented 8 months ago

Hello, I am working on a terminology server using Snowstorm. I have imported code systems and value sets for SNOMED CT, LOINC, ICD-10 and HL7. I have encountered the same issue with the LOINC $lookup API that @adarsh-M-agrawal faced, and also with the $subsumes API. I was using version 8.1.0 at first and then switched to version 9.1.0, but I receive the same error whenever LOINC and HL7 are loaded together.

LOINC lookup API call: {{server}}/fhir/CodeSystem/$lookup?system=http://loinc.org&code=LL1162-8

Response:

{
    "resourceType": "OperationOutcome",
    "issue": [
        {
            "severity": "error",
            "code": "not-found",
            "diagnostics": "Code 'LL1162-8' not found for system 'http://loinc.org'."
        }
    ]
}

LOINC subsumes API call: {{server}}/fhir/CodeSystem/$subsumes?system=http://loinc.org&codeA=LP30786-5&codeB=21176-3

Response:

{
    "resourceType": "OperationOutcome",
    "issue": [
        {
            "severity": "error",
            "code": "invalid",
            "diagnostics": "Code 'LP30786-5' was not found in code system 'v3-loinc'."
        }
    ]
}

I am uploading files in the following sequence:

  1. SNOMED CT International Edition
  2. SNOMED CT extensions
  3. LOINC
  4. ICD-10
  5. HL7

How can I fix this? Is there a problem with the upload process?
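The `$subsumes` error above names the code system `v3-loinc` rather than the LOINC code system created by the upload. One possible reading, not confirmed in this thread, is that more than one CodeSystem resource is registered with the URL `http://loinc.org` once the HL7 package is loaded. A hedged way to check is a standard FHIR search on the `url` parameter; this assumes Snowstorm supports that search parameter on CodeSystem and that the server is at `$FHIR_URL` (default `http://localhost:8080/fhir`). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list CodeSystem resources whose canonical URL matches a given system.
# Assumes the server supports the standard "url" search parameter on CodeSystem.
FHIR_URL="${FHIR_URL:-http://localhost:8080/fhir}"

codesystem_search_endpoint() {
  printf '%s/CodeSystem' "$FHIR_URL"
}

# -G sends url=<system> as a URL-encoded query string on a GET search.
find_codesystems() {
  curl -s -G "$(codesystem_search_endpoint)" --data-urlencode "url=$1"
}
```

Running `find_codesystems http://loinc.org` before and after loading the HL7 package would show whether an additional entry (for example one with id `v3-loinc`) appears alongside the uploaded LOINC code system.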

kaicode commented 3 months ago

@adarsh-M-agrawal The missing LOINC codes issue has now been resolved in #609. Could this issue be closed now?