OpenConceptLab / ocl_issues

Issues for all OCL repos. NOTE: Install ZenHub Browser Extension and request access to the OCL Roadmap board to view all issues and to contribute

502 Bad Gateway for Large CodeSystem resources #1833

Open jamlung-ri opened 6 months ago

jamlung-ri commented 6 months ago

For FHIR CodeSystems that have more than ~250 codes, such as this v3-ActCode resource, OCL times out when they are loaded through the fhir_imports repo's GitHub Actions. Is it possible for OCL to queue up large CodeSystems using one of its asynchronous tasks?

Example error log from the fhir_imports repo:

Failed to create /fhir/HL7-Terminology-THO/CodeSystem/HL7-CodeSystems-25July2023/CodeSystem-v2-0354.json. [502] 502 Bad Gateway
Failed to create /fhir/HL7-Terminology-THO/CodeSystem/HL7-CodeSystems-25July2023/CodeSystem-v2-0487.json. [502] 502 Bad Gateway
Failed to create /fhir/HL7-Terminology-THO/CodeSystem/HL7-CodeSystems-25July2023/CodeSystem-v2-0076.json. [502] 502 Bad Gateway
Failed to create /fhir/HL7-Terminology-THO/CodeSystem/HL7-CodeSystems-25July2023/CodeSystem-v3-CodeSystem.json. [500] {"error": "Server Error (500)"}
Failed to create /fhir/HL7-Terminology-THO/CodeSystem/HL7-CodeSystems-25July2023/CodeSystem-v3-ActCode.json. [502] 502 Bad Gateway
Failed to create /fhir/HL7-Terminology-THO/CodeSystem/HL7-CodeSystems-25July2023/CodeSystem-presentOnAdmission.json. [500] {"error": "Server Error (500)"}
Failed to create /fhir/HL7-Terminology-THO/CodeSystem/HL7-CodeSystems-25July2023/CodeSystem-v2-0003.json. [502] 502 Bad Gateway
Failed to create /fhir/HL7-Terminology-THO/CodeSystem/HL7-CodeSystems-25July2023/CodeSystem-v3-HealthcareProviderTaxonomyHIPAA.json. [502] 502 Bad Gateway
Failed to create /fhir/HL7-Terminology-THO/CodeSystem/HL7-CodeSystems-25July2023/CodeSystem-conceptdomains.json. [502] 502 Bad Gateway
Failed to create /fhir/HL7-Terminology-THO/CodeSystem/HL7-CodeSystems-25July2023/CodeSystem-insurance-plan-type.json. [500] {"error": "Server Error (500)"}
Failed to create /fhir/HL7-Terminology-THO/CodeSystem/HL7-CodeSystems-25July2023/CodeSystem-v3-ada-snodent.json. [502] 502 Bad Gateway
Failed to create /fhir/HL7-Terminology-THO/CodeSystem/HL7-CodeSystems-25July2023/CodeSystem-v3-ObservationMethod.json. [502] 502 Bad Gateway
Failed to create /fhir/HL7-Terminology-THO/CodeSystem/HL7-CodeSystems-25July2023/CodeSystem-v2-0550.json. [400]

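
To see at a glance which resources failed and how, a short script can group the log entries by status code. This is a sketch that assumes only the `Failed to create <path>. [<status>]` shape visible in the log; `summarize_failures` is a hypothetical helper, not part of fhir_imports.

```python
import re

# One "Failed to create <path>. [<status>]" entry from the fhir_imports log.
FAILURE_RE = re.compile(r"Failed to create (\S+?\.json)\. \[(\d{3})\]")


def summarize_failures(log_text: str) -> dict[int, list[str]]:
    """Group the failed resource file names by HTTP status code."""
    by_status: dict[int, list[str]] = {}
    for path, status in FAILURE_RE.findall(log_text):
        file_name = path.rsplit("/", 1)[-1]
        by_status.setdefault(int(status), []).append(file_name)
    return by_status
```

Splitting the 502s (which line up with the large resources timing out at the gateway) from the 500s and the 400 helps keep what may be separate problems from being treated as one.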
snyaggarwal commented 6 months ago

@jamlung-ri @rkorytkowski There are 1299 concepts in this v3-ActCode resource. I will try to profile this.

snyaggarwal commented 6 months ago

The first run failed: it crashed my DB, and the API container along with it. I then profiled with a smaller data set and found four major places where time was being spent:

  1. Concept create
  2. Version create
  3. Version resource indexing (happening twice, the second time due to the released flag)
  4. Returning the response
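
One simple way to get a per-phase breakdown like this is to wrap each phase in a timing context manager and accumulate wall-clock time per phase. This is a sketch assuming plain Python instrumentation; `timed`, `timings`, and `report` are hypothetical helpers, not necessarily what was used for this profiling.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named phase.
timings: dict[str, float] = defaultdict(float)


@contextmanager
def timed(phase: str):
    """Add the elapsed time of the with-block to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] += time.perf_counter() - start


def report() -> None:
    """Print phases from slowest to fastest."""
    for phase, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{phase}: {seconds:.2f} s")
```

Each phase from the list would get its own block, e.g. `with timed("concept create"): ...` around every concept save; because times accumulate, repeated calls (such as a second indexing pass) show up as a larger total for that phase.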

I focused on the third item first: resources were being seeded into the version synchronously, so indexing was also happening synchronously. I refactored that and also fixed the double indexing (the second pass was async). I also refactored a few things in the concept create step. With these changes, the full v3-ActCode import ran in about 135 seconds. I also found another bug: the API renders (or calls) CodeSystemDetailSerializer.to_representation twice, so the time spent there is doubled. I haven't fixed that part yet.
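
The two indexing changes described here, moving indexing off the synchronous request path and indexing each resource only once, can be sketched generically. This assumes indexing can be expressed as one batch call over resource ids; `DeferredIndexer` and its `index_batch` callback are hypothetical names, and a plain thread stands in for OCL's async task runner.

```python
import threading
from typing import Callable, Iterable


class DeferredIndexer:
    """Collect resource ids during a request, then index each id at most once
    on a background thread after the synchronous work has finished."""

    def __init__(self, index_batch: Callable[[list[str]], None]):
        self._index_batch = index_batch
        self._pending: dict[str, None] = {}  # dict keys act as an ordered set

    def add(self, ids: Iterable[str]) -> None:
        for resource_id in ids:
            self._pending[resource_id] = None  # duplicates collapse here

    def flush_async(self) -> threading.Thread:
        batch = list(self._pending)
        self._pending.clear()
        worker = threading.Thread(target=self._index_batch, args=(batch,))
        worker.start()
        return worker
```

A second `add` call for the same ids (for example, one triggered by the released flag) no longer causes a second indexing pass, and the request can return before `flush_async`'s worker finishes.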

Here is the time breakdown from my system:

Concepts created in 95.29309511184692 seconds
Concepts created avg time 0.07335881070965891 seconds
Version created in 0.6606752872467041 seconds
Overall in 95.9538881778717 seconds
Representation time 19.414612531661987
Representation time 19.702486276626587
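
These figures appear internally consistent: 95.29 s / 0.0734 s per concept gives the 1299 concepts, and adding the two representation timings to the ~95.95 s overall gives roughly the ~135 s run. The check below assumes the two renders ran sequentially after the import, which the thread implies but does not state; under that assumption, rendering only once would bring the total to about 115 s.

```python
# Timings copied from the profiling output above (seconds).
concepts_total = 95.29309511184692
concepts_avg = 0.07335881070965891
version_create = 0.6606752872467041
overall = 95.9538881778717
renders = [19.414612531661987, 19.702486276626587]  # the two to_representation calls

concept_count = round(concepts_total / concepts_avg)  # implied number of concepts
phases_sum = concepts_total + version_create          # should be close to `overall`
end_to_end = overall + sum(renders)                   # roughly the ~135 s full run
one_render = overall + renders[0]                     # if the serializer ran only once

print(concept_count, round(phases_sum, 2), round(end_to_end, 1), round(one_render, 1))
# 1299 95.95 135.1 115.4
```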

I have deployed these fixes to QA for testing @rkorytkowski @jamlung-ri

rkorytkowski commented 6 months ago

I'll run the tests. Thanks, @snyaggarwal!

rkorytkowski commented 6 months ago

The request still timed out on QA: https://github.com/OpenConceptLab/fhir_imports/actions/runs/9096241754/job/25001262089

I'll deploy to staging, since it is more powerful and may complete the import in under a minute.

snyaggarwal commented 6 months ago

@rkorytkowski We may have to consider increasing the timeout.

rkorytkowski commented 6 months ago

I've increased the timeout to 10 minutes for now so the content can be imported.
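
On the client side, the same idea can be sketched as a generous per-request timeout plus a small retry loop for gateway errors. This assumes a Python importer using only the standard library; `make_send`, `post_with_retry`, the POST method, and the JSON content type are illustrative assumptions, not the actual fhir_imports code, and the 600-second value simply mirrors the 10-minute figure above. A 502 from the gateway does not necessarily mean the server stopped working on the request, so retries are only safe if creating the same resource twice is harmless, which this thread does not establish.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Gateway/availability errors are retried; 500 and 400 are more likely to come
# from the request or server logic, so retrying them is unlikely to help.
RETRYABLE = {502, 503, 504}


def make_send(url: str, body: bytes, timeout: float = 600.0) -> Callable[[], int]:
    """Build a callable that POSTs body to url once and returns the HTTP status."""
    def send() -> int:
        request = urllib.request.Request(
            url, data=body, method="POST",
            headers={"Content-Type": "application/json"},  # assumed content type
        )
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.status
        except urllib.error.HTTPError as err:  # raised for 4xx/5xx responses
            return err.code
    return send


def post_with_retry(send: Callable[[], int], attempts: int = 3,
                    base_delay: float = 1.0) -> int:
    """Call send(), retrying with exponential backoff while the status is retryable."""
    status = send()
    for attempt in range(1, attempts):
        if status not in RETRYABLE:
            break
        time.sleep(base_delay * 2 ** (attempt - 1))  # 1 s, 2 s, 4 s, ...
        status = send()
    return status
```

Usage would look like `post_with_retry(make_send(url, payload))`; separating the single request from the retry policy keeps the backoff logic testable without a server.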

snyaggarwal commented 5 months ago

@rkorytkowski @jamlung-ri Should we close this one?

jamlung-ri commented 5 months ago

@rkorytkowski What do you think? We got through the Connectathon okay, but we may not have solved the underlying problem with large CodeSystems. However, if the NPM work will address this in some way, then I think we could definitely close it.