OpenConceptLab / ocl_issues

Issues for all OCL repos. NOTE: Install ZenHub Browser Extension and request access to the OCL Roadmap board to view all issues and to contribute

Speed up IMAP import to under 2min #1926

Open paynejd opened 1 month ago

paynejd commented 1 month ago

Importing Uganda's indicator mappings (IMAP) in production is taking ~8 min. This may be a result of the update to OCL's import service combined with the atypical IMAP import. Previously, we did a lot of work to get these imports to run in under 1 to 2 minutes.

Acceptance Criteria

@paynejd to add info on how to test this

paynejd commented 1 month ago

Testing approach:

  1. Pull repo: https://github.com/pepfar-datim/DATIM-OCL-Scripts
  2. Download this sample import file: UGA-DAA-FY24.json
  3. Import it: python imapimport.py --env=production -cTEST -pDAA-FY24 -t[your-api-token-here] UGA-DAA-FY24.json
  4. Export it: python imapexport.py --env=production -cTEST -pDAA-FY24 -fJSON -t[your-api-token-here] > UGA-DAA-FY24.json
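
To get a comparable number for steps 3 and 4, each command can be wrapped in a small timer. This is a minimal sketch, not part of DATIM-OCL-Scripts: the helper name `time_command` and the script name `time_step.py` are hypothetical, and it measures wall-clock time for whatever command line it is given.

```python
import subprocess
import sys
import time


def time_command(args):
    """Run a command to completion and return (elapsed_seconds, return_code).

    `args` is the full command line as a list of strings. Hypothetical helper,
    not an existing function in DATIM-OCL-Scripts.
    """
    start = time.perf_counter()
    completed = subprocess.run(args, check=False)
    return time.perf_counter() - start, completed.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: put the command from step 3 or 4 after the script name, e.g.
    #   python time_step.py python imapimport.py --env=production ... UGA-DAA-FY24.json
    elapsed, code = time_command(sys.argv[1:])
    # Report on stderr so a redirected export (step 4) is not polluted.
    print(f"exit={code} elapsed={elapsed:.1f}s", file=sys.stderr)
```

The timing covers the whole client-side run, so it includes network latency and any polling the script does while waiting for the import to finish.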

Note:

snyaggarwal commented 1 month ago

@paynejd I tried the import on Staging (and local) and it took 215 seconds.

snyaggarwal commented 1 month ago

@paynejd The way to add tests for this would be:

  1. Run the import and the export for a sample of 5 (maybe 3?) runs.
  2. Record the import and export times separately and combined, maybe in a CSV or similar.
  3. Assert these times against an "expected time" (already recorded).
  4. If a run's time is lower than the "expected time", update the "expected time" to that run's time, so that the next run is asserted against the new, lower time.
  5. Fail the tests if the time goes (significantly!) beyond the "expected time".
  6. These tests will live in the DATIM-OCL-Scripts repo, so no clone/pull is needed.
  7. They can be added as a GitHub Actions workflow that runs periodically and after every change.
  8. This will not require any deployment of this repo, as it only adds tests.
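
Steps 3 to 5 could look roughly like the sketch below. Several things here are assumptions rather than decisions from this thread: the function name `check_against_baseline`, storing the expected time in a JSON file instead of a CSV, using the median of the sample runs, and a 25% tolerance as the meaning of "significantly".

```python
import json
import statistics
from pathlib import Path

TOLERANCE = 1.25  # assumed: fail only when a run is more than 25% over the expected time


def check_against_baseline(samples, baseline_path, tolerance=TOLERANCE):
    """Compare the median of `samples` (seconds) with the stored expected time.

    Returns (passed, observed, expected). The stored value is only ever lowered,
    so a slow run never loosens the check for later runs.
    """
    observed = statistics.median(samples)
    path = Path(baseline_path)
    expected = None
    if path.exists():
        expected = json.loads(path.read_text())["expected_seconds"]

    if expected is None or observed < expected:
        # First run, or a new best time: record it for the next run (step 4).
        path.write_text(json.dumps({"expected_seconds": observed}))
    if expected is None:
        return True, observed, observed
    # Step 5: fail only when the time is beyond the expected time by the tolerance.
    return observed <= expected * tolerance, observed, expected
```

A test would call this once each for the import, export, and combined timings, each with its own baseline file, and assert on the first element of the result. If the baseline is updated in CI, the file has to be persisted between runs (committed back or cached), which this sketch does not handle.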

snyaggarwal commented 1 month ago

@paynejd With our latest deployment on Prod, the import is taking ~224.46291 secs. The time distribution is: "concept": 5.0442540645599365, "mapping": 50.38385057449341, "reference": 70.46909356117249

The rest of the time (~100 seconds) is in Repo Versions creation. Will profile that.
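
For reference, the "~100 seconds" figure follows from subtracting the three reported phases from the total. The snippet below only redoes that arithmetic with the numbers quoted above; the variable names are illustrative, and attributing the remainder to Repo Versions creation is the working hypothesis from this comment, not something the code measures.

```python
total_seconds = 224.46291
phase_seconds = {
    "concept": 5.0442540645599365,
    "mapping": 50.38385057449341,
    "reference": 70.46909356117249,
}

# Time covered by the three reported phases, and what is left over.
accounted = sum(phase_seconds.values())
unaccounted = total_seconds - accounted

for name, seconds in phase_seconds.items():
    print(f"{name:>12}: {seconds:7.2f}s ({seconds / total_seconds:5.1%})")
print(f"{'unaccounted':>12}: {unaccounted:7.2f}s ({unaccounted / total_seconds:5.1%})")
```

The three phases add up to about 125.9 s, leaving about 98.6 s unaccounted for.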