SEED-platform / seed

Standard Energy Efficiency Data (SEED) Platform™ is a web-based application that helps organizations easily manage data on the energy performance of large groups of buildings.

Major import performance improvements #4718

Closed: axelstudios closed this pull request 4 months ago

axelstudios commented 4 months ago

Any background context you want to provide?

Importing data into a large organization can take a considerable amount of time because of the matching, merging, and linking process.

What's this PR do?

Optimizes several steps throughout the import process to improve performance:

Step Matching Data (3/6): Merging Unmatched States

When importing 512 records that will merge into an org with 512 records, with UBID data:

|         | Before | After |
|---------|--------|-------|
| Queries | 520    | 7     |
| Time    | 1m 28s | 11s   |
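The PR reports that this step went from 520 queries to 7 but does not show the code change. As a rough illustration of the general batching idea only (not SEED's implementation), the sketch below contrasts looking up matches one record at a time, which costs one query per incoming record, with fetching all candidates in a single batched lookup and grouping them in memory. `CountingStore`, the `match_key` field, and both matching functions are hypothetical; the store is an in-memory stand-in that counts each lookup as one query.

```python
from collections import defaultdict


class CountingStore:
    """Hypothetical in-memory stand-in for the database; each lookup counts as one query."""

    def __init__(self, records):
        self.records = records  # list of dicts, each with "id" and a hypothetical "match_key"
        self.queries = 0

    def find_by_key(self, key):
        # One query per call: the per-record (N+1) pattern.
        self.queries += 1
        return [r for r in self.records if r["match_key"] == key]

    def find_by_keys(self, keys):
        # One query for a whole batch of keys, like a single "key IN (...)" lookup.
        self.queries += 1
        wanted = set(keys)
        return [r for r in self.records if r["match_key"] in wanted]


def match_one_by_one(incoming, store):
    """Issue one lookup per incoming record."""
    return {
        rec["id"]: [r["id"] for r in store.find_by_key(rec["match_key"])]
        for rec in incoming
    }


def match_in_bulk(incoming, store):
    """Fetch every candidate at once, then group the results by key in memory."""
    by_key = defaultdict(list)
    for r in store.find_by_keys(rec["match_key"] for rec in incoming):
        by_key[r["match_key"]].append(r["id"])
    # .get() avoids inserting empty entries for keys with no existing match.
    return {rec["id"]: by_key.get(rec["match_key"], []) for rec in incoming}
```

With 512 incoming records, `match_one_by_one` issues 512 lookups while `match_in_bulk` issues one, and both return the same matches. In an ORM-backed application the batched lookup would typically correspond to one query that filters on many keys at once; how SEED actually achieved its reduction is not stated in the PR.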

When importing 5000 records that will merge into an org with 145,000 records, without UBID data:

|      | Before  | After |
|------|---------|-------|
| Time | 26m 43s | 15s   |

When importing 40,000 records that will merge into an org with 145,000 records, without UBID data:

💥💥💥💥💥

|      | Before  | After |
|------|---------|-------|
| Time | ~4h 30m | 30s   |

💥💥💥💥💥
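To put the Step 3 timings in proportion, the sketch below converts the reported durations to seconds and computes a speedup factor for each case. `to_seconds` is a hypothetical helper that only understands the `h`/`m`/`s` forms used in the tables above; it ignores the `~` on the 4h 30m figure, so that ratio is approximate.

```python
import re


def to_seconds(duration: str) -> int:
    """Parse durations such as '1m 28s', '26m 43s', or '~4h 30m' into seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(amount) * units[unit]
               for amount, unit in re.findall(r"(\d+)\s*([hms])", duration))


def speedup(before: str, after: str) -> float:
    """How many times faster 'after' is than 'before'."""
    return to_seconds(before) / to_seconds(after)


# Step 3 timings copied from the tables above.
for label, before, after in [
    ("512 into 512", "1m 28s", "11s"),
    ("5000 into 145,000", "26m 43s", "15s"),
    ("40,000 into 145,000", "~4h 30m", "30s"),
]:
    print(f"{label}: {speedup(before, after):.0f}x faster")
```

This prints roughly 8x, 107x, and 540x for the three import sizes, which shows the gain growing with the size of the import and of the existing organization.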

Step Matching Data (4/6): Merging State Pairs

When importing 512 records that will merge into an org with 512 records, with UBID data:

|         | Before | After  |
|---------|--------|--------|
| Queries | 24,578 | 23,043 |
| Time    | 50s    | 22s    |

When importing 5000 records that will merge into an org with 145,000 records, without UBID data:

|      | Before | After  |
|------|--------|--------|
| Time | 2m 35s | 2m 10s |
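A quick back-of-the-envelope check of the Step 4 numbers, with the durations converted to seconds by hand (2m 35s = 155 s, 2m 10s = 130 s). The query count drops by only about 6% while the smaller import's runtime falls by more than half, so if runtime were simply proportional to query count this step would barely have improved; the numbers alone don't say where the rest of the gain comes from. `pct_fewer` is a hypothetical helper.

```python
def pct_fewer(before: float, after: float) -> float:
    """Percentage reduction going from 'before' to 'after'."""
    return 100 * (before - after) / before


# Step 4, 512 records into 512 (with UBID data)
print(f"queries: {pct_fewer(24_578, 23_043):.1f}% fewer")  # 6.2% fewer
print(f"time:    {50 / 22:.1f}x faster")                    # 2.3x faster

# Step 4, 5000 records into 145,000 (without UBID data): 155 s -> 130 s
print(f"time:    {155 / 130:.1f}x faster")                  # 1.2x faster
```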

Total timing when importing 512 records that will merge into an org with 512 records, with UBID data:

Before After With Hannah's improvements With Ross's improvements
Queries 36,222 33,600 32,571 26,913
Time 1m 24s 1m 57s 48s
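The Queries row of the total-timing table can be read as savings relative to the Before column. The sketch below computes that for each of the other three columns; it treats each column independently and does not assume whether Hannah's and Ross's improvements build on top of one another.

```python
before = 36_222  # total queries in the Before column

variants = {
    "After": 33_600,
    "With Hannah's improvements": 32_571,
    "With Ross's improvements": 26_913,
}

for name, queries in variants.items():
    saved = before - queries
    print(f"{name}: {saved:,} fewer queries ({100 * saved / before:.1f}%)")
```

This prints savings of 2,622 (7.2%), 3,651 (10.1%), and 9,309 (25.7%) queries respectively.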

How should this be manually tested?

Upload files into existing organizations using both the develop branch and this branch, then compare the runtimes.
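When comparing runtimes between the two branches, a single run can be noisy. A minimal sketch of a timing harness, assuming the import can be triggered from Python: it runs a callable several times and reports the median wall-clock time. `median_runtime` is hypothetical, and whatever starts the import on each checkout (called `run_import` in the note below) is a placeholder, not a SEED function.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 3) -> float:
    """Run fn `repeats` times and return the median wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

For example, `median_runtime(lambda: run_import(org_copy))` could be measured once on develop and once on this branch. Because an import changes the organization's data, each repeat would need to start from the same fresh copy of the organization for the comparison to be fair.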

What are the relevant tickets?

4549