Standard Energy Efficiency Data (SEED) Platform™ is a web-based application that helps organizations easily manage data on the energy performance of large groups of buildings.
Importing a file of 500 records with matching fields but different "notes" took 90 seconds. After these improvements, the same import took 16 seconds.
What's this PR do?
Moves a database query out of a loop, which saved 25 seconds.
Chunks the incoming data and runs the task match_and_link_incoming_properties_and_taxlots_by_cycle on the chunks in parallel, using the number of Celery workers to determine the number of parallel tasks. Results are aggregated once all tasks have finished. With 5 tasks, all tasks complete within 16 seconds.
Parallelizing the entire match_and_merge task breaks when duplicate properties exist in an import file. We need to be more precise about what runs in parallel if we are to use parallel tasks.
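As a rough illustration of the chunk, run, and aggregate pattern described above, here is a minimal sketch. It uses a standard-library thread pool as a stand-in for Celery workers so that it runs on its own; it is not the code in this PR. The names chunk, match_chunk, and run_in_parallel, and the shape of the result dictionary, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil


def chunk(items, n_chunks):
    """Split items into at most n_chunks contiguous, roughly equal slices."""
    if not items:
        return []
    size = ceil(len(items) / n_chunks)
    return [items[i:i + size] for i in range(0, len(items), size)]


def match_chunk(record_ids):
    """Hypothetical stand-in for the per-chunk matching task.

    The real task would match and link records; this one only counts them.
    """
    return {"processed": len(record_ids)}


def run_in_parallel(record_ids, n_workers):
    """Run match_chunk on each chunk concurrently, then sum the partial results."""
    chunks = chunk(record_ids, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(match_chunk, chunks))

    # Aggregate only after every chunk has finished.
    totals = {"processed": 0}
    for partial in partials:
        for key, value in partial.items():
            totals[key] = totals.get(key, 0) + value
    return totals, len(chunks)
```

With 500 record ids and 5 workers, this sketch produces 5 chunks of 100 ids each and a single aggregated total. In a Celery setup the same shape is typically expressed as a group of tasks followed by an aggregation step.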
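The duplicate problem can be sketched as follows. When two records for the same property land in different chunks, each parallel task may handle its copy without seeing the other. This example shows how a naive split separates duplicates and how grouping by a matching key keeps them together. It is one possible approach under stated assumptions, not what this PR implements; the field name match_key and all function names are hypothetical.

```python
from collections import defaultdict


def naive_chunks(records, n_chunks):
    """Round-robin split that ignores matching keys."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


def keyed_chunks(records, n_chunks, key):
    """Split records so that all records sharing a key stay in one chunk."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)

    chunks = [[] for _ in range(n_chunks)]
    # Place each group whole, largest first, into the currently smallest chunk.
    for group in sorted(groups.values(), key=len, reverse=True):
        min(chunks, key=len).extend(group)
    return chunks


def split_keys(chunks, key):
    """Return the keys whose records ended up in more than one chunk."""
    seen = defaultdict(set)
    for index, chunk in enumerate(chunks):
        for record in chunk:
            seen[record[key]].add(index)
    return sorted(k for k, indexes in seen.items() if len(indexes) > 1)


records = [
    {"match_key": "A", "notes": "first"},
    {"match_key": "A", "notes": "second"},  # duplicate property, different notes
    {"match_key": "B", "notes": "third"},
    {"match_key": "C", "notes": "fourth"},
]
```

Here split_keys(naive_chunks(records, 2), "match_key") reports that "A" is spread across chunks, while the keyed split keeps both "A" records in the same chunk.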
How should this be manually tested?
Upload import files and monitor the resulting Celery tasks in Flower.
What are the relevant tickets?
Screenshots (if appropriate)