ECMWFCode4Earth / vAirify

code repository for 2024 Code for Earth project #16
MIT License

Improve forecast etl performance #28

Closed amehta-scottlogic closed 1 month ago

amehta-scottlogic commented 1 month ago

Description

The forecast ETL was particularly slow because xarray loads data from a dataset lazily. We access each dataset once per city, which is quite slow unless the data has been loaded eagerly first.

Explicitly calling load on the dataset uses more memory but improves performance. Introducing a thread pool also significantly reduces the runtime.
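
A minimal sketch of the eager-load idea, assuming a synthetic in-memory grid in place of the real CAMS GRIB files. The variable name `pm2p5`, the 10-degree grid, the sample cities and the helper `nearest_value` are all made up for illustration; only `Dataset.load()` and `sel(..., method="nearest")` are real xarray calls. With a file-backed dataset, `load()` is the step that pulls the data into memory once, ahead of the per-city lookups.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a forecast grid (assumption: the real data comes from GRIB files).
latitudes = np.linspace(-90.0, 90.0, 19)   # 10-degree steps
longitudes = np.linspace(0.0, 350.0, 36)   # 10-degree steps, 0-360 convention
grid = np.arange(latitudes.size * longitudes.size, dtype=float).reshape(
    latitudes.size, longitudes.size
)
dataset = xr.Dataset(
    {"pm2p5": (("latitude", "longitude"), grid)},
    coords={"latitude": latitudes, "longitude": longitudes},
)

# load() reads all variables into memory and returns the same dataset, so the
# lookups below work on in-memory arrays instead of going back to the source.
dataset = dataset.load()

# Hypothetical sample of cities as (latitude, longitude) pairs.
cities = {"London": (51.5, 359.9), "Nairobi": (-1.3, 36.8)}


def nearest_value(ds, lat, lon):
    """Return the pm2p5 value at the grid point closest to (lat, lon)."""
    point = ds["pm2p5"].sel(latitude=lat, longitude=lon, method="nearest")
    return float(point.values)


values = {name: nearest_value(dataset, lat, lon) for name, (lat, lon) in cities.items()}
```

The trade-off is the one described above: the whole dataset sits in memory, in exchange for cheap repeated selections.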

We now process all 153 cities in around 2 minutes instead of 10.
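
The thread pool part can be sketched with the standard library alone. Everything here is a placeholder rather than the code in this PR: `extract_city_forecast` is a hypothetical stand-in for the per-city transform (the sleep simulates time spent waiting on data access), the city names are dummies, and `max_workers=4` is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def extract_city_forecast(city):
    # Hypothetical per-city work; the sleep stands in for waiting on data access.
    time.sleep(0.01)
    return {"city": city, "status": "done"}


cities = [f"city-{i}" for i in range(20)]  # placeholder names

# max_workers=4 is an assumption for this sketch, not a value taken from the PR.
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() yields results in input order even though the calls overlap in time.
    results = list(pool.map(extract_city_forecast, cities))
```

Because `map()` preserves input order, the results can be matched back to the city list without extra bookkeeping.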

Default

15 cities per minute

With eager load

25 cities per minute

With eager load and thread pool

76 cities per minute
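
As a rough check, the three rates above can be converted into total runtimes for 153 cities. This is only arithmetic on the figures quoted in this description, not a new measurement.

```python
CITY_COUNT = 153

# Throughput figures quoted above, in cities per minute.
rates_per_minute = {
    "Default": 15,
    "With eager load": 25,
    "With eager load and thread pool": 76,
}

# Total runtime in minutes for all cities at each rate.
minutes = {label: CITY_COUNT / rate for label, rate in rates_per_minute.items()}

for label, total in minutes.items():
    print(f"{label}: {total:.1f} min")
```

The first and last results line up with the "around 2 minutes instead of 10" summary.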

Output

2024-05-20 17:07:46,450 - INFO - Finding data for 153 cities
2024-05-20 17:07:46,450 - INFO - Extracting pollutant forecast data
2024-05-20 17:07:46,462 - INFO - Loading data from CAMS to file single_level_2024-05-20_00.grib
2024-05-20 17:07:47,403 - INFO - Loading data from CAMS to file multi_level_2024-05-20_00.grib
2024-05-20 17:07:49,678 - INFO - Transforming forecast data
2024-05-20 17:09:00,256 - INFO - Persisting forecast data
2024-05-20 17:09:12,220 - INFO - 5049 documents upserted, 0 modified

github-actions[bot] commented 1 month ago

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines  Covered  Coverage  Threshold  Status
266    200      75%       0%         🟢

New Files

File                                          Coverage  Status
air-quality-backend/src/database/location.py  100%      🟢
TOTAL                                         100%      🟢

Modified Files

File                                                           Coverage  Status
air-quality-backend/src/database/air_quality_dashboard_dao.py  0%        🟢
air-quality-backend/src/etl/forecast/forecast_adapter.py       100%      🟢
air-quality-backend/src/etl/forecast/forecast_dao.py           100%      🟢
air-quality-backend/src/etl/forecast/forecast_data.py          93%       🟢
TOTAL                                                          73%       🟢

updated for commit: 69e77c2 by action🐍