ECMWFCode4Earth / vAirify

code repository for 2024 Code for Earth project #16
MIT License

Handling in situ data #21

Closed amehta-scottlogic closed 2 weeks ago

amehta-scottlogic commented 1 month ago

Background

We've decided to change the approach to in_situ data storage. Essentially, we will store the pollutant measurements we get from the various data sources as they are, without calculating an AQI for a city. We can tag each measurement with its corresponding location, the lat/long of the station, and any other relevant data.

Acceptance Criteria

Test Checklist:

mwalker-scottlogic commented 1 month ago

Test Analysis

To test:

Verify in situ documents match schema

Specified in ticket

Ideally, this would be tested in the ticket "As a tester I have an automated integration test suite for the In-Situ ETL pipeline so that system quality does not regress."

Tests created here can be migrated to this ticket

Test Case required:

Test Charter:

Regression candidate:

Yes

mwalker-scottlogic commented 1 month ago

@amehta-scottlogic What is the expected schema for an in situ document? I'm just a bit confused by the comment 'Any other existing fields should be kept'

OpenAQ data for each city is stored in the database with the following fields:

  • measurement_time (when measurement was taken)
  • name (location name from locations collection)
  • pollutant values (can be o3, no2, so2, pm10 and pm2_5)
      • If these values are from the same sensor and same time they should be combined into one document.
      • They can be omitted if they are not present from a site
  • created_time (unchanged)
  • last_modified_time (unchanged)
  • Collection indexes are updated accordingly
  • We no longer store the aqi and overall aqi levels
  • Any other existing fields should be kept
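
For illustration only, the "combine into one document" rule above could be sketched as below. This is a minimal sketch, not the project's ETL code: the input shape (one dict per single-pollutant reading) and the keys `sensor_id`, `pollutant` and `value` are hypothetical names chosen for the example.

```python
# Pollutant names taken from the field list above.
POLLUTANTS = ("o3", "no2", "so2", "pm10", "pm2_5")


def combine_measurements(readings):
    """Merge single-pollutant readings that share a sensor and a time.

    `readings` is a hypothetical input shape: dicts with the keys
    name, sensor_id, measurement_time, pollutant and value.
    """
    combined = {}
    for reading in readings:
        key = (reading["name"], reading["sensor_id"], reading["measurement_time"])
        # Start a new document the first time this sensor/time pair is seen.
        doc = combined.setdefault(
            key,
            {"name": reading["name"], "measurement_time": reading["measurement_time"]},
        )
        # Only known pollutants are copied; absent ones are simply omitted.
        if reading["pollutant"] in POLLUTANTS:
            doc[reading["pollutant"]] = reading["value"]
    return list(combined.values())
```

Two readings from the same sensor at the same time end up in one document with both pollutant fields; a pollutant that a site does not report never appears as a key.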

Current documents appear to be in the following format:

_id: 
location_name: 
measurement_date: 
name:  
api_source: 
last_modified_time: 
location: 
    type: "Point"
    coordinates: Array (2)
        0: 
        1:  
location_type: 
metadata: 
    entity: 
    sensor_type: 
*****optional_pollutant_value:
created_time:
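
As a starting point for "verify in situ documents match schema", a check along these lines could be used. It is only a sketch under assumptions: the field names are copied from the current document format above, `_id` is left out because the database assigns it, and treating any key containing "aqi" as a leftover AQI field is a guess, since the thread does not name those fields.

```python
# Top-level fields copied from the current document format above.
REQUIRED_FIELDS = (
    "location_name", "measurement_date", "name", "api_source",
    "last_modified_time", "location", "location_type", "metadata",
    "created_time",
)


def find_schema_problems(doc):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in doc]

    # location is expected to be a GeoJSON-style Point with two coordinates.
    location = doc.get("location")
    if isinstance(location, dict):
        if location.get("type") != "Point":
            problems.append("location.type must be 'Point'")
        coords = location.get("coordinates")
        if not (isinstance(coords, (list, tuple)) and len(coords) == 2):
            problems.append("location.coordinates must have 2 entries")
    elif location is not None:
        problems.append("location must be an object")

    # metadata carries the entity and sensor_type sub-fields.
    metadata = doc.get("metadata")
    if isinstance(metadata, dict):
        for key in ("entity", "sensor_type"):
            if key not in metadata:
                problems.append(f"missing field: metadata.{key}")

    # Assumption: any key mentioning "aqi" is a leftover AQI field.
    for key in doc:
        if "aqi" in key.lower():
            problems.append(f"unexpected AQI field: {key}")
    return problems
```

Pollutant fields are deliberately not required here, since they can be omitted when a site does not report them.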