AQ-AI / openaq-engine

http://www.aqai.xyz
BSD 3-Clause "New" or "Revised" License

Integrate the Airflow structure published in https://github.com/Ali-Maq/aqi into https://github.com/AQ-AI/openaq-engine #60

Closed ChristinaLast closed 2 weeks ago

ChristinaLast commented 11 months ago

The structure of the Airflow project is quite different from the openaq-pipeline. From what I can see, it contains the following:

 processing

  1. buffer_processor.py
    • This doesn't need to exist if we are handling it through Earth Engine (the resolution of the image is the buffer by default).
  2. feature_extractor.py
    • I am assuming this is where we query the satellite data.
  3. feature_merger.py
    • Is this where we combine the field data with the data from the satellites (e.g. with regards to the lookback used)?
  4. vrt_creator.py
    • We are not using VRT at the moment, but it is an option for when we only query from our local data store.
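To make the lookback question concrete, here is a minimal sketch of how a field measurement could be paired with satellite images inside a lookback window. The function name `images_in_lookback`, the timestamps, and the 3-day window are all hypothetical and not taken from either repository.

```python
from datetime import datetime, timedelta

def images_in_lookback(measurement_time, image_times, lookback_days):
    """Return the image timestamps inside the lookback window.

    The window ends at the measurement time and starts lookback_days
    earlier; both ends are inclusive.
    """
    window_start = measurement_time - timedelta(days=lookback_days)
    return [t for t in image_times if window_start <= t <= measurement_time]

# Hypothetical example data: one field measurement and four image timestamps.
measurement = datetime(2023, 9, 19, 12, 0)
images = [
    datetime(2023, 9, 10),        # too old for a 3-day lookback
    datetime(2023, 9, 17),        # inside the window
    datetime(2023, 9, 19, 6, 0),  # inside the window
    datetime(2023, 9, 20),        # after the measurement
]
matched = images_in_lookback(measurement, images, lookback_days=3)
```

With this kind of rule, feature_merger.py would only need the measurement timestamp and the lookback length to decide which images contribute to a row.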

authentication

  1. openaq_auth.py
    • When in the Airflow run will this happen?
  2. earth_engine_auth.py
    • This authentication might need to happen for every Earth Engine query, so the two auth processes may be executed at different times in the pipeline.
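A rough sketch of what "different times" could look like, using stubs only: OpenAQ auth runs once before extraction, while Earth Engine auth is repeated before every query. All function names here are hypothetical placeholders; no real Airflow, OpenAQ, or Earth Engine calls are made.

```python
import functools

calls = []  # records the order in which the stubbed steps run

def openaq_auth():
    # Stub: stands in for whatever openaq_auth.py does.
    calls.append("openaq_auth")

def earth_engine_auth():
    # Stub: stands in for whatever earth_engine_auth.py does.
    calls.append("earth_engine_auth")

def with_earth_engine_auth(query):
    """Wrap a query so the Earth Engine auth stub runs right before it."""
    @functools.wraps(query)
    def wrapper(*args, **kwargs):
        earth_engine_auth()
        return query(*args, **kwargs)
    return wrapper

@with_earth_engine_auth
def query_satellite(location):
    # Stub for a single satellite query.
    calls.append(f"query_satellite:{location}")
    return location

def run_pipeline(locations):
    openaq_auth()                  # once, before the OpenAQ extraction
    calls.append("openaq_extract")
    return [query_satellite(loc) for loc in locations]

result = run_pipeline(["a", "b"])
```

In an Airflow DAG the same ordering could be expressed as one upstream auth task for OpenAQ and an auth step inside each Earth Engine task, but that mapping is an assumption.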

extraction

  1. gadm_extractor.py
    • At the moment we aren't using city masks to extract features, although this is a possibility (combined with local file storage and VRT).
  2. geo_location_generator.py
    • Not sure what this does. Is this generation for the prediction locations? In the first version, the random points were used just for model testing; in the deployed models, we will need to sample from a grid of the agreed-upon spatial resolution.
  3. openaq_extractor.py
    • Is this the time splitter + cohort filter + cohort builder?
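For the grid sampling mentioned under geo_location_generator.py, a minimal sketch could look like the following. The function name `grid_points`, the bounding box, and the 0.25 degree resolution are hypothetical, and the sketch assumes the box spans a whole number of cells.

```python
def grid_points(min_lon, min_lat, max_lon, max_lat, resolution_deg):
    """Return (lon, lat) cell centres of a regular grid over a bounding box.

    Assumes the box width and height are whole multiples of resolution_deg.
    """
    n_cols = int(round((max_lon - min_lon) / resolution_deg))
    n_rows = int(round((max_lat - min_lat) / resolution_deg))
    points = []
    for row in range(n_rows):
        for col in range(n_cols):
            lon = min_lon + (col + 0.5) * resolution_deg
            lat = min_lat + (row + 0.5) * resolution_deg
            points.append((lon, lat))
    return points

# Hypothetical 1.0 x 0.5 degree box sampled at 0.25 degrees: 4 columns x 2 rows.
points = grid_points(0.0, 0.0, 1.0, 0.5, resolution_deg=0.25)
```

Unlike the random points used for model testing, the prediction locations here are deterministic for a given box and resolution.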

Ali-Maq commented 11 months ago

Tasks for today, September 19, 2023

Processing:

  1. Buffer processing: if this is handled on Google Earth Engine, I will remove it.
  2. Yes, this is where we run the ee_data.py logic to get the satellite data.
  3. With regards to what? I don't follow this. Yes, this is where we get the satellite data, and I am using the image collection method both in parallel and in series in the existing pipeline.
  4. I will remove this and change the workflow.
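As an illustration of the "in parallel and in series" point, here is a sketch using only the standard library. `fetch_features` is a hypothetical stand-in for one satellite query; it does not call Earth Engine or reproduce the existing pipeline.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_features(location):
    """Hypothetical stand-in for one satellite query; returns a fake feature dict."""
    lon, lat = location
    return {"lon": lon, "lat": lat, "feature": round(lon + lat, 3)}

locations = [(0.1, 0.2), (1.0, 2.0), (3.5, 4.5)]

# Series: one query after another.
series_results = [fetch_features(loc) for loc in locations]

# Parallel: the same queries on a small thread pool.
# pool.map returns results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel_results = list(pool.map(fetch_features, locations))
```

Both paths produce the same list, so the choice between them only affects run time and how many requests are in flight at once.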

- I am working on generating the Google credentials and will share them with you, @ChristinaLast. Should I just push the changes in the config file?