Open franTarkenton opened 1 year ago
The script is mostly complete. Hourly XML files for the stations in the station list are downloaded, processed, and collected into a dataframe, which is then saved as a parquet file in object store. Daily temperature and precipitation are generated and saved to object store. The 'air_temp' variable is used instead of 'avg_air_temp_pst1hr' because the latter was missing for many stations.
To do:
Create a script that will pull the following information on an hourly basis.
Source of data: https://hpfx.collab.science.gc.ca/20231101/WXO-DD/observations/swob-ml/20231101/
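As a rough sketch of how the hourly job could locate the right directory, the helper below rebuilds the source URL for a given moment. It assumes the directory layout simply repeats the UTC date twice, which is inferred from the single example URL above and not confirmed anywhere else; `swob_day_url` is a hypothetical name. The conversion to UTC matters because hours in the file names use UTC.

```python
from datetime import datetime, timezone

# Host taken from the source URL in this issue.
BASE_URL = "https://hpfx.collab.science.gc.ca"


def swob_day_url(when: datetime) -> str:
    """Return the swob-ml directory URL for the UTC date of `when`.

    Assumption: the path pattern is <YYYYMMDD>/WXO-DD/observations/swob-ml/<YYYYMMDD>/,
    inferred from the one example URL in this issue.
    """
    if when.tzinfo is None:
        # Naive datetimes are ambiguous; refuse them so local time is never used by accident.
        raise ValueError("pass a timezone-aware datetime")
    day = when.astimezone(timezone.utc).strftime("%Y%m%d")
    return f"{BASE_URL}/{day}/WXO-DD/observations/swob-ml/{day}/"
```

Calling `swob_day_url(datetime.now(timezone.utc))` at the start of each hourly run would give the directory to list for that run.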
Data Acquisition
Processing
Use the stations listed in the climate_obs spreadsheet as the station list
Pull down the station data for the current hour (note: hours in the file names use UTC)
Extract from the individual XML files the following properties:
If a new day is detected, create a new file; otherwise pull the existing file from object store, update it, and re-push it (make sure we are not creating new versions)
Create two different input files: one for temperature and another for precipitation.
format of the files / columns:
Script would run hourly when the data is available
It would pull the data down, update it, and then repost it (make sure we are not creating a new version in object storage when the file is updated)
Need to set up a sync process that ensures the data that exists in object store also exists on the on-prem server.
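The extract-and-update steps above could look roughly like the sketch below. It is only an illustration: the sample XML is a simplified stand-in (real SWOB-ML files are larger and namespaced), `pcpn_amt_pst1hr` is an assumed precipitation property name, and `extract_properties` and `upsert_hour` are hypothetical helpers. Object store access is left out entirely; the in-memory dict stands in for the daily file.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for one station's hourly XML (assumed shape, not a real file).
SAMPLE_XML = """<root xmlns="http://example.invalid/obs">
  <elements>
    <element name="air_temp" uom="C" value="7.5"/>
    <element name="pcpn_amt_pst1hr" uom="mm" value="0.2"/>
    <element name="stn_nam" uom="unitless" value="EXAMPLE"/>
  </elements>
</root>"""


def extract_properties(xml_text, wanted):
    """Return {name: value} for every <element name=... value=...> whose name is in `wanted`."""
    found = {}
    for node in ET.fromstring(xml_text).iter():
        local_tag = node.tag.rsplit("}", 1)[-1]  # strip the namespace, if any
        name = node.get("name")
        if local_tag == "element" and name in wanted:
            found[name] = node.get("value")
    return found


def upsert_hour(daily, station, hour_utc, values):
    """Insert or overwrite one station-hour row, so re-running an hour never duplicates it."""
    daily[(station, hour_utc)] = dict(values)
    return daily
```

Keying rows by station and UTC hour means the job can safely be re-run for the same hour: the existing row is replaced and the daily file does not grow.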
Secondary: