ESA-VirES / Swarm-VRE

DEPRECATED: Refer to Swarm-DISC/Swarm_notebooks
https://github.com/Swarm-DISC/Swarm_notebooks

Providing access to ground observatory data #8

Open · smithara opened this issue 4 years ago

smithara commented 4 years ago

Ground observatory data should eventually be made accessible through VirES. In the meantime, we can provide access to them in the shared folder, along with demo notebooks showing how to read them.

UPDATE: There is a notebook demonstrating direct download and reading from the BGS FTP, so duplicating the data to the VRE may not be necessary:
https://nbviewer.jupyter.org/github/lmar76/swarmnb/blob/master/obsdata.ipynb

Re-processed observatory data are available from BGS at ftp://ftp.nerc-murchison.ac.uk/geomag/Swarm/AUX_OBS/

There are three collections: hour (from 1900), minute (from 1997), and second (from 2012). Each contains mixed records from different observatories, and they are updated roughly every 4 months(?).
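For scripted access from Python, a minimal sketch using the standard-library `ftplib` could look like this (the host and base path come from the URL above; the choice of collection and file is purely illustrative):

```python
from ftplib import FTP

HOST = "ftp.nerc-murchison.ac.uk"
BASE = "geomag/Swarm/AUX_OBS"

with FTP(HOST) as ftp:
    ftp.login()                      # anonymous login
    ftp.cwd(BASE)
    print(ftp.nlst())                # expect the hour/minute/second collections
    ftp.cwd("hour")
    filenames = sorted(ftp.nlst())
    name = filenames[0]              # illustrative: just grab the first file
    with open(name, "wb") as f:
        ftp.retrbinary(f"RETR {name}", f.write)
```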

These collections should be duplicated to ~/shared/AUX_OBS/ and the .ZIP files extracted, as in the sketch below.
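Assuming the archives end up under ~/shared/AUX_OBS/ with one subfolder per collection (that layout is an assumption), the extraction step could be scripted as:

```python
import zipfile
from pathlib import Path

base = Path.home() / "shared" / "AUX_OBS"

# Walk all .ZIP archives under the shared folder and unpack each one
# next to itself, so the hour/minute/second collections stay separated
for archive in base.rglob("*.ZIP"):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(archive.parent)
```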

Code to read the hourly files:

```python
import pandas as pd
import xarray as xr

def load_dataset(filename, as_pandas=False):
    df = pd.read_csv(
        filename, comment="#",
        names=['obs', 'gc_lat', 'long', 'rad', 'yyyy', 'mm', 'dd', 'UT', 'N', 'E', 'C'],
        sep=r"\s+")
    # Convert to a datetime index, centring each hourly mean on HH:30
    df.index = pd.to_datetime(
        df["yyyy"]*100000000 + df["mm"]*1000000 + df["dd"]*10000
        + df["UT"].astype(int)*100 + 30,
        format="%Y%m%d%H%M")
    df = df.drop(columns=["yyyy", "mm", "dd", "UT"])
    # Note that the time series is repeated over and over, for each observatory
    # Note also that there are jumps in each time series
    if as_pandas:
        return df
    # Convert to xarray
    # Set up an empty dataset with just the coordinates
    year = df.index[0].year
    times = pd.date_range(start=f"{year}-01-01T00:30", end=f"{year}-12-31T23:30", freq="h")
    ds = xr.Dataset(coords={"NEC": ["N", "E", "C"], "Timestamp": times})
    # Loop through each sub-dataframe (containing just measurements from one observatory)
    # Add each as a DataArray
    for obsname, df_obs in df.groupby("obs"):
        # Record the observatory location before reindexing
        # (reindexing may insert NaN rows at the start of the series)
        location = {"Latitude": df_obs["gc_lat"].iloc[0].round(3),
                    "Longitude": df_obs["long"].iloc[0].round(3),
                    "Radius": df_obs["rad"].iloc[0].round(3)}
        # Infill gaps in the time series (with NaNs)
        df_obs = df_obs.reindex(times)
        # Add data for each observatory
        ds = ds.assign({
            f"{obsname}": (("Timestamp", "NEC"), df_obs[["N", "E", "C"]].values)})
        # Add attributes with the observatory location
        ds[obsname].attrs = location
    return ds
```

```python
# Example loading:
load_dataset('hour/SW_OPER_AUX_OBS_2__19020101T000000_19021231T235959_0122.txt')
```

gives (listing truncated)

```
Dimensions:    (NEC: 3, Timestamp: 8760)
Coordinates:
  * NEC        (NEC)
```
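Since each observatory becomes its own data variable, the station codes do not need to be known in advance; a rough usage sketch (plotting assumes matplotlib is installed):

```python
ds = load_dataset('hour/SW_OPER_AUX_OBS_2__19020101T000000_19021231T235959_0122.txt')
print(list(ds.data_vars))       # observatory codes present in this file
obs = list(ds.data_vars)[0]     # pick the first station rather than hard-coding one
print(ds[obs].attrs)            # {'Latitude': ..., 'Longitude': ..., 'Radius': ...}
ds[obs].sel(NEC="N").plot()     # hourly means of the northward component
```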