ESA-VirES / Swarm-VRE

DEPRECATED: Refer to Swarm-DISC/Swarm_notebooks
https://github.com/Swarm-DISC/Swarm_notebooks

Providing access to ground observatory data #8

Open · smithara opened this issue 4 years ago

smithara commented 4 years ago

Ground observatory data should eventually be made accessible through VirES. In the meantime, we can provide access to them in the shared folder, along with demo notebooks showing how to read them.

UPDATE: There is a notebook demonstrating direct download and reading from the BGS FTP, so duplicating the data to the VRE may not be necessary:
https://nbviewer.jupyter.org/github/lmar76/swarmnb/blob/master/obsdata.ipynb

Re-processed observatory data are available from BGS at ftp://ftp.nerc-murchison.ac.uk/geomag/Swarm/AUX_OBS/

There are three collections: hour (from 1900), minute (from 1997), and second (from 2012). Each contains mixed records from different observatories, and they are updated roughly every 4 months(?).
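For scripted access from Python, a minimal sketch using the standard-library `ftplib` could look like this (the host and base path come from the URL above; the choice of collection and file is purely illustrative):

```python
from ftplib import FTP

HOST = "ftp.nerc-murchison.ac.uk"
BASE = "geomag/Swarm/AUX_OBS"

with FTP(HOST) as ftp:
    ftp.login()                      # anonymous login
    ftp.cwd(BASE)
    print(ftp.nlst())                # expect the hour/minute/second collections
    ftp.cwd("hour")
    filenames = sorted(ftp.nlst())
    name = filenames[0]              # illustrative: just grab the first file
    with open(name, "wb") as f:
        ftp.retrbinary(f"RETR {name}", f.write)
```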

These collections should be duplicated to ~/shared/AUX_OBS/ and the .ZIP files extracted, as in the sketch below.
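Assuming the archives end up under ~/shared/AUX_OBS/ with one subfolder per collection (that layout is an assumption), the extraction step could be scripted as:

```python
import zipfile
from pathlib import Path

base = Path.home() / "shared" / "AUX_OBS"

# Walk all .ZIP archives under the shared folder and unpack each one
# next to itself, so the hour/minute/second collections stay separated
for archive in base.rglob("*.ZIP"):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(archive.parent)
```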

Code to read the hourly files:

```python
import pandas as pd
import xarray as xr

def load_dataset(filename, as_pandas=False):
    df = pd.read_csv(
        filename, comment="#",
        names=['obs', 'gc_lat', 'long', 'rad', 'yyyy', 'mm', 'dd', 'UT', 'N', 'E', 'C'],
        sep=r"\s+")
    # Convert to a datetime index, centring each hourly mean on HH:30
    df.index = pd.to_datetime(
        df["yyyy"]*100000000 + df["mm"]*1000000 + df["dd"]*10000
        + df["UT"].astype(int)*100 + 30,
        format="%Y%m%d%H%M")
    df = df.drop(columns=["yyyy", "mm", "dd", "UT"])
    # Note that the time series is repeated over and over, for each observatory
    # Note also that there are jumps in each time series
    if as_pandas:
        return df
    # Convert to xarray
    # Set up an empty dataset with just the coordinates
    year = df.index[0].year
    times = pd.date_range(start=f"{year}-01-01T00:30", end=f"{year}-12-31T23:30", freq="h")
    ds = xr.Dataset(coords={"NEC": ["N", "E", "C"], "Timestamp": times})
    # Loop through each sub-dataframe (containing just measurements from one observatory)
    # Add each as a DataArray
    for obsname, df_obs in df.groupby("obs"):
        # Record the observatory location before reindexing
        # (reindexing may insert NaN rows at the start of the series)
        location = {"Latitude": df_obs["gc_lat"].iloc[0].round(3),
                    "Longitude": df_obs["long"].iloc[0].round(3),
                    "Radius": df_obs["rad"].iloc[0].round(3)}
        # Infill gaps in the time series (with NaNs)
        df_obs = df_obs.reindex(times)
        # Add data for each observatory
        ds = ds.assign({
            f"{obsname}": (("Timestamp", "NEC"), df_obs[["N", "E", "C"]].values)})
        # Add attributes with the observatory location
        ds[obsname].attrs = location
    return ds
```

```python
# Example loading:
load_dataset('hour/SW_OPER_AUX_OBS_2__19020101T000000_19021231T235959_0122.txt')
```

gives (listing truncated)

```
Dimensions:    (NEC: 3, Timestamp: 8760)
Coordinates:
  * NEC        (NEC)
```
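Since each observatory becomes its own data variable, the station codes do not need to be known in advance; a rough usage sketch (plotting assumes matplotlib is installed):

```python
ds = load_dataset('hour/SW_OPER_AUX_OBS_2__19020101T000000_19021231T235959_0122.txt')
print(list(ds.data_vars))       # observatory codes present in this file
obs = list(ds.data_vars)[0]     # pick the first station rather than hard-coding one
print(ds[obs].attrs)            # {'Latitude': ..., 'Longitude': ..., 'Radius': ...}
ds[obs].sel(NEC="N").plot()     # hourly means of the northward component
```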