There is now a probably easier way to download this kind of data using xarray. There are examples of downloading and plotting variables in the `notebook` folder. There are also two new example scripts, `get_gfs_xarray.py` and `get_gfs_hist_xarray.py`, that download data from the real-time and historical servers (see this comment for more information) using xarray. Thanks to @heyerbobby for the first version of the xarray scripts.
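As a minimal sketch of the xarray route: the dataset-id pattern below is an assumption about the NOMADS URL layout (not taken from the scripts), and the `open_dataset` call is left commented because it needs network access and a netCDF4 or pydap backend.

```python
# Build the OPeNDAP URL for one GFS run on the real-time server.
# The gfs_{res}/gfs{date}/gfs_{res}_{run}z pattern is an assumption
# about the NOMADS layout at the time of writing.
date, run, res = "20210217", "00", "0p25"
url = f"https://nomads.ncep.noaa.gov/dods/gfs_{res}/gfs{date}/gfs_{res}_{run}z"

# import xarray as xr
# ds = xr.open_dataset(url)  # lazy open; requires the netCDF4 or pydap backend
print(url)
```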
If you are looking to download only from the real-time server, the repository https://github.com/jagoosw/getgfs contains a more polished and user-friendly version, and you should probably use that instead.
These scripts were tested with Python 3.9, but they should work with any Python 3 version. First install Anaconda and then create an environment with

```
conda env create -f environment.yml
```

Then activate the environment:

```
conda activate get-gfs
```
Scripts to fetch meteorological data from the GFS model:

- `get_gfs.py` gets data from the real-time server, which is located at https://nomads.ncep.noaa.gov/dods/ and holds the last 15 days of data.
- `get_gfs_hist.py` gets data from the historical server, which is located at https://www.ncei.noaa.gov/thredds/catalog/model-gfs-004-files-old/catalog.html and holds the last 2 years of data (more information: https://www.ncdc.noaa.gov/data-access/model-data/model-datasets/global-forcast-system-gfs).

Example for the real-time server:

```
./get_gfs.py -s 1 -r 0.25 -t 0 48 -x -10 10 -y -15 15 -p 0 2 -c example_conf.json 20210217 00
```
The previous line will download, from the GFS run on 2021-02-17 at 00z, the meteorology defined in `example_conf.json`.
Example for the historical server:

```
./get_gfs_hist.py -t 0 10 -x -10 10 -y -10 10 -c example_conf_hist.json 20191005 00
```
Note that the historical server differs from the real-time one in several respects (see the differences listed below).
To build the JSON configuration files for the historical server you can go directly to the server and check the following URL for any day:
The possible values for the `height_above_ground` and `isobaric` levels can be obtained by running a query directly in the browser, for instance:
Similarly, for the real-time server you can get this information by adding the suffix `.dds`, `.info`, or `.das` to the dataset URL. In these URLs you can also see information about the meteorological variables, such as units, minimum, maximum, representation of missing values, and so on.
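For example, the metadata URLs can be built by appending those suffixes to a dataset URL (the base URL below is illustrative):

```python
# Append the OPeNDAP metadata suffixes to a (hypothetical) dataset URL.
base = "https://nomads.ncep.noaa.gov/dods/gfs_0p25/gfs20210217/gfs_0p25_00z"
dds = base + ".dds"    # dataset structure: dimensions and variable shapes
das = base + ".das"    # attributes: units, missing value, min/max, ...
info = base + ".info"  # human-readable summary of both
```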
The output of the script is a pandas DataFrame written to an ASCII file, with a multi-index in the rows (lat, lon) and a multi-index in the columns (variables-time). It can be read back into Python using `pd.read_csv()`.
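Since the file has two header rows (variable, time) and two index columns (lat, lon), the read-back needs the `header` and `index_col` arguments of `pd.read_csv()`. A self-contained sketch with made-up values standing in for a real output file:

```python
import io
import pandas as pd

# Miniature stand-in for the script's output: (lat, lon) row multi-index,
# (variable, time) column multi-index. Names and values are made up.
rows = pd.MultiIndex.from_product([[-10.0, 0.0], [-15.0, 0.0]], names=["lat", "lon"])
cols = pd.MultiIndex.from_product([["tmp2m", "pressfc"], ["0", "3"]], names=["var", "time"])
df = pd.DataFrame(1.5, index=rows, columns=cols)

buf = io.StringIO(df.to_csv())

# Two header rows for the column index, two leading columns for the row index.
df2 = pd.read_csv(buf, header=[0, 1], index_col=[0, 1])
assert df2.shape == (4, 4)
```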
Apart from the names of the variables, which differ between the two servers (even when they refer to the same meteorological variable), there are other differences between them:

- `tmp2m`, `tmp80m` and `tmp100m` refer to the temperature at 2, 80 and 100 m above ground. In the historical server these heights are stored as a new dimension of the variable, for example `Temperature_height_above_ground`. Thus, in the historical server the z-axis (either `height_above_ground` or `pressure`) has to be set for each variable in the configuration file. In the real-time server the pressure levels are controlled using an optional parameter, but they have to be the same for every variable that has them. Variables at different heights are different entries, as mentioned above.
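The two layouts can be illustrated with plain arrays (shapes and names below are hypothetical, chosen only to show the difference):

```python
import numpy as np

# Real-time server: one entry per height, e.g. tmp2m / tmp80m / tmp100m.
realtime = {name: np.zeros((2, 4, 4)) for name in ("tmp2m", "tmp80m", "tmp100m")}

# Historical server: one variable with an extra height_above_ground axis.
heights = [2.0, 80.0, 100.0]
temperature_height_above_ground = np.zeros((2, len(heights), 4, 4))  # (time, height, lat, lon)

# Selecting one height from the historical layout recovers the
# per-height field of the real-time layout.
tmp80m = temperature_height_above_ground[:, heights.index(80.0)]
assert tmp80m.shape == realtime["tmp80m"].shape
```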