This module provides tools to access data from NOAA’s Climate Data Online Web Services v2 API provided by NOAA’s National Centers for Environmental information (formerly the National Center for Climate Data).
Documentation for this project is available on Read The Docs.
Install using pip:
pip install pyncei
Alternatively, you can use the environment.yml file included in the GitHub repository to build a conda environment and install pyncei there:
conda env create -f environment.yml
conda activate pyncei
pip install pyncei
This method includes geopandas, which is absent from the pip
installation but if installed allows the to_dataframe()
method to
return a GeoDataFrame when coordinates are provided by NCEI.
To use the NCEI web services, you’ll need a token. The token is a
32-character string provided by NCEI; users can request one
here. Pass the token to
NCEIBot
to get started:
from pyncei import NCEIBot
ncei = NCEIBot("ExampleNCEIAPIToken")
You can cache queries by using the cache_name parameter when creating an
NCEIBot
object:
ncei = NCEIBot("ExampleNCEIAPIToken", cache_name="ncei_cache")
The cache uses
CachedSession
from the requests-cache module. Caching behavior can be modified by
passing keyword arguments accepted by that class to NCEIBot
. For
example, successful requests are cached indefinitely by default if the
cache is being used. Users can change this behavior using the
expire_after keyword argument when initializing an NCEIBot
object.
NCEIBot
includes methods corresponding to each of the endpoints
described on the CDO website. Query parameters specified by CDO can be
passed as arguments:
response = ncei.get_data(
datasetid="GHCND",
stationid=["GHCND:USC00186350"],
datatypeid=["TMIN", "TMAX"],
startdate="2015-12-01",
enddate="2015-12-02",
)
Each method call may make multiple requests to the API, for example, if
more than 1,000 daily records are requested. Responses are combined in
an NCEIResponse
object, which extends the list class. Individual
responses can be accessed using list methods, for example, by iterating
through the object or accessing a single item using its index. Data
from all responses can be accessed using the values()
method, which
returns an iterator of dicts, each of which is a single result:
for val in response.values():
print(val)
The response object includes a to_csv()
method to write results to a
file:
response.to_csv("station_data.csv")
As well as a to_dataframe()
method to write results to a pandas
DataFrame (or a geopandas GeoDataFrame if that module is installed and
the results include coordinates):
df = response.to_dataframe()
The table below provides an overview of the available endpoints and their corresponding methods:
CDO Endpoint | CDO Query Parameter | NCEIBot Method | Values |
---|---|---|---|
datasets | datasetid | get_datasets() |
datasets.csv |
datacategories | datacategoryid | get_data_categories() |
datatypes.csv |
datatypes | datatypeid | get_data_types() |
datacategories.csv |
locationcategories | locationcategoryid | get_location_categories() |
locationcategories.csv |
locations | locationid | get_locations() |
locations.csv |
stations | stationid | get_stations() |
– |
data | – | get_data() |
– |
Each of the NCEIBot get methods accepts either a single positional
argument (used to return data for a single entity) or a series of
keyword arguments (used to search for and retrieve all matching
entities). Unlike CDO, which accepts only ids, NCEIBot
will try to
work with either ids or name strings. If names are provided, NCEIBot
attempts to map the name strings to valid ids using find_ids()
:
ncei.find_ids("District of Columbia", "locations")
If a unique match cannot be found, find_ids()
returns all identifiers
that contain the search term. If you have no idea what data is available
or where to look, you can search across all endpoints by omitting the
endpoint argument:
ncei.find_ids("temperature")
Or you can browse the source files in the Values column of the table
above. The data in these files shouldn’t change much, but they can be
updated using refresh_lookups()
if necessary:
ncei.refresh_lookups()
from datetime import date
from pyncei import NCEIBot, NCEIResponse
# Initialize NCEIBot object using your token string
ncei = NCEIBot("ExampleNCEIAPIToken", cache_name="ncei")
# Set the date range
mindate = date(2016, 1, 1) # either yyyy-mm-dd or a datetime object
maxdate = date(2019, 12, 31)
# Get all DC stations operating between mindate and maxdate
stations = ncei.get_stations(
datasetid="GHCND",
datatypeid=["TMIN", "TMAX"],
locationid="FIPS:11",
startdate=mindate,
enddate=maxdate,
)
# Select the station with the best data coverage
station = sorted(stations.values(), key=lambda s: -int(s["datacoverage"]))[0]
# Get temperature data for the given dates. Note that for the
# data endpoint, you can't request more than one year's worth of daily
# data at a time.
year = maxdate.year
response = NCEIResponse()
while year >= mindate.year:
response.extend(
ncei.get_data(
datasetid=datasetid,
stationid=station["id"],
datatypeid=datatypeids,
startdate=date(year, 1, 1),
enddate=date(year, 12, 31),
)
)
year -= 1
# Save values to CSV using the to_csv method
response.to_csv(station["id"].replace(":", "") + ".csv")
# Alternatively, merge observation and station data together in a pandas
# DataFrame. If geopandas is installed and coordinates are given, this
# method will return a GeoDataFrame instead.
df_stations = stations.to_dataframe()
df_response = response.to_dataframe()
df_merged = df_stations.merge(df_response, left_on="id", right_on="station")