Open skfrost01 opened 8 months ago
Can you please provide an example of code that produces this error?
Thank you for the quick response!
year = 2023
response = NCEIResponse()
while year >= 2000:
    response.extend(
        ncei.get_data(
            datasetid="GHCND",
            stationid='GHCND:USC00140637',
            datatypeid=["PRCP"],
            startdate=date(year, 1, 1),
            enddate=date(year, 12, 31)
        )
    )
    year -= 1
df_precip_temp = response.to_dataframe()
This is an example of one of the stations that throws this error for me. There are lots, such as GHCND:USC00347390, that work as expected.
This gives a similar error, but with a different key: KeyError: 'elevation'
x = ["FIPS:56"]
stations = ncei.get_stations(
    datasetid="GHCND",
    datatypeid=["PRCP"],
    locationid=x,
    startdate=mindate,
    enddate=maxdate,
)
df_stations = stations.to_dataframe()
Both these errors result from bugs in how this library handles missing data. In the first example, it looks like there is a gap in the data for that station between 1951 and 2003; the missing years are causing the errors. In the second, certain stations are missing the elevation parameter. I've patched the issue on GitHub but expect it will be a bit before I do a new release. In the meantime, you can install the GitHub code as follows:
git clone https://github.com/adamancer/pyncei
cd pyncei
pip install .
And here is a version of your code that should catch the missing years. It does require you to use the development code.
response = NCEIResponse()
for year in range(2000, 2024):
    resp = ncei.get_data(
        datasetid="GHCND",
        stationid='GHCND:USC00140637',
        datatypeid=["PRCP"],
        startdate=date(year, 1, 1),
        enddate=date(year, 12, 31)
    )
    if resp:
        response.extend(resp)
    else:
        print(f"No data found for {year}")
response.to_dataframe()
Let me know if that solves the problem for you.
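For anyone curious what the missing-data failure looks like, here is a minimal sketch of the second case (records that lack an elevation key). It is not pyncei's actual code or the patch itself; the record layout, the KEY_ORDER list, and the rows_strict / rows_tolerant helpers are all made up for illustration.

```python
# Hypothetical station records; the second one has no "elevation" key,
# mirroring the stations that triggered KeyError: 'elevation'.
records = [
    {"id": "GHCND:EXAMPLE0001", "name": "Station A", "elevation": 320.0},
    {"id": "GHCND:EXAMPLE0002", "name": "Station B"},
]

KEY_ORDER = ["id", "name", "elevation"]  # assumed column order


def rows_strict(records, key_order):
    """Index every key directly; raises KeyError if a record lacks one."""
    return [{k: rec[k] for k in key_order} for rec in records]


def rows_tolerant(records, key_order, fill=None):
    """Use dict.get so absent keys become a fill value instead of an error."""
    return [{k: rec.get(k, fill) for k in key_order} for rec in records]


try:
    rows_strict(records, KEY_ORDER)
except KeyError as err:
    print(f"strict version failed on missing key: {err}")

rows = rows_tolerant(records, KEY_ORDER)
print(rows[1])
```

A list of dicts like rows can be handed to pd.DataFrame, where the filled-in None ends up as a missing value in the elevation column.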
Thanks for digging into this! It works now for USC00140637 and generally seems to be getting a higher success rate, but there are still some stations that are failing. For example:
USC00031459
USC00145870
USC00340017
I have this set up to pull stations within a radius of a point, so these are just a random selection of ones that failed.
Can you please provide the code that is producing the error? When I plug those stations into the code above, it seems to run fine.
Here is my uncommented, data science-esque code in all of its inefficient glory... Maybe I did something wrong and am still using the original pyncei code?
lat = 35.00
lon = -97.05
distance = 125 #km
df_stations = pd.read_csv('stations.csv')
gdf_stations = gpd.GeoDataFrame(df_stations,
                                geometry=gpd.points_from_xy(df_stations['longitude'], df_stations['latitude']),
                                crs='EPSG:4326')
gdf_stations_proj = gdf_stations.to_crs('EPSG:3395')
site = gpd.GeoSeries([Point(lon, lat)], crs='EPSG:4326').to_crs('EPSG:3395')
gdf_stations_proj['distance'] = gdf_stations_proj.distance(site[0])
gdf_ref = gdf_stations_proj[gdf_stations_proj['distance'] <= distance * 1000] # Filter for distances within set distance
df_precip = pd.DataFrame()
for id in gdf_ref["id"].unique():
    year = 2023
    ncei = NCEIBot("********************************", cache_name="ncei")
    response = NCEIResponse()
    for year in range(2000, 2024):
        resp = ncei.get_data(
            datasetid="GHCND",
            stationid=id,
            datatypeid=["PRCP"],
            startdate=date(year, 1, 1),
            enddate=date(year, 12, 31)
        )
        if resp:
            response.extend(resp)
        else:
            print(f"No data found for {year}")
    df_precip_temp = response.to_dataframe()
    df_precip = pd.concat([df_precip, df_precip_temp])
I am also attaching a copy of stations.csv, which is a bulk pull using ncei.get_stations:
stations.csv
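A side note on the radius filter in the code above, unrelated to the error itself: EPSG:3395 is a Mercator projection, which stretches distances by roughly 1/cos(latitude), or about 1.22 at 35°N. A 125 km threshold measured in projected metres therefore corresponds to something closer to 102 km on the ground. Below is a rough sketch of a great-circle (haversine) filter as one alternative; the station IDs and coordinates are invented for illustration and do not come from stations.csv.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


site_lat, site_lon = 35.00, -97.05
max_km = 125

# Hypothetical stations as (id, latitude, longitude)
stations = [
    ("EXAMPLE_NEAR", 35.50, -97.05),
    ("EXAMPLE_FAR", 38.00, -97.05),
]

# Keep only stations whose great-circle distance is within the radius
nearby = [
    sid for sid, lat, lon in stations
    if haversine_km(site_lat, site_lon, lat, lon) <= max_km
]
print(nearby)
```

The same function could be applied row by row to the latitude and longitude columns of the stations table; a spherical Earth is assumed, which is usually adequate at this scale.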
Hmm I can't reproduce the error without falling back to the release version on PyPI. I'm a little mystified by the error popping up for these stations but not for the station we discussed earlier. Is the traceback the same?
Can you run pip freeze in your command line and locate pyncei in the output? If you've installed it from PyPI, it should show up as pyncei==1.0; otherwise there should be a path to a file on your computer.
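The same check can be done from inside Python, which helps when the interpreter in use (a PyCharm virtualenv, say) differs from the one pip runs under on the command line. This is only a sketch using the standard library's importlib.metadata; the describe_install helper is a made-up name. It relies on the direct_url.json file that pip records when a package is installed from a local path or URL rather than from an index.

```python
from importlib import metadata


def describe_install(package):
    """Return (version, direct_url) for an installed package, or None if absent.

    direct_url holds the text of direct_url.json, which pip writes when a
    package is installed from a local path or URL; it is None otherwise.
    """
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    return dist.version, dist.read_text("direct_url.json")


info = describe_install("pyncei")
if info is None:
    print("pyncei is not installed in this environment")
else:
    version, direct_url = info
    print(f"pyncei version: {version}")
    print(f"installed from a path or URL: {direct_url is not None}")
```

Run it in the same session that throws the error to confirm which copy of the package is actually being imported.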
And a friendly word of warning--you don't want to share an API token publicly. I tried to obscure it above but it's still in the comment history. Be careful pasting code in a public forum.
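One common way to keep a token out of pasted code is to read it from an environment variable. A minimal sketch, assuming the token is stored in a variable called NCEI_TOKEN (that name and the load_token helper are made up here, not something pyncei defines):

```python
import os


def load_token(var_name="NCEI_TOKEN"):
    """Read the API token from an environment variable (name is hypothetical)."""
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token


# Usage, commented out so the sketch runs without pyncei or a real token:
# ncei = NCEIBot(load_token(), cache_name="ncei")
```

With this in place, the script can be shared as-is because the token never appears in the source.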
Yep, you're right, I didn't install the updated version correctly the first time (still not sure how that station ran, I checked it like 3 times). Anyways, appreciate the help!
Hello, I'm having a similar issue. I checked the version of the package and it is 1.0. I also checked whether NOAA returns a response to the request; the server seems to be providing data, but the package can't convert it into a data frame.
I would appreciate it if you could help me with this.
I fixed the issue. When the response is 1, it is actually missing. As a result, errors occur when stitching together data from different years. I modified the code to handle this exception.
There is a good chance this is a user error, but I am running into the following error, specifically when pulling GHCND and PRCP data. If I follow the example and generate a response, there appears to be data, but using to_dataframe() throws the following error for some stations:
  File "/PycharmProjects/Regenerate/.venv/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    response.to_dataframe()
  File "/PycharmProjects/Regenerate/.venv/lib/python3.12/site-packages/pyncei/bot.py", line 1068, in to_dataframe
    df = pd.DataFrame(self.values())
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/PycharmProjects/Regenerate/.venv/lib/python3.12/site-packages/pandas/core/frame.py", line 832, in __init__
    data = list(data)
           ^^^^^^^^^^
  File "/PycharmProjects/Regenerate/.venv/lib/python3.12/site-packages/pyncei/bot.py", line 1010, in values
    yield {k: val[k] for k in self.key_order if k in keys}