Pirate-Weather / pirateweather

Code and documentation for the Pirate Weather API
Apache License 2.0

TimeMachine data only available before May 2023 #130

Closed NJAldwin closed 1 month ago

NJAldwin commented 10 months ago

Describe the bug

The documentation says that the time machine cannot be used for historic data within the last 5 days, and to use the forecast API instead.

Because "the last five days" isn't exactly defined, I want to make sure I can build in error handling.

If I try to hit the forecast API with a date that's too far in the past, it gives me a 400 Bad Request with the body "Requested time too early, please use https://timemachine.pirateweather.net". Great!

However, if I try to hit the time machine API with a date that's too recent, it gives me a 502 with the body

{
    "message": "Bad Gateway",
    "error": "could not JSON decode Lambda function response: statusCode validation failed"
}

Expected behavior

I would expect hitting the time machine API with a date that's too recent would return a 400 Bad Request, ideally with the latest timestamp that would be available on the time machine, but just a 400 would be fine too.

Actual behavior

Instead, I see an opaque 502 that suggests that an upstream service is returning an error.
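For anyone building error handling around the two responses described above, here is a minimal sketch of how a client might tell them apart. The function name `classify` and the outcome labels are hypothetical, and treating the 502 as "date too recent" is an assumption based only on the behaviour reported in this issue, not on anything the API documents.

```python
import json

# Prefix of the 400 body quoted in this issue.
TOO_EARLY = "Requested time too early"


def classify(endpoint: str, status: int, body: str) -> str:
    """Map a response to a coarse outcome; `endpoint` is 'forecast' or 'timemachine'."""
    if status == 200:
        return "ok"
    if endpoint == "forecast" and status == 400 and TOO_EARLY in body:
        # The forecast API says so explicitly, so this case is unambiguous.
        return "use_timemachine"
    if endpoint == "timemachine" and status == 502:
        # Assumption: the Lambda "statusCode validation failed" 502 is what a
        # too-recent date currently produces. Any other 502 stays generic.
        try:
            error = json.loads(body).get("error", "")
        except (ValueError, AttributeError):
            error = ""
        if "statusCode validation failed" in error:
            return "probably_too_recent"
        return "upstream_error"
    return "error"
```

Keeping the 502 case as "probably" reflects that the status code alone cannot distinguish a too-recent date from a real upstream failure.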

API Endpoint

TimeMachine

Location

42.794697,-71.357407

Other details

Sample requests (made today, 24 December, around noon local time, 1700 UTC)

Also, would it be possible to clarify when exactly the cutoff for "last 5 days" is? Is it based on UTC midnight? Or the location midnight? Or exactly (current date - 5*24h)?
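To make the three readings of "last 5 days" concrete, here is a small sketch that computes each candidate cutoff. The function name `candidate_cutoffs` is hypothetical, the fixed UTC-5 offset is an assumption for the location above, and the year used in the usage example is an arbitrary placeholder since the issue only gives the day and month.

```python
from datetime import datetime, timedelta, timezone


def candidate_cutoffs(now_utc: datetime, local_tz: timezone, days: int = 5) -> dict:
    """Return the three possible cutoffs, all expressed in UTC."""
    window = timedelta(days=days)
    midnight = dict(hour=0, minute=0, second=0, microsecond=0)
    utc_midnight = now_utc.replace(**midnight)
    local_midnight = now_utc.astimezone(local_tz).replace(**midnight)
    return {
        # Counting back from the most recent UTC midnight.
        "utc_midnight": utc_midnight - window,
        # Counting back from the most recent midnight at the location.
        "local_midnight": (local_midnight - window).astimezone(timezone.utc),
        # Exactly (current time - days * 24h).
        "rolling": now_utc - window,
    }


# Placeholder year; 1700 UTC on 24 December with an assumed UTC-5 offset.
now = datetime(2023, 12, 24, 17, 0, tzinfo=timezone.utc)
for name, cutoff in candidate_cutoffs(now, timezone(timedelta(hours=-5))).items():
    print(name, cutoff.isoformat())
```

The three results differ by up to 17 hours for this example, which is why the exact definition matters for requests near the boundary.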

Troubleshooting steps

timeisapear commented 10 months ago

I also see issues when trying to bridge the Forecast API and Time Machine API over the past week or so. I am not able to successfully query data from either API for a day that occurred T-4 or T-5 days ago.

cloneofghosts commented 10 months ago

Thank you both for the reports. I'll ping @alexander0042 to take a look at updating the error message wording and to clarify the number of past days stored. I remember seeing it stated somewhere that it was five, which is what the docs used, but it looks like only the past three days are stored. I made a quick documentation change to update the number of days stored to three, which matches the current behaviour.

alexander0042 commented 10 months ago

Hi, thanks for opening this issue and using this API- sometimes the implementation of these features changes between when I start building something and when I finish, which leads to these sorts of hiccups. I started out at 5 days, but reduced it in order to optimize the storage speeds somewhat.

Also, great "documentation" tag!

alexander0042 commented 9 months ago

Ok, so this ended up turning into a different issue, but I've got it fixed now!

AWS removed the underlying ERA-5 dataset last week, which took down timemachine; however, since this is all open data, I was able to flip it over to use Google's dataset instead, and we're back online now!

Also updated the docs to clarify the time period when the API endpoint should be used instead of Time Machine.

juste97 commented 8 months ago

@alexander0042 I think something might not be right again regarding this.

With /52.520008,13.404954,1707951600 and api.pirateweather.net I get "Requested time too early, please use https://timemachine.pirateweather.net" and with timemachine I get "Latest date available is May 2023".

Or could it be that there really is no data in between? Data before May 2023 was correctly returned. 
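As a quick sanity check on the request above, the sketch below decodes the UNIX timestamp and rebuilds the path segment. The helper name `describe_request` is hypothetical; only the two hostnames and the `lat,lon,time` path tail come from this thread, and how the API key and any path prefix fit into the full URL is deliberately left out.

```python
from datetime import datetime, timezone

# Hosts as quoted in this thread.
FORECAST_HOST = "https://api.pirateweather.net"
TIMEMACHINE_HOST = "https://timemachine.pirateweather.net"


def describe_request(lat: float, lon: float, unix_time: int) -> tuple:
    """Return the 'lat,lon,time' path tail and the requested time as ISO 8601 UTC."""
    when = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    return f"{lat},{lon},{unix_time}", when.isoformat()


path, when = describe_request(52.520008, 13.404954, 1707951600)
print(path)  # 52.520008,13.404954,1707951600
print(when)  # 2024-02-14T23:00:00+00:00
```

The requested time is 14 February 2024 at 23:00 UTC, which is after May 2023 and well before the forecast endpoint's recent window, so both error messages are consistent with a gap in coverage.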

cloneofghosts commented 8 months ago

There really is no data between May 2023 and now. AWS took down the ERA5 dataset and it was replaced with Google's dataset, which only offers data until May 2023. @alexander0042 said he has met with AWS to see if they can bring back the ERA5 dataset, so we'll see if they do.

juste97 commented 8 months ago

Thanks for the fast feedback! Fingers crossed that they can bring it back 😃

cloneofghosts commented 4 months ago

I noticed this issue will get marked as stale soon so I checked in on the state of things. The AWS dataset is still not being maintained and Google's dataset is still only available until May 2023.

@alexander0042 Have you thought about accessing the data directly from ECMWF? I'm not sure how complicated it would be, but it would fix this issue.

cloneofghosts commented 2 months ago

I see my comment from a few months ago was missed. @alexander0042 would the link I provided in my previous comment fix the issue?

M4lmostoso commented 2 months ago

It seems the problem is back. I'm not able to see yesterday's weather on forecast, and time machine is still locked on May 2023!

cloneofghosts commented 2 months ago

Hi @M4lmostoso, the API endpoint only stores the last 32h of data, and because the AWS dataset is no longer available there isn't any data between May 2023 and the start of that 32h window. I did find another source of data but I'm not sure if @alexander0042 saw my comment or not.

M4lmostoso commented 2 months ago

Yes indeed, I just checked and in fact I can only see the last 32h! Ouch!

cloneofghosts commented 2 months ago

You used to be able to query data for the last three days and I'm not sure when exactly that changed. I created #316 to track the issue with the API endpoint only showing the last 32h which is hopefully just an oversight and can be fixed soon.

alexander0042 commented 2 months ago

Quick update on the timemachine limitations- I've come up with a workaround for this (using the NCAR ERA5 data source), and am implementing it now, since I've had a few people e-mail me about it. Couple other good things about this:

cloneofghosts commented 2 months ago

Good to know this should hopefully be fixed soon. I've tagged this one as documentation, as both the docs (which I updated today following the recent discovery, and which should also state how often the data updates) and the TimeMachine changelog will need to be updated.

cloneofghosts commented 2 months ago

@alexander0042 Not sure if this is helpful or not but the NCAR ERA5 dataset is available on AWS https://registry.opendata.aws/nsf-ncar-era5/

alexander0042 commented 1 month ago

Thanks for pointing that out- it's exactly the source I needed! The historic data ingest processing is chugging along, and should be ready by tomorrow as version 2.3! Building off my general "apply open technologies whenever I can" approach, I'm using Kerchunk to do this, which is a wonderfully flexible piece of software.

Thanks for starting that docs update, I'll work on adding some key time machine details. Crucially, there are now three different ways a request could be handled:

  1. Pre May 1, 2024: ERA5 data via the NCAR S3 archive.
    • 24 hours
    • Subset of variables
    • Slowish (~10 seconds)
  2. May 1, 2024, to T-minus 24 hours: GFS/HRRR/NBM 1-hour forecast data from the PW archive
    • Provides more data and resolution than is available on ERA5
    • Full range of PW forecast variables
    • Avoids the ERA5 production time lag
    • Slow (~30 seconds), since it needs to open and read many zarr files on S3
  3. T-minus 24 hours onward: merged 1-hour forecast data with forward-looking forecast data, responding with the full 7-day forecast.
    • Same process as before!
    • Very fast (10 ms), since this is optimized for fast reads in one location
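The three cases above amount to a routing decision on the requested time. Here is a minimal sketch of that decision, not the actual Pirate Weather implementation: the function name `pick_source`, the returned labels, and the choice to treat the May 1, 2024 boundary as midnight UTC are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the May 1, 2024 boundary is taken at midnight UTC.
ARCHIVE_START = datetime(2024, 5, 1, tzinfo=timezone.utc)
LIVE_WINDOW = timedelta(hours=24)


def pick_source(requested: datetime, now: datetime) -> str:
    """Choose which of the three data paths would serve a request time."""
    if requested >= now - LIVE_WINDOW:
        # Case 3: merged recent data plus the forward-looking forecast.
        return "live_forecast"
    if requested >= ARCHIVE_START:
        # Case 2: GFS/HRRR/NBM 1-hour forecast data from the PW archive.
        return "pw_archive"
    # Case 1: ERA5 via the NCAR S3 archive.
    return "era5_ncar"


now = datetime(2024, 10, 1, 12, tzinfo=timezone.utc)
print(pick_source(datetime(2023, 6, 1, tzinfo=timezone.utc), now))      # era5_ncar
print(pick_source(datetime(2024, 7, 15, tzinfo=timezone.utc), now))     # pw_archive
print(pick_source(datetime(2024, 10, 1, 6, tzinfo=timezone.utc), now))  # live_forecast
```

Checking the live window first means a request inside the last 24 hours is always served by the fast path, regardless of where the archive boundary sits.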

alexander0042 commented 1 month ago

One other thing for me to document- this update will technically deprecate the time machine endpoint (which I never liked having anyway). Since the primary script now has a cohesive plan for responding to queries, the main API endpoint can handle everything! It's not going anywhere, but it just won't be necessary anymore.

cloneofghosts commented 1 month ago

While the docs haven't been updated yet, I just want to pop in and say V2.3 is available, which fixes the issue of the Time Machine being limited to May 2023 and before.