OCHA-DAP / pa-aa-yem-flooding

GNU General Public License v3.0

Ecmwf mars forecast zonal stats #12

Closed turnerm closed 1 year ago

turnerm commented 1 year ago

Sorry Zack, just opening this PR for you because I wanted to leave a couple of comments, but please adjust the text / title / base etc.!

turnerm commented 1 year ago

My main comments here are:

  1. For ECMWF ERA5 ground, it turns out that the hourly data is cumulative -- so the 00:00 value is the one we need. In the meantime I've written the extraction into Python, I will make a PR for that.
  2. Not sure if you can just use the ERA5 data that we downloaded? Would make debugging a bit easier
  3. For ECMWF HRES, the rolling sums should be across lead time, not forecast date.

zackarno commented 1 year ago

Thanks for opening this PR and the comments. Replying to them below:

My main comments here are:

  1. For ECMWF ERA5 ground, it turns out that the hourly data is cumulative -- so the 00:00 value is the one we need. In the meantime I've written the extraction into Python, I will make a PR for that.
  2. Not sure if you can just use the ERA5 data that we downloaded? Would make debugging a bit easier
  3. For ECMWF HRES, the rolling sums should be across lead time, not forecast date.

I think this is what I did, but perhaps I misunderstood something. I calculated the rolling sum for each forecastED date (not date of forecast) for each lead time. So the output has columns:

Here is a preview of the target ecmwf_mars_leads_split_rolled, where this is done:

[image: preview of the ecmwf_mars_leads_split_rolled target]

It looks right to me?

zackarno commented 1 year ago

Just opened this issue https://github.com/OCHA-DAP/pa-aa-yem-flooding/issues/15#issue-1682558131

turnerm commented 1 year ago

I think when I did this originally we did not have all the data, or were not sure about the cumulative detail you described above, so I just used daily data from GEE (ImageCollection: ECMWF/ERA5_LAND/DAILY_RAW). I can re-run on the local files, but it seems like you might have done this already, judging from the PRs above. I compared the GEE extraction values to the 00:00:00 values in the local files and they are equivalent (https://github.com/OCHA-DAP/pa-aa-yem-flooding/commit/f5a47ea728d7e367999e11d0aa2902ac12d5bb23). It is worth noting, though, that the 00:00:00 value in the local files is for the previous day, so you can't simply convert the datetimes to dates without subtracting a day (i.e. "2001-09-02 00:00:00" is the precip for 2001-09-01).

Ah yes, the 00:00 value is really the sum of the previous day. However, in the local files, when I read them in with xarray, there is no 00:00 but 24:00, and it's the sum of the same day, so that's what I've been going with.
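
For the case where the cumulative value is labelled 00:00:00, the date shift can be sketched as below. This is only an illustration: `accumulation_date` is a hypothetical helper (not something in the repo), and it assumes timestamps are strings in the format quoted above. It does not apply when the value is labelled 24:00 of the same day.

```python
from datetime import date, datetime, timedelta


def accumulation_date(timestamp: str) -> date:
    """Return the calendar day that a 00:00:00 cumulative value describes.

    Hypothetical helper: assumes the value stamped at midnight holds the
    total for the day that has just ended.
    """
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    if (ts.hour, ts.minute, ts.second) != (0, 0, 0):
        raise ValueError("expected a 00:00:00 timestamp")
    # The midnight value is the sum of the previous day, so shift back one day.
    return (ts - timedelta(days=1)).date()


print(accumulation_date("2001-09-02 00:00:00"))  # 2001-09-01
```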

I think this is what I did, but perhaps I misunderstood something. I calculated the rolling sum for each forecastED date (not date of forecast) for each lead time.

The rolling sum should always be computed for the forecasted date, but the point is that you should take a single date of forecast, and compute the rolling sum only using the data from that (without mixing in other dates of forecast). Thus for a 3-day rolling sum, you would end up with 8 lead times instead of 10. Hope this makes sense.
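
A minimal sketch of that idea, in plain Python. The function name and the precipitation values are made up for illustration, and labelling each window by its last lead time is an assumption; the actual target in the repo may be structured differently.

```python
def rolling_sum_by_lead_time(daily_precip, window=3):
    """Rolling sum over lead times within ONE date of forecast.

    daily_precip: daily totals ordered by lead time (lead 1 first).
    Returns {lead time of the window's last day: sum over the window}.
    Labelling a window by its last lead time is an assumption.
    """
    rolled = {}
    for end in range(window, len(daily_precip) + 1):
        rolled[end] = sum(daily_precip[end - window:end])
    return rolled


# Made-up daily totals (mm) for lead times 1..10 of a single forecast.
forecast = [1.0, 0.0, 2.0, 5.0, 3.0, 0.0, 0.0, 1.0, 4.0, 2.0]
rolled = rolling_sum_by_lead_time(forecast, window=3)
print(len(rolled))  # 8
```

With 10 lead times and a 3-day window, only lead times 3 to 10 have a complete window, which gives the 8 values mentioned above. Repeating this separately for each date of forecast keeps the forecasts from being mixed.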

zackarno commented 1 year ago

The rolling sum should always be computed for the forecasted date, but the point is that you should take a single date of forecast, and compute the rolling sum only using the data from that (without mixing in other dates of forecast). Thus for a 3-day rolling sum, you would end up with 8 lead times instead of 10. Hope this makes sense

Ah yes, I think that makes sense, as that is how we would roll the values for assessing whether the trigger threshold is crossed.

I think I had done it per lead time, across the different dates on which forecasts were generated, to answer a different question: assessing the skill/accuracy of the forecast against historical data (i.e. comparing ERA5 and HRES), rather than assessing how the triggers would be activated (although I see why this is important also). My thinking was that we would be able to see how accurate each lead time is. We might see that a lead time of 1 day gives us some degree of confidence, but after running a rolling sum (i.e. 3-day) on the dates within the lead time of 1, we might get a bit more confidence from the smoothing effect. We could potentially come to a conclusion like: 1-day lead times are n% accurate for any given date, but when we consider 3 (or more) days at once they are (n + x)% accurate. I think from the preliminary charts I made in the 09_ECMWF.html doc we do see this sort of effect, but then again I'm not sure if this is totally necessary.
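
As a rough sketch of that comparison: fix one lead time, line up the forecast and the observed value for each forecasted date, and score both the daily values and their 3-day rolling sums. Everything here is an assumption for illustration. The numbers are made up, and mean absolute error stands in for whatever accuracy measure would actually be used.

```python
def rolling_sum(values, window):
    """Sum of each complete run of `window` consecutive values."""
    return [
        sum(values[i - window + 1:i + 1])
        for i in range(window - 1, len(values))
    ]


def mean_abs_error(forecast, observed):
    """Mean absolute difference between two equally long series."""
    if len(forecast) != len(observed):
        raise ValueError("series must have the same length")
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(forecast)


# Made-up values (mm) for consecutive forecasted dates. Each forecast value
# comes from a different date of forecast, all at a lead time of 1 day.
lead1_forecast = [0.0, 4.0, 1.0, 6.0, 2.0, 0.0]
observed = [1.0, 2.0, 3.0, 4.0, 3.0, 1.0]

daily_mae = mean_abs_error(lead1_forecast, observed)
rolled_mae = mean_abs_error(
    rolling_sum(lead1_forecast, 3), rolling_sum(observed, 3)
)
```

Daily totals and 3-day totals are on different scales, so the two errors would need to be normalised (for example by the mean observed value) before reading anything into the difference.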

zackarno commented 1 year ago

For ECMWF HRES, the rolling sums should be across lead time, not forecast date.

Created a new separate target for this: 449c674