Closed caldwellst closed 2 years ago
Been trying to work out some strange errors I've gotten when loading in additional ARC2 data, but should be done soon!
@Tinkaa @hannahker I've worked out what the strange errors are that I was getting trying to finalise the ARC2 analysis. Took a while to sort out, but see the file here.
While looking for other ARC2 documentation, found this: ftp://ftp.cpc.ncep.noaa.gov/fews/fewsdata/africa/arc2/ARC2_missing_dates.txt
Good detective work @caldwellst ! Messy from me to miss them in the first place.
Regarding what to do with them:
1) Since no missing data has occurred in the last 10 years, I wouldn't worry about it for this year's monitoring.
2) Let's build a mechanism into the pipeline (and the historical analysis) anyway to check for this and return a warning when data is missing. I would then leave the raw values as -999, but for computing the dry spell use the precipitation of the previous day for the missing date. I would argue for using only the previous day's precipitation and not the average with the next day, because the next day's data is not available when monitoring (you could also set up a logical for this, but I think that makes it more complicated than it needs to be).
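A rough sketch of what point 2 could look like, in plain Python so it does not depend on how the pipeline stores its data. The function name, the list-based inputs, and the sample values are made up for illustration; only the -999 sentinel and the "use the previous day" rule come from the comment above. Carrying the last valid value across several consecutive missing days is an extra assumption.

```python
import warnings

MISSING = -999  # sentinel for missing values, as mentioned in the comment above


def fill_with_previous_day(dates, precip, missing=MISSING):
    """Return (filled, missing_dates) without modifying the raw input list.

    Missing days take the most recent valid value (assumption: this also
    applies across consecutive missing days). A gap at the very start of
    the series has no previous day and is returned as None.
    """
    filled = []
    missing_dates = []
    last_valid = None
    for date, value in zip(dates, precip):
        if value == missing:
            missing_dates.append(date)
            filled.append(last_valid)
        else:
            last_valid = value
            filled.append(value)
    if missing_dates:
        # Warn rather than fail, so monitoring can continue with filled values.
        warnings.warn(
            f"Precipitation missing for {len(missing_dates)} date(s): {missing_dates}"
        )
    return filled, missing_dates


# Illustrative values only, not real ARC2 data.
dates = ["day 1", "day 2", "day 3", "day 4"]
raw = [0.0, 5.2, -999, 1.1]
filled, gaps = fill_with_previous_day(dates, raw)
```

The raw list keeps its -999 entries, and only the copy used for dry spell computation is filled.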
@joseepoirier @hannahker What do y'all think? I think the idea about the previous day is good, although it would only really be problematic if the missing data came on a potential confirmation day of a dry spell. Whatever the decision, we will need to re-check our historical dry spell identification and RP assumptions, just in case any potential dry spells were missed due to the lack of data.
Just another quick note, the unavailability of data in each instance covers all 32 districts in Malawi (hasn't been just individual raster cells), so we can safely apply whatever method is decided directly on the aggregated data rather than on the raster cells themselves.
I am working on the processing scripts and will use those to re-process and get the final precipitation and dry spells data for upload to HDX.
Agreed that we should have an automated test to flag missing data. I wonder if the data may be retroactively updated in the future and/or if the providers can explain the gaps. We might want to confirm whether they do retroactive updates and/or report "in batch" when skipping days (i.e. if no data is available on Tue and Wed, does Thu reflect all rain from Tue to Thu or only Thu's total?).
Preference not to use 0 as the default missing data indicator. If imputing, using the day before's total seems reasonable.
Good catch @caldwellst!
My interpretation of the explanatory text file is that the lack of gauge or IR data hasn't been a problem since 2013, but agree that it would be good to check for this in our pipeline just in case.
I also wonder if it would be more appropriate to fill in the missing value with an average of something like the last 3 days' worth of precipitation data? Just in case the previous day was an isolated spike in rain.
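For comparison, the trailing-average idea could be sketched like this. It is only an illustration: the function name, the default window of 3, and the sample values are assumptions, and -999 is the missing-value sentinel discussed earlier in the thread.

```python
MISSING = -999  # missing-value sentinel discussed earlier in the thread


def fill_with_trailing_mean(precip, window=3, missing=MISSING):
    """Fill each missing day with the mean of the preceding `window` days.

    The preceding days are taken from the already-filled series, so a second
    consecutive missing day averages over an earlier filled value too.
    If no earlier values exist, the day is returned as None.
    """
    filled = []
    for value in precip:
        if value == missing:
            recent = [v for v in filled[-window:] if v is not None]
            filled.append(sum(recent) / len(recent) if recent else None)
        else:
            filled.append(value)
    return filled


# Illustrative values only, not real ARC2 data.
example = fill_with_trailing_mean([2.0, 4.0, 6.0, -999, 1.0])
```

Here the missing day becomes the mean of 2.0, 4.0 and 6.0, which dampens the effect of a single rainy day compared with copying the previous day's value.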
I did some very simple testing in #206 that shows that interpolation using the previous and following data points is best. I don't think this is a problem to implement even for the action trigger monitoring, as we plan to be checking this daily from the 10th day or so, when we expect a dry spell to possibly occur. In the extremely unlikely event that data is missing on a confirmation date, I think it is not a problem to highlight that we are waiting an additional day to confirm.
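A minimal sketch of interpolating from the previous and following data points, assuming plain linear interpolation between the nearest valid neighbours. This is not necessarily what was tested in #206; the function name and sample values are hypothetical, and -999 is the sentinel discussed above.

```python
MISSING = -999  # missing-value sentinel discussed earlier in the thread


def interpolate_gaps(precip, missing=MISSING):
    """Linearly interpolate runs of missing days from their valid neighbours.

    Gaps at the start or end of the series have a neighbour on only one
    side, so they are returned as None (e.g. still waiting for the next day).
    """
    filled = list(precip)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] != missing:
            i += 1
            continue
        start = i
        while i < n and filled[i] == missing:
            i += 1
        end = i  # index of the first valid value after the gap, or n
        if start == 0 or end == n:
            for j in range(start, end):
                filled[j] = None
            continue
        left, right = filled[start - 1], filled[end]
        span = end - (start - 1)
        for j in range(start, end):
            filled[j] = left + (right - left) * (j - (start - 1)) / span
    return filled


# Illustrative values only, not real ARC2 data.
single_gap = interpolate_gaps([2.0, -999, 6.0])
pending = interpolate_gaps([2.0, 6.0, -999])
```

The `pending` case mirrors the monitoring situation described above: when the latest day is missing, nothing is filled until the following day's value arrives.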
@joseepoirier From the explanation on the ARC2 page linked above, it seems that daily missing values are fully missing and not integrated into later days. I have been modifying the ARC2 processing and downloading in #205, and we will make sure that it captures data if and when it is again made available.
@caldwellst I think we can merge this?
Very quick double checking on the instances of 2 dry spells in a single season!