Aariq closed this issue 3 years ago.
Sometimes a value preceded by days of NAs has "nao foi feita observ" (observation was not made) in the notes, which I assume indicates that this value is not accumulated.
I think both of these are good examples of the shortcomings of data collection, especially in the early years.
There might not be any water in the rain gauge after several days either because it didn't rain, or because it did rain but so much time went by before the next team arrived that the water evaporated. I only found "nao foi feita observacao" (observation was not made) once in my quick scan, but I noticed there was a zero recorded that day. I take the zero at face value. Either someone recorded the zero in the morning when they woke up, or someone forgot to record it and later (say, the next day) entered a zero because they knew there had been no rain, while also noting that the observation had not been made (this makes little sense and is extremely unlikely, but I suppose it's possible).
We're really preparing two things: raw precip data and processed precip data. For the raw data, I think the mm of rain are entered as is and we take them at face value: a recorded zero means no rain, as opposed to an NA. Accumulated rain is an estimate whose accuracy decreases as the number of days since the prior measurement increases. For the processed data, the problem to solve is finding the best procedure for estimating and correcting this. It might be worth testing for correlations between precipitation at BDFFP and Manaus field stations (both rain yes/no and amount). This might be a good way to help calibrate the imputation of rain on those NA days.
OK, for the raw precip data I'll just consolidate the comments and leave in all the NAs and whoever is using the data (like us) can figure out how to deal with accumulated data in a different step.
Kinds of gaps in the data:
Some stats on 2 and 3:

gap type | mean length (days) | median length (days) | max length (days) |
---|---|---|---|
accumulated | 7.876882 | 5 | 326 |
untagged | 4.197796 | 2 | 1706 |
Tagged accumulations are usually relatively short, but sometimes quite long (30 "accumulated" observations have preceding gaps longer than a month).
On average, untagged gaps are shorter, but occasionally very long.
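A minimal sketch of how gap stats like these could be computed for one site, assuming the record is sorted by date with exactly one entry per day, `None` for NA, and a boolean standing in for "tagged as accumulated in the notes". The `Obs` class and function names are hypothetical, and "gap length" is assumed here to mean the number of consecutive NA days immediately before an observation; the actual analysis may define it differently.

```python
import statistics
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Obs:
    precip: Optional[float]    # mm; None stands for an NA day
    accumulated: bool = False  # hypothetical flag: notes tag the value as accumulated


def gaps_before_observations(series: list[Obs]) -> Iterator[tuple[int, bool]]:
    """Yield (gap_length, tagged) for each value preceded by at least one NA day."""
    run = 0  # consecutive NA days seen so far
    for obs in series:
        if obs.precip is None:
            run += 1
            continue
        if run > 0:
            yield run, obs.accumulated
        run = 0  # a recorded value ends the NA run


def gap_summary(series: list[Obs]) -> dict:
    """Mean, median, and max gap length, split into tagged and untagged gaps."""
    groups: dict[str, list[int]] = {"accumulated": [], "untagged": []}
    for length, tagged in gaps_before_observations(series):
        groups["accumulated" if tagged else "untagged"].append(length)
    return {
        kind: {"n": len(v), "mean": statistics.mean(v),
               "median": statistics.median(v), "max": max(v)}
        for kind, v in groups.items() if v
    }
```

Trailing NAs at the end of a record never get closed by an observation, so they are not counted. Pooling across sites would mean running this per site and concatenating the gap lists before summarizing.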
Untagged observations after gaps have a distribution of precipitation similar to that of ordinary observations (those not preceded by an NA).
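One way to put a number on "similar distribution" is the two-sample Kolmogorov-Smirnov statistic: the largest vertical distance between the two empirical CDFs. Below is a small stdlib-only sketch, assuming the two samples are plain lists of mm values (untagged after-gap observations vs. ordinary observations). In practice `scipy.stats.ks_2samp` returns the same statistic plus a p-value; with this many tied zeros in rain data, any p-value should be read cautiously.

```python
from bisect import bisect_right


def ecdf(sorted_values: list[float], x: float) -> float:
    """Fraction of values <= x (right-continuous empirical CDF)."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_statistic(sample_a: list[float], sample_b: list[float]) -> float:
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    # Both ECDFs are step functions that only jump at observed values,
    # so the maximum gap is attained at one of the pooled data points.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)
```

A value near 0 means the distributions nearly coincide; 1 means they do not overlap at all.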
This is cool. I wonder: does it change by year? Maybe in the early years people were less rigorous about noting accumulation than in later years.
Also, does it vary by camp?
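A possible way to check both at once: for every observation that follows an NA run, count whether it was tagged as accumulated, then report the tagged share per (camp, year). This is a sketch under assumptions: the data layout (camp name mapped to date-sorted daily rows of `(day, precip_mm or None, tagged)`) and the function name are hypothetical, and the year used is the year of the observation that ends the gap.

```python
from collections import defaultdict
from datetime import date
from typing import Optional

# Hypothetical layout: one row per day, (day, precip_mm or None for NA, tagged_accumulated)
Rows = list[tuple[date, Optional[float], bool]]


def tagged_share(by_site: dict[str, Rows]) -> dict[tuple[str, int], float]:
    """Share of after-gap observations tagged as accumulated, per (site, year)."""
    counts: dict[tuple[str, int], list[int]] = defaultdict(lambda: [0, 0])  # [tagged, total]
    for site, rows in by_site.items():
        prev_na = False
        for day, precip, tagged in rows:
            if precip is None:
                prev_na = True
                continue
            if prev_na:
                c = counts[(site, day.year)]
                c[0] += int(tagged)
                c[1] += 1
            prev_na = False
    return {key: tagged_n / total for key, (tagged_n, total) in counts.items()}
```

A low share in early years (or at particular camps) would be consistent with less rigorous note-taking, though it could also reflect different kinds of gaps.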
The most frequently used camps, at least in recent decades, were Km 41, Colosso, Porto Alegre, and Dimona. The other camps were often used a lot in the early years but may have fallen out of favor, except in cases where a team needed to be there to avoid long hikes every day or at certain hours. For instance, researchers working on bats (scan Gaviao for Enrico Bernard) or the tree census team, which would stay at a camp for multi-month stretches.
Forgot to mention: at least when I was in the field nonstop, we made a point of recording a zero on days when it didn't rain; we didn't leave things blank. This is the type of question we can post to a list of people if need be.
After fixing the problem with zeroes getting removed (#11), here are updated stats and plots:
gap type | mean length (days) | median length (days) | max length (days) |
---|---|---|---|
accumulated | 7.36 | 4 | 326 |
untagged | 6.95 | 3 | 1705 |
Sometimes precip values preceded by `NA`s aren't noted as accumulated, but seem like they probably are. But not all precip values preceded by `NA`s seem like accumulated values. Sometimes a site isn't checked for months and the next value probably isn't accumulated (e.g., COLOSSO on 1990-12-01). I'm not sure if there is a good programmatic way to note accumulated values or if we just need to "guess" for any measurements preceded by `NA`s.
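If we do end up "guessing", one hedged option is a simple rule-based label per day rather than a yes/no flag, so the guess stays visible downstream. The sketch below assumes one site's daily lists of values (`None` for NA), accumulation tags, and note strings. The labels, the function name, and especially the `max_gap=30` cutoff are hypothetical choices: the idea is only that short NA runs likely end in an accumulated value, very long runs likely don't (evaporation, as discussed above), and a "nao foi feita observ" note suggests the value is not accumulated.

```python
from typing import Optional, Sequence

NOT_OBSERVED_NOTE = "nao foi feita observ"  # note text seen in the data


def flag_days(precip: Sequence[Optional[float]], tagged: Sequence[bool],
              notes: Sequence[str], max_gap: int = 30) -> list[str]:
    """Label each day; max_gap (days) is a hypothetical cutoff, not a project rule."""
    labels = []
    run = 0  # consecutive NA days before the current day
    for value, is_tagged, note in zip(precip, tagged, notes):
        if value is None:
            labels.append("NA")
            run += 1
            continue
        if is_tagged:
            labels.append("tagged")            # explicitly noted as accumulated
        elif run == 0:
            labels.append("observed")          # ordinary daily reading
        elif NOT_OBSERVED_NOTE in note.lower():
            labels.append("not-accumulated")   # note suggests a single-day value
        elif run <= max_gap:
            labels.append("possible")          # short gap: maybe accumulated
        else:
            labels.append("long-gap")          # long gap: likely evaporated
        run = 0
    return labels
```

Keeping these as labels in the processed data (and leaving the raw data untouched) would let us try different cutoffs later without redoing the cleaning.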