Closed cahartin closed 9 years ago
Fix this file when you get a chance. Thanks!
Note to myself... this is a weird file. For a few cells there are multiple data values for a given lon-lat-Z-time combination. Is this for real, or a bug in loadCMIP5?
d <- loadCMIP5("ph", "IPSL-CM5A-MR", "historical", yearRange=c(1850,1850))
> d$val %>% group_by(lon, lat, Z, time) %>% summarise(n()) %>% summary()
      lon                lat              Z                time           n()
 Min.   :  0.0004   Min.   :-78.1906   Mode:logical   Min.   :1850   Min.   : 1.000
 1st Qu.: 82.0000   1st Qu.:-50.0592   NA's:318420    1st Qu.:1850   1st Qu.: 1.000
 Median :176.9401   Median :  0.0000                  Median :1850   Median : 1.000
 Mean   :176.2930   Mean   : -0.7417                  Mean   :1850   Mean   : 1.022
 3rd Qu.:265.3128   3rd Qu.: 47.5114                  3rd Qu.:1851   3rd Qu.: 1.000
 Max.   :359.9940   Max.   : 89.6139                  Max.   :1851   Max.   :21.000
subset(d$val, round(lon, 5)==16.66903 & round(lat, 5)==89.14749 & round(time, 3)==1850.042)
Source: local data frame [2 x 5]
       lon      lat  Z     time    value
1 16.66903 89.14749 NA 1850.042 8.194775
2 16.66903 89.14749 NA 1850.042 8.194775
Fixed and committed one bug in both makeAnnualStat and makeGlobalStat (see commit e1b51bd).
The deeper problem, it turns out, is that this file contains multiple data values at the same lon/lat, basically where the model's funky grid converges. So I guess we need to make sure the time-averaging functions handle this correctly...
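For reference, a minimal sketch of how the repeated cells could be listed. It assumes a data frame shaped like d$val (columns lon, lat, Z, time, value); the toy values below are made up for illustration and are not taken from the IPSL-CM5A-MR file.

```r
library(dplyr)

# Toy stand-in for d$val (hypothetical values; real data would come from loadCMIP5)
val <- data.frame(
  lon   = c(10, 10, 20, 30, 30, 30),
  lat   = c(80, 80, 0, -45, -45, -45),
  Z     = NA,
  time  = 1850.042,
  value = c(8.19, 8.19, 8.05, 8.10, 8.10, 8.10)
)

# Count rows per lon/lat/Z/time combination and keep only the repeated ones
dupes <- val %>%
  group_by(lon, lat, Z, time) %>%
  summarise(n = n(), .groups = "drop") %>%
  filter(n > 1)

print(dupes)
```

In this toy data two combinations come back, one repeated twice and one repeated three times. On the real file the same pipeline should list the cells near the grid convergence.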
Thank you!
Here's a question: should the multiple data at a single spatial point count as only one datum when averaging temporally or spatially? Or should they count individually? I think the former--it doesn't make any sense for that particular point in the plot above to have 21x the influence of every other point, right?
I would say only count as 1 datum. I can't imagine why one point would need to be 21X stronger.
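A rough sketch of the "count as one datum" option, again on made-up data with the same column layout as d$val. This is only an illustration of the idea, not the code in makeAnnualStat or makeGlobalStat, and it uses a plain unweighted mean (no area weighting) to keep the arithmetic easy to follow.

```r
library(dplyr)

# Toy data (hypothetical): the first point appears three times
val <- data.frame(
  lon   = c(10, 10, 10, 20, 30),
  lat   = c(80, 80, 80, 0, -45),
  Z     = NA,
  time  = 1850.042,
  value = c(9, 9, 9, 8, 7)
)

# Naive mean: the repeated point counts three times
naive_mean <- mean(val$value)

# Collapse each lon/lat/Z/time combination to a single row first,
# so every spatial point contributes exactly one datum
one_per_cell <- val %>%
  group_by(lon, lat, Z, time) %>%
  summarise(value = mean(value), .groups = "drop")

dedup_mean <- mean(one_per_cell$value)

print(c(naive = naive_mean, dedup = dedup_mean))
```

Here the naive mean is 8.4 and the deduplicated mean is 8, so the repeated point pulls the naive result toward its own value.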
OK, I am going to commit these changes (27954c8ec) so that @cahartin can proceed with her data processing, and close this issue, as the code now handles IPSL-CM5A-MR. Issues that need to be resolved:
- dplyr implementation averages these as a single datum, but I don't believe array will. See new issue #141.
- dplyr implementation is moving, not sure why, and will work on this! See new issue #140.