Doing makeAnnualStat on IPSL data produces error

cahartin commented 9 years ago

Thu Jul 09 12:05:12 2015 Attempting load of ph IPSL-CM5A-MR historical Omon F:/CMIP5_HISTORICAL 1850 
> test <- makeAnnualStat(d, verbose=TRUE)
[1] "Filtering based on number in annual aggregation: "
Source: local data frame [6 x 2]

  year counts
1 1850     12
2 1850     24
3 1850     36
4 1850     48
5 1850     72
6 1850    252
[1] "number required: "
[1] 1850
Source: local data frame [1 x 5]

  lon lat  Z value time
1 260  70 NA    NA 1850
Replacing missing lon/lat combinations
 Error: cannot join on columns 'lon' x 'lon': index out of bounds

Timing stopped at: 28.66 4.93 33.4

bpbond commented 9 years ago

fx this file when you get a chance. Thanks!

bpbond commented 9 years ago

Note to self: this is a weird file. For a few cells there are multiple data values for a given lon-lat-Z-time combination. Is this real, or a bug in loadCMIP5?

> d <- loadCMIP5("ph", "IPSL-CM5A-MR", "historical", yearRange=c(1850,1850))
> d$val %>% group_by(lon, lat, Z, time) %>% summarise(n()) %>% summary()
      lon                lat              Z                time           n()        
 Min.   :  0.0004   Min.   :-78.1906   Mode:logical   Min.   :1850   Min.   : 1.000  
 1st Qu.: 82.0000   1st Qu.:-50.0592   NA's:318420    1st Qu.:1850   1st Qu.: 1.000  
 Median :176.9401   Median :  0.0000                  Median :1850   Median : 1.000  
 Mean   :176.2930   Mean   : -0.7417                  Mean   :1850   Mean   : 1.022  
 3rd Qu.:265.3128   3rd Qu.: 47.5114                  3rd Qu.:1851   3rd Qu.: 1.000  
 Max.   :359.9940   Max.   : 89.6139                  Max.   :1851   Max.   :21.000  

> subset(d$val, round(lon, 5)==16.66903 & round(lat, 5)==89.14749 & round(time, 3)==1850.042)
Source: local data frame [2 x 5]

       lon      lat  Z     time    value
1 16.66903 89.14749 NA 1850.042 8.194775
2 16.66903 89.14749 NA 1850.042 8.194775
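
A minimal sketch of how the affected cells could be listed directly, rather than read off the summary. It uses a small made-up data frame in place of `d$val`; the toy values and the name `dups` are illustrative assumptions, not taken from the real file.

```r
library(dplyr)

# Toy stand-in for d$val (hypothetical values): one cell appears twice
val <- data.frame(
  lon   = c(16.66903, 16.66903, 82.0, 176.9),
  lat   = c(89.14749, 89.14749, -50.0, 0.0),
  Z     = NA,
  time  = 1850.042,
  value = c(8.194775, 8.194775, 8.1, 8.0)
)

# Count rows per lon/lat/Z/time combination and keep
# only the combinations that occur more than once
dups <- val %>%
  count(lon, lat, Z, time) %>%
  filter(n > 1)

print(dups)
```

On the real data, the same two steps applied to `d$val` should return one row per duplicated cell, with `n` giving the number of copies.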

bpbond commented 9 years ago

Fixed and committed one bug in both makeAnnualStat and makeGlobalStat (see commit e1b51bd).

The deeper problem, it turns out, is that in this file there are multiple data at the same lon/lat values, basically where their funky grid converges.

[Figure: ipsl_weirdness]

So I guess we need to make sure the time-averaging functions correctly handle this...

cahartin commented 9 years ago

Thank you!

bpbond commented 9 years ago

Here's a question: should the multiple data at a single spatial point count as only one datum when averaging temporally or spatially? Or should they count individually? I think the former--it doesn't make any sense for that particular point in the plot above to have 21x the influence of every other point, right?

cahartin commented 9 years ago

I would say only count as 1 datum. I can't imagine why one point would need to be 21X stronger.
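
To make the difference concrete, here is a rough sketch with made-up numbers, not the package's actual code. It compares a naive mean with one that first collapses repeated lon/lat/Z/time rows into a single datum. The data frame and the names `naive_mean`, `one_per_cell`, and `collapsed_mean` are hypothetical, and the sketch ignores any area weighting.

```r
library(dplyr)

# Toy data (hypothetical): the cell at lon = 0, lat = 89 is repeated three times
val <- data.frame(
  lon   = c(0, 0, 0, 10, 20),
  lat   = c(89, 89, 89, 0, 0),
  Z     = NA,
  time  = 1850,
  value = c(9, 9, 9, 6, 6)
)

# Naive mean: every row counts, so the repeated cell has 3x the influence
naive_mean <- mean(val$value)

# Collapse to one datum per lon/lat/Z/time first, then average
one_per_cell <- val %>%
  group_by(lon, lat, Z, time) %>%
  summarise(value = mean(value)) %>%
  ungroup()

collapsed_mean <- mean(one_per_cell$value)

print(c(naive = naive_mean, collapsed = collapsed_mean))
```

With these toy values the naive mean is 7.8 and the collapsed mean is 7, so the repeated cell pulls the naive result toward its own value.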

bpbond commented 9 years ago

OK, I am going to commit these changes (27954c8ec) so that @cahartin can proceed with her data processing, and close this issue, as the code now handles IPSL-CM5A-MR. Issues that need to be resolved:

How to handle multiple data at a single grid point (see the comment above). For now, the default dplyr implementation averages these as a single datum, but I don't believe the array implementation will. See new issue #141.
The dplyr implementation runs shockingly slowly. I'm not sure why, and will work on this. See new issue #140.

Package version number bumped to 1.1.9030 in a new commit.

JGCRI / RCMIP5

Doing makeAnnualStat on IPSL data produces error #138