eliocamp / metR

Tools for Easier Analysis of Meteorological Fields
https://eliocamp.github.io/metR/

error processing netcdf file #173

Closed fipoucat closed 1 year ago

fipoucat commented 1 year ago

I am using metR to process a NetCDF file but I am getting an error message:

slp <- ReadNetCDF(file = slp_file,vars = "msl",out = "data.frame") %>%

eliocamp commented 1 year ago

What does slp_file contain? The error implies that it's an empty string.

fipoucat commented 1 year ago

The NetCDF file looks like this:

ncdump -h mslp_1991-2022.nc
netcdf mslp_1991-2022 {
dimensions:
    longitude = 281 ;
    latitude = 221 ;
    time = 47104 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int time(time) ;
        time:units = "hours since 1900-01-01 00:00:00.0" ;
        time:long_name = "time" ;
        time:calendar = "gregorian" ;
    short msl(time, latitude, longitude) ;
        msl:scale_factor = 0.127916469564952 ;
        msl:add_offset = 100097.936041765 ;
        msl:_FillValue = -32767s ;
        msl:missing_value = -32767s ;
        msl:units = "Pa" ;
        msl:long_name = "Mean sea level pressure" ;
        msl:standard_name = "air_pressure_at_mean_sea_level" ;

So maybe the file is not being read correctly?

fipoucat commented 1 year ago

Could it be the file size (5 GB)?

eliocamp commented 1 year ago

No, I mean literally what the variable slp_file is storing. It should be the path to your file, but the error you are seeing suggests that it's an empty string.
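
A quick way to rule this out is to inspect the variable before passing it to ReadNetCDF. The sketch below uses only base R; `check_nc_path` is a hypothetical helper written for illustration, not a metR function.

```r
# Hypothetical helper (not part of metR): describe what a path variable holds
# before any attempt is made to read the file.
check_nc_path <- function(path) {
  if (!is.character(path) || length(path) != 1L) {
    return("not a single character string")
  }
  if (is.na(path) || !nzchar(path)) {
    return("empty string")
  }
  if (!file.exists(path)) {
    return("file does not exist")
  }
  "ok"
}

# An empty string is reported before any reading is attempted.
check_nc_path("")
```

Calling `check_nc_path(slp_file)` right before the ReadNetCDF call would show whether the variable holds a usable path.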

fipoucat commented 1 year ago

Evidently the problem was the NetCDF file itself. I downloaded it another way and can read it now. Another error occurred, though, which may be related to the size:

slp_file <- nc_open("/home/sarr/work/DIN/ERA5-mslp_1991-2022.nc", write = FALSE, readunlim = TRUE, verbose = FALSE, auto_GMT = TRUE, suppress_dimvals = FALSE, return_on_error = FALSE)

slp <- ReadNetCDF(file = slp_file, vars = "msl", out = "data.frame") %>%
  setNames(c("time", "lat", "lon", "value")) %>%
  select(lon, lat, time, value) %>%
  mutate(time = as.Date(time))

Error in (function (..., sorted = TRUE, unique = FALSE) : Cross product of elements provided to CJ() would result in 2526733664 rows which exceeds .Machine$integer.max == 2147483647

eliocamp commented 1 year ago

Yes, that might be related to the size of the file. You might be able to read it with out = "array" or using the subset argument to read only part of the file. Otherwise, you might need to use other tools that don't read the whole file at once.
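
The arithmetic behind that limit can be checked in base R. The sketch below takes the dimension sizes from the ncdump header posted earlier in the thread. The row count in the error message is different, presumably because the re-downloaded file has other dimensions, so treat the numbers as illustrative. The single-year `subset` at the end is an assumption about what a smaller read might look like.

```r
# Dimension sizes copied from the ncdump header posted earlier in the thread.
dims <- c(longitude = 281, latitude = 221, time = 47104)

# Use doubles here: integer arithmetic would overflow at this size.
n_rows <- prod(dims)
too_big <- n_rows > .Machine$integer.max

# Largest number of time steps whose full grid still fits in one table.
grid_points <- unname(dims["longitude"] * dims["latitude"])
max_steps <- floor(.Machine$integer.max / grid_points)

# Hypothetical subset covering a single year, to be passed as
# ReadNetCDF(..., subset = one_year)
one_year <- list(time = c("1991-01-01", "1991-12-31"))
```

Here `too_big` is TRUE for the posted header, and `max_steps` bounds how many time steps a single read could return as a data.frame.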

fipoucat commented 1 year ago

I computed daily averages (with cdo) and was able to read the file. I want to subset the time dimension to keep only the months May, June, July, August, September, and October. Is it possible with metR? If so, how?

Thank you

eliocamp commented 1 year ago

Yes, but you need to split your subset into contiguous chunks, so your subset would look like this:

subset = list(time = list(c("2000-05-01", "2000-11-01"),
                          c("2001-05-01", "2001-11-01")))

... and so forth. So your time element needs to be a list, and each element of that list is a contiguous chunk of time. (This is due to the way the NetCDF file format works: it can only read contiguous chunks in any dimension, so to read many separate chunks you need one read operation per chunk, which ReadNetCDF does automatically with this syntax.)

One way of building this subset programmatically is this:

years <- 1979:2008
chunks <- lapply(years, function(y) paste0(y, c("-05-01", "-11-01")))

subset = list(time = chunks)

(of course, change years to the years you need).
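
Each chunk above ends on 1 November, so the result may contain that one day of November. As a minimal sketch, the code below builds the same chunks for 1991 to 2022 (the years in the file name from this thread) and adds a base-R month filter to apply after reading. `keep_months` is a hypothetical helper for illustration, not a metR function.

```r
# Years taken from the file name in this thread (1991-2022); adjust as needed.
years <- 1991:2022

# One contiguous chunk per year, with the same bounds as in the answer above.
chunks <- lapply(years, function(y) paste0(y, c("-05-01", "-11-01")))
time_subset <- list(time = chunks)

# Hypothetical post-read filter: TRUE for dates that fall in May to October.
keep_months <- function(dates, months = 5:10) {
  as.integer(format(as.Date(dates), "%m")) %in% months
}

# Flags each example date as inside or outside May to October.
keep_months(c("2000-04-30", "2000-05-01", "2000-10-31", "2000-11-01"))
```

After reading with `subset = time_subset`, rows could then be filtered with something like `slp[keep_months(slp$time), ]`, assuming the time column is named `time`.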