MagicForrest / DGVMTools

R package for processing, analysing and visualising output from Dynamic Global Vegetation Models (DGVMs)
GNU General Public License v3.0

netcdf limits? #74

Open anthoni-p opened 3 years ago

anthoni-p commented 3 years ago

I'm getting an error when trying to read a netCDF file with per-PFT data, and I'm not sure where it comes from.

> S3.gcp2021.src <- defineSource("S3_GCP2021", "S3_GCP2021",
+                                "/pd/data/lpj/GCP/GCP_2021/ftpUpload/LPJ-GUESS/S3", format = NetCDF,
+                                contact = "xxx", institute = "xxx")
> cvegpft.s3.21 <- getField(S3.gcp2021.src, "cVegpft", file.name = "LPJ-GUESS_S3_cVegpft.nc")

Error in (function (..., sorted = TRUE, unique = FALSE)  : 
  Cross product of elements provided to CJ() would result in 2163283200 rows which exceeds 
.Machine$integer.max == 2147483647
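The row count in the error is exactly the product of the four dimension lengths in the file (26 PFTs × 720 longitudes × 360 latitudes × 321 years, taken from the ncdump header in this issue). CJ() here is data.table's cross-join helper, which builds every combination of the coordinate vectors. A quick check in base R, done in double precision because the product itself overflows R's 32-bit integers:

```r
# Dimension lengths from the ncdump header of LPJ-GUESS_S3_cVegpft.nc
n.pft  <- 26
n.lon  <- 720
n.lat  <- 360
n.time <- 321

# Numeric literals are doubles in R, so this product does not overflow
full.rows  <- n.pft * n.lon * n.lat * n.time
# Rows needed if the file were handled one PFT at a time
slice.rows <- n.lon * n.lat * n.time

full.rows                           # 2163283200
full.rows  > .Machine$integer.max   # TRUE: too many rows for one CJ() call
slice.rows > .Machine$integer.max   # FALSE: 83203200 rows per PFT
```

Note that writing the lengths as integers (720L etc.) would make the product overflow to NA with a warning instead.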

Can only part of such a file be read in, e.g. PFT index 1? How can we specify that in getField()?

Here is the ncdump header of the file (yearly data, so not really that many time steps):

netcdf LPJ-GUESS_S3_cVegpft {
dimensions:
    PFT = 26 ;
    longitude = 720 ;
    latitude = 360 ;
    time = UNLIMITED ; // (321 currently)
variables:
    int PFT(PFT) ;
        PFT:long_name = "plant functional type" ;
    double longitude(longitude) ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    double latitude(latitude) ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    double time(time) ;
        time:units = "days since 1700-01-01" ;
        time:long_name = "time" ;
        time:calendar = "noleap" ;
    float cVegpft(time, latitude, longitude, PFT) ;
        cVegpft:units = "kg m-2" ;
        cVegpft:_FillValue = -99999.f ;
        cVegpft:long_name = "Vegtype level Carbon in Vegetation" ;
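Outside of DGVMTools, the netCDF library itself supports reading a hyperslab, i.e. one PFT at a time. The sketch below uses the ncdf4 package directly (assumptions: ncdf4 is installed, and this is not necessarily how getField() reads files internally). It writes a tiny stand-in file with the same dimension layout as the header above and reads back PFT index 1 only. ncdf4 lists dimensions in the reverse of the CDL order, so cVegpft(time, latitude, longitude, PFT) is PFT × longitude × latitude × time in R.

```r
library(ncdf4)

# Small stand-in file with the same layout as the TRENDY file.
# Dimensions are given in R (reversed CDL) order: PFT varies fastest.
pft.dim  <- ncdim_def("PFT", "", 1:3)
lon.dim  <- ncdim_def("longitude", "degrees_east", c(-179.75, -179.25, -178.75, -178.25))
lat.dim  <- ncdim_def("latitude", "degrees_north", c(-89.75, -89.25))
time.dim <- ncdim_def("time", "days since 1700-01-01", c(0, 365))
cveg <- ncvar_def("cVegpft", "kg m-2", list(pft.dim, lon.dim, lat.dim, time.dim),
                  missval = -99999)

path <- tempfile(fileext = ".nc")
vals <- array(seq_len(3 * 4 * 2 * 2), dim = c(3, 4, 2, 2))
nc <- nc_create(path, cveg)
ncvar_put(nc, cveg, vals)
nc_close(nc)

# Read PFT index 1 only: start at 1 on every dimension, count 1 along PFT
# and -1 ("the whole dimension") along longitude, latitude and time.
nc <- nc_open(path)
pft1 <- ncvar_get(nc, "cVegpft", start = c(1, 1, 1, 1), count = c(1, -1, -1, -1))
nc_close(nc)

dim(pft1)  # 4 2 2: the length-1 PFT dimension is dropped (collapse_degen = TRUE)
```

For the real file the same call with start = c(k, 1, 1, 1) would return a 720 × 360 × 321 array for PFT k.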

MagicForrest commented 3 years ago

Hi Peter,

It looks like the full cross product of coordinates for this TRENDY file (26 × 720 × 360 × 321 rows) is now just slightly over R's integer size limit. That is bad luck. Unfortunately I haven't implemented reading Fields per layer. I never quite had a reason to, but now there is one. It shouldn't be too hard, but it will take a bit of time.

Is this urgent for you? I am on holiday right now.

MagicForrest commented 2 years ago

Hi Peter,

I have implemented a "layer" argument to getField() that allows selecting individual variables inside the netCDF file, which I think is useful. But I just realised that it doesn't solve the problem you had above, because it doesn't allow selecting individual PFT indices.

So I propose adding an additional argument, something like 'layer.dim.indices', which would allow subsetting along the PFT (or any other layer) dimension. It will take a bit more time to implement, however...
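To illustrate what a 'layer.dim.indices'-style argument would have to do internally, the hypothetical helper below (index_hyperslabs() is not part of DGVMTools) turns requested PFT indices into the start/count vectors accepted by ncdf4::ncvar_get(), one hyperslab per index. It assumes ncdf4's dimension order, which is the reverse of the CDL order printed by ncdump.

```r
# Hypothetical helper: one start/count pair per requested index along
# 'layer.dim'. 'dim.lengths' is a named vector of dimension lengths in
# ncdf4 (R) order; 'indices' are the requested positions on 'layer.dim'.
index_hyperslabs <- function(dim.lengths, layer.dim, indices) {
  stopifnot(layer.dim %in% names(dim.lengths),
            all(indices >= 1), all(indices <= dim.lengths[[layer.dim]]))
  pos <- match(layer.dim, names(dim.lengths))
  lapply(sort(unique(indices)), function(i) {
    start <- rep(1L, length(dim.lengths))
    count <- rep(-1L, length(dim.lengths))  # -1 = read the whole dimension
    start[pos] <- as.integer(i)
    count[pos] <- 1L                        # a single element of the layer axis
    list(index = i, start = start, count = count)
  })
}

# Dimension lengths of the cVegpft file, in ncdf4 (R) order
dims  <- c(PFT = 26, longitude = 720, latitude = 360, time = 321)
slabs <- index_hyperslabs(dims, "PFT", c(1, 5))
slabs[[2]]$start  # 5 1 1 1
slabs[[2]]$count  # 1 -1 -1 -1
```

Reading one index per hyperslab keeps every individual read, and any cross product built from it, at the size of a single PFT; adjacent indices could be merged into one slab if fewer reads were preferred.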

MagicForrest commented 3 months ago

@anthoni-p I guess this issue hasn't gone away? I never implemented the 'layer.dim.indices' idea. I can see a way to solve this, though: basically, read the netCDF file in separate slices, one for each element of the layer axis (in this case your 26 PFTs). That will keep each slice's cross product below R's maximum integer and keep the memory footprint down. It might even make the code faster, but I am not sure...
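The slice-by-slice approach can be sketched in base R. The function read_by_layer() and its arguments are hypothetical, not part of DGVMTools. read_slice stands in for a per-index read such as ncdf4::ncvar_get() with start/count; here it simply indexes an in-memory array so the example runs without a file. Each slice is converted to a long table on its own, so no single expand.grid() or CJ() call spans all layers at once.

```r
# Read one element of the layer axis at a time, convert that slice to a
# long data.frame, drop fill values, then row-bind the pieces.
read_by_layer <- function(read_slice, layer.values, lon, lat, years) {
  pieces <- lapply(seq_along(layer.values), function(i) {
    slice <- read_slice(i)  # lon x lat x time array for layer i
    # expand.grid varies its first argument fastest, matching as.vector(slice)
    grid <- expand.grid(Lon = lon, Lat = lat, Time = years)
    grid$Layer <- layer.values[i]
    grid$Value <- as.vector(slice)
    grid[!is.na(grid$Value), ]  # drop fill-value (e.g. ocean) cells
  })
  do.call(rbind, pieces)
}

# Toy data: 2 layers x 2 lon x 3 lat x 2 years, one missing cell
lon   <- c(10.25, 10.75)
lat   <- c(50.25, 50.75, 51.25)
years <- c(2000, 2001)
arr <- array(seq_len(2 * 2 * 3 * 2), dim = c(2, 2, 3, 2))
arr[1, 1, 1, 1] <- NA

long <- read_by_layer(function(i) arr[i, , , ], layer.values = c("PFT1", "PFT2"),
                      lon = lon, lat = lat, years = years)
nrow(long)  # 23: 2 layers x 12 cells, minus the one NA
```

Dropping fill values inside each slice is what keeps the combined table smaller than the full grid; whether it then stays within R's limits depends on how many land cells remain. With 26 slices, data.table::rbindlist() would likely be faster than do.call(rbind, ...).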

If you are still interested, can you supply me with a test file?