Closed HelixPiano closed 1 year ago
Hello @HelixPiano,
Thanks for the report. I could reproduce the high memory usage of the `df.max()` call using a large GRIB file that I have here. However, I then converted the GRIB file to NetCDF format and tried the same thing with this NetCDF file, which uses plain xarray and not cfgrib, and the memory profile was similar (in fact the NetCDF version used more memory than the GRIB version).
I used ecCodes to perform the conversion:
grib_to_netcdf global_wind_2020_12.grib -o global_wind_2020_12.nc
So from this I'd have to conclude that cfgrib is not the culprit here, but xarray itself might be loading all the values arrays into memory at once in order to compute the maximum. Are you able to confirm this? If so, we should close this issue, and maybe you can raise one in xarray itself.
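If xarray is indeed materialising everything to compute the maximum, one user-side workaround might be to open the dataset with dask chunks (e.g. `xr.open_dataset("129.grb", engine="cfgrib", chunks={"time": 1000})`; the chunk size is just a guess), so the reduction runs block by block. A minimal sketch of that idea, using plain NumPy rather than xarray or dask internals:

```python
import numpy as np

def blockwise_max(values, block=1000):
    """Compute the maximum block-by-block along the first axis, so peak
    memory is bounded by one block. This is roughly what a chunked
    (dask-backed) xarray reduction does, sketched without dask."""
    current = -np.inf
    for start in range(0, values.shape[0], block):
        current = max(current, float(values[start:start + block].max()))
    return current

data = np.random.rand(30316).astype("float32")  # stand-in for the GRIB values
assert blockwise_max(data) == float(data.max())
```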
Cheers, Iain
Thanks for reporting. I'm closing this now, and we can re-open it, or open a new one if we have a case where we can confirm that NetCDF does not show the same issue.
I think this problem is the same as #70. I also hit it: when t > 8.00, the file offset becomes -5.
I encountered the same issue when indexing t > 1100; the offset in `FileStreamItems` becomes -5.
After some troubleshooting, I think the size of the `long*` type on Windows might be the root cause. When a large GRIB file is read for the first time, a 4-byte `long` pointer `value_p` is created in `gribapi.grib_get_long(msgid, key)` and, once the offset exceeds 2**31, it wraps around and ends up as -5. This value becomes the offset into the large GRIB file and is returned to `messages.Message.message_get(self, item, key_type=None, default=_MARKER)`. The offsets continue to come back as -5 in `messages.FileStreamItems.__iter__()` and are stored in the index (whether in files or in RAM). When the GRIB file is actually read to fetch values, an `OSError: [Errno 22] Invalid argument` is raised.
However, -5 is already what `lib.grib_get_long()` returns inside `gribapi.grib_get_long(msgid, key)`, and I am not able to troubleshoot further than that. A potential fix might be an explicit `long long*` declaration, or some other way to upgrade to a 64-bit integer pointer.
For now, using a smaller GRIB file or switching to Linux works around the problem.
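For illustration, the wrap-around described above can be reproduced in pure Python: reinterpreting a file offset past 2**31 as a signed 32-bit integer (the size of a C `long` on Windows, even in 64-bit builds) yields a negative number, and an offset of 2**32 - 5 comes out as exactly -5. This is a sketch of the arithmetic only, not cfgrib's actual code:

```python
import struct

def as_signed_32bit(offset):
    """Reinterpret a non-negative file offset as a signed 32-bit integer,
    mimicking storage in a 4-byte C `long` on Windows (LLP64 model)."""
    return struct.unpack("<i", struct.pack("<I", offset & 0xFFFFFFFF))[0]

print(as_signed_32bit(2**31 - 1))  # 2147483647: last offset that still fits
print(as_signed_32bit(2**32 - 5))  # -5: an offset near 4 GiB wraps negative
```

So an index entry of -5 corresponds to a message sitting close to the 4 GiB mark, which matches the symptom appearing only with large GRIB files.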
What happened?
Hello everyone, I am not sure if this is a bug in xarray or a bug in cfgrib, so I will cross-post it.
I have a GRIB file with dimensions 30316x160x392, dtype float32, and a file size of around 3.7 GB.
df= xr.open_dataset("129.grb", engine="cfgrib")
works initially. The problem is that when I call df.max() it maxes out the RAM of my PC and fails to return any result. RAM usage before the df.max() call: 3.5/16 GB. If I run
df = xr.load_dataset("129.grb", engine="cfgrib")
instead, I get an error message.
What are the steps to reproduce the bug?
-
Version
0.9.10.3
Platform (OS and architecture)
Windows 10 Pro
Relevant log output
Accompanying data
No response
Organisation
No response