ACManke closed this issue 2 years ago.
The code that splits calculations into chunks computes the total size of the grid that a calculation would need to load into memory, and from that decides whether to split the calculation and how. The V7.2 enhancement that splits calculations involving transformations should have accounted for the fact that the large files allowed by modern NetCDF libraries may require grid indexing beyond the limits of 4-byte integers. It should use 8-byte integers when computing and using the grid size, but it uses 4-byte integers instead.
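A minimal sketch of that kind of splitting decision, assuming the split happens along the last (time) axis and that the memory limit is counted in grid points. The function `plan_time_split` and its `max_elements` parameter are hypothetical and are not PyFerret's actual code. Python integers never overflow, so here they play the role of the 8-byte integers the grid-size computation should use.

```python
def plan_time_split(dims, max_elements):
    """Decide how to split a calculation along its last (time) axis.

    Hypothetical sketch: dims lists the axis lengths with time last, and
    max_elements is a memory limit counted in grid points. Python integers
    never overflow, so they stand in for 8-byte integers here.
    Returns (n_chunks, steps_per_chunk).
    """
    *space_dims, nt = dims
    points_per_step = 1
    for n in space_dims:
        points_per_step *= n
    total_points = points_per_step * nt

    # Small enough to load in one piece: no split needed.
    if total_points <= max_elements:
        return 1, nt

    # Even one time step does not fit, so splitting along time cannot help.
    if points_per_step > max_elements:
        raise MemoryError("a single time step exceeds the memory limit")

    steps_per_chunk = max_elements // points_per_step
    n_chunks = -(-nt // steps_per_chunk)  # ceiling division
    return n_chunks, steps_per_chunk


# Patrick's smaller grid with a hypothetical limit of 500 million points.
print(plan_time_split((720, 360, 15, 2052), 500_000_000))  # (17, 128)
```

If `total_points` were held in a 4-byte integer instead, both the "does it fit" comparison and the chunk count would be computed from a wrapped value.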
Patrick's examples use one file that is 3.76 GB, with grid dimensions 720x360x15x2052, and another that is just under 7 GB, with dimensions 720x360x15x3840. The transformation on the larger grid seems to succeed, but in fact the grid size overflows the 4-byte integer used to store it for both grids (720x360x15x2052 = 7,978,176,000 points and 720x360x15x3840 = 14,929,920,000 points, both far above the 4-byte maximum of 2,147,483,647). For the larger file, the overflow just happens to produce a value that does not trigger an error condition. I suspect that the calculation for the larger file stops prematurely, but running it to completion would take hours and I have not done so.
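The overflow can be reproduced by emulating a 4-byte accumulator. This is a sketch that assumes two's-complement wraparound on overflow (the usual hardware behavior, though signed overflow is not guaranteed to wrap by the language standards); `wrap_int32` and `grid_size` are hypothetical helpers, not PyFerret routines.

```python
def wrap_int32(value):
    """Return what a 32-bit two's-complement integer would hold for value."""
    return (value + 2**31) % 2**32 - 2**31


def grid_size(dims, int32=False):
    """Multiply axis lengths; optionally mimic a 4-byte accumulator."""
    size = 1
    for n in dims:
        size *= n
        if int32:
            size = wrap_int32(size)  # emulate overflow at each step
    return size


for nt in (2052, 3840):
    dims = (720, 360, 15, nt)
    print(nt, grid_size(dims), grid_size(dims, int32=True))
# 2052 7978176000 -611758592
# 3840 14929920000 2045018112
```

The smaller grid wraps to a negative size, which an error check can catch, while the larger grid wraps to a positive 2,045,018,112 that looks like a valid size. That matches the behavior described above.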
The PyFerret code computes the total size of an expression's grid in relatively few places, so fixing this is straightforward. I will volunteer to work through it.
A second, related issue is that a single dimension could potentially be so long that it exceeds the capacity of 4-byte integers. Changing the PyFerret code to handle this would be a much larger task, and it does NOT make sense to change all of the 4-byte integers in the code to 8-byte integers wholesale. Instead, I propose to trap any axis definition that is too long for 4-byte integer indexing and issue an error message. The check would be made when a user defines a coordinate axis, when a file is opened and its dimensions are analyzed, or when an aggregation is defined. For context, an axis some 68 years long with one-second timesteps is too long for 4-byte indexing.
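The proposed trap amounts to comparing each axis length against the largest signed 4-byte integer. A minimal sketch, where `check_axis_length` and its error message are hypothetical and stand in for wherever PyFerret would actually perform the check; the last lines reproduce the 68-year figure.

```python
INT4_MAX = 2**31 - 1  # largest value a signed 4-byte integer can hold


def check_axis_length(axis_name, npoints):
    """Reject an axis whose points cannot all be indexed with 4-byte integers.

    Hypothetical sketch of the proposed trap; it would run when a user
    defines an axis, when a file's dimensions are read, or when an
    aggregation is defined.
    """
    if npoints > INT4_MAX:
        raise ValueError(
            f"axis {axis_name} has {npoints} points; "
            f"the limit for 4-byte indexing is {INT4_MAX}"
        )
    return npoints


# Roughly how many years of one-second steps reach the 4-byte limit.
SECONDS_PER_YEAR = 365.25 * 86400
print(f"{(INT4_MAX + 1) / SECONDS_PER_YEAR:.2f} years")  # 68.05 years
```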
This fix is in the code base: the total size of the grid is now computed correctly, so computations are broken up appropriately.
Patrick Brockmann reported this here:
https://www.pmel.noaa.gov/maillists/tmap/ferret_users/fu_2019/msg01120.html
When working with the larger of the two files, the calculation is correctly split along the time axis, but for the smaller file the error message is returned consistently; it seems that the split/gather process is never started. I tried setting the maximum memory size to a smaller value to force splitting, but with no luck.
The files are 3.7 GB and almost 7 GB in size. Maybe this is a problem with the data types of the integers used to compute grid sizes?
@PBrockmann