Closed PaulWessel closed 3 years ago
Hmm, gmt_gdalread
has this case that is used when asking just the file info and not the data. Why is it reading the entire file?
if (metadata_only) {
plhs[0] = populate_metadata_struct (gdal_filename, correct_bounds, pixel_reg, got_R,
nXSize, nYSize, dfULX, dfULY, dfLRX, dfLRY, z_min, z_max, get_drivers);
return;
}
Perhaps I should check the case first before I make assumtions. Maybe something else took longer time than I expected.
OK, so stepped through. Getting wesn as you said quickly, but then...
/* Fill in the rest of the GMT header values (If ...) */
if (raster_count > 0) {
if (z_min == 1e50) { /* We don't know yet the dataset Min/Max */
GDALComputeRasterMinMax(hBand, false, adfMinMax); /* Unfortunately, cannot trust metadata min/max */
The GDALComputeRasterMinMax takes several seconds for the 30 arc grid so presumably it is looping over all to set min/max. Of course, we are not always that interested in the zmin/zmax compared to needing to know wesn and inc.
It seems we have a deadlock. Not long ago we made a change to not use the quick get min/max because "cannot trust metadata min/max" and now it spends too long computing it.
Yes, true. This command takes 9 seconds on my Mac:
gmt grdcut @earth_day -Rg -JG15/20/2500/0/0/0/30/30/15c -D
which is a pretty long wait for something you expect to be instant since the values of the grids do not enter. So perhaps we need a finer control on when we do or do not read those values. We could pass that via a mode to GMT_Read_Data, e.g..,
if ((G = GMT_Read_Data (API, GMT_IS_GRID, GMT_IS_FILE, GMT_IS_SURFACE, GMT_CONTAINER_AND_DATA|GMT_NO_Z_SEARCH, NULL, Ctrl->In.file, NULL)) == NULL) {
where GMT_NO_Z_SEARCH is a (new) value in gmt_resources.h that can be consulted and fed to _gmtgdalread so it skips that loop that is usually done?
Yes, but perhaps we should go a bit further. There is in fact no need to inquire min/max for images. We have almost no use for it. The exception that occurs to me is if we want create a cpt
for an indexed image, but even here I don't recall if we have a solution to add/replace cpts to indexed images.
Even for floats, the default in the quick method to estimate the min/max is probably the most common situation, but here we would need a mechanism to request the scanning to not fall again in the case that forced the use of the exact search.
So I tend to agree with this. Perhaps we should just first no do this for images that have more than 1 band since surely for those zmin/max is meaningless. That would "fix" the current problem.
What do you think of this minor change (checking if raster_count is 1):
/* ------------------------------------------------------------------------- */
/* Fill in the rest of the GMT header values (If ...) */
if (raster_count > 0 && n) {
if (z_min == 1e50 && raster_count == 1) { /* We don't know yet the dataset Min/Max */
Also, may I replace the ±1e50 with ±DBL_MAX?
But that would apply equally to grids and images
Having a ±DBL_MAX in zmin/max is even uglier then ±1e50, but none of the is very good. Max datatype would be better but too much work for it.
But that would apply equally to grids and images
Grids have one band so we will go in and check zmin/zmax if we get here. For multiband images (rgb) we will not. Isn't that the most basic check we can do?
Having a ±DBL_MAX in zmin/max is even uglier then ±1e50, but none of the is very good. Max datatype would be better but too much work for it.
Sure, but these are double and the test is just used to see if those values have changed:
double z_min = 1e50, z_max = -1e50;
Of course DBL_MAX and 1e50 does not change anything but note that DBL_MAX is used in this way throughout GMT so it is odd to switch to another constant just in one gmt*.c file.
Grids have one band
I'm not sure of this at all. For example, I'm working in the satellite RemoteS
package and there I create cubes of Landsat bands (extracting sub-regions) that are UInt16. Sometimes I read layers with GDAL (to preserve the Band-interleaved layout) other times I read them with GMT. I'm not sure right now but I think UInt16
are interpreted as grids in GMT. And nothing stops to have a cube of floats saved in a GeoTIFF file.
Band 1 Block=435x1 Type=UInt16, ColorInterp=Gray
Description = Band 1 - Coastal aerosol [0.43-0.45]
Band 2 Block=435x1 Type=UInt16, ColorInterp=Undefined
Description = Band 2 - Blue [0.45-0.51]
Band 3 Block=435x1 Type=UInt16, ColorInterp=Undefined
Description = Band 3 - Green [0.53-0.59]
Band 4 Block=435x1 Type=UInt16, ColorInterp=Undefined
Description = Band 4 - Red [0.64-0.67]
Band 5 Block=435x1 Type=UInt16, ColorInterp=Undefined
Description = Band 5 - Near Infrared [0.85-0.88]
Band 6 Block=435x1 Type=UInt16, ColorInterp=Undefined
Description = Band 6 - SWIR 1 [1.57-1.65]
Band 7 Block=435x1 Type=UInt16, ColorInterp=Undefined
Description = Band 7 - SWIR 2 [2.11-2.29]
Band 8 Block=435x1 Type=UInt16, ColorInterp=Undefined
Description = Band 10 - Thermal IR 1 [10.6-11.19]
What we want to avoid scanning are the UInt8. One (indexed) or 3 bands (RGB) files.
Yes OK. Any multiband is by definition (in GMT) not a grid. And if data type is Uint8 (and one band) we can assume gray image. Isnt that a clear enough criteria for when not to do the scanning?
Yes, but raster_count
holds the answer of a GDAL function and will say 3 if the file is an UInt8 RGB image. And we don't want to scan those neither.
I must be misunderstanding you, or you me. This is the code I will be trying:
if (raster_count > 0) {
GDALRasterBandH hBand = GDALGetRasterBand(hDataset, 1);
if (z_min == DBL_MAX && raster_count == 1 && GDALGetRasterDataType (hBand) != GDT_Byte) { /* We don't know yet the dataset Min/Max */
GDALComputeRasterMinMax(hBand, false, adfMinMax); /* Unfortunately, cannot trust metadata min/max */
So to me this means that we ONLY do the scan if
Otherwise we will scale the single band for min/max. No?
With that change this goes from 9 seconds to instant:
gmt grdcut -JG-73.8333/40.75/13.5c+du+z160+a210+t55+v36 -Rg @earth_night_30s -D
270.975 302.766666667 28.15 53.4666666667 0.00833333333333 0.00833333333333
Nine seconds per frame times hundreds or thousands add up!
Let me try to explain my point with this supposing
Suppose that when you prepared the earth_day|night
images you had not quantize'd them and kept the RGB version. raster_count
would = 3 and it now would take 3*9 = 27 seconds.
My point is that, the refuse to scan should be based on the data type only and not on the number of raster layers.
OK, so you would prefer this then:
if (raster_count > 0) {
GDALRasterBandH hBand = GDALGetRasterBand(hDataset, 1);
if (z_min == DBL_MAX && GDALGetRasterDataType (hBand) != GDT_Byte) { /* We don't know yet the dataset Min/Max */
GDALComputeRasterMinMax(hBand, false, adfMinMax); /* Unfortunately, cannot trust metadata min/max */
where we do not care how many bands but only check the first band for type, and if GDT_Byte then we dont do it. OK?
Yes, that should be it. I don't think we can (or makes sense to) have different data types in one subdataset.
And, hBand
is already a GDALRasterBandH
, I think.
Closed due to #5789.
Description of the desired feature
It the special cases of @earth_day and @earth_night data sets (for all the resolutions), the way we obtain the grid header via GDAL involves reading the entire grid. This takes noticeable time for the higher resolutions. Yet, the answer is known: global region and given resolution. As the first step in grid functions is often to just get the header, having a special case check for the known global grids, especially the images, would speed things up considerably. Thoughts on this, @joa-quim ?