GenericMappingTools / gmt

The Generic Mapping Tools
https://www.generic-mapping-tools.org
Other
862 stars 359 forks source link

Faster determination of global TIFF header structure #5784

Closed PaulWessel closed 3 years ago

PaulWessel commented 3 years ago

Description of the desired feature

It the special cases of @earth_day and @earth_night data sets (for all the resolutions), the way we obtain the grid header via GDAL involves reading the entire grid. This takes noticeable time for the higher resolutions. Yet, the answer is known: global region and given resolution. As the first step in grid functions is often to just get the header, having a special case check for the known global grids, especially the images, would speed things up considerably. Thoughts on this, @joa-quim ?

joa-quim commented 3 years ago

Hmm, gmt_gdalread has this case that is used when asking just the file info and not the data. Why is it reading the entire file?

    if (metadata_only) {
        plhs[0] = populate_metadata_struct (gdal_filename, correct_bounds, pixel_reg, got_R, 
                                            nXSize, nYSize, dfULX, dfULY, dfLRX, dfLRY, z_min, z_max, get_drivers);
        return;
    }
PaulWessel commented 3 years ago

Perhaps I should check the case first before I make assumtions. Maybe something else took longer time than I expected.

PaulWessel commented 3 years ago

OK, so stepped through. Getting wesn as you said quickly, but then...

    /* Fill in the rest of the GMT header values (If ...) */
    if (raster_count > 0) {
        if (z_min == 1e50) {        /* We don't know yet the dataset Min/Max */
            GDALComputeRasterMinMax(hBand, false, adfMinMax);       /* Unfortunately, cannot trust metadata min/max */

The GDALComputeRasterMinMax takes several seconds for the 30 arc grid so presumably it is looping over all to set min/max. Of course, we are not always that interested in the zmin/zmax compared to needing to know wesn and inc.

joa-quim commented 3 years ago

It seems we have a deadlock. Not long ago we made a change to not use the quick get min/max because "cannot trust metadata min/max" and now it spends too long computing it.

PaulWessel commented 3 years ago

Yes, true. This command takes 9 seconds on my Mac:

gmt grdcut @earth_day -Rg -JG15/20/2500/0/0/0/30/30/15c -D

which is a pretty long wait for something you expect to be instant since the values of the grids do not enter. So perhaps we need a finer control on when we do or do not read those values. We could pass that via a mode to GMT_Read_Data, e.g..,

if ((G = GMT_Read_Data (API, GMT_IS_GRID, GMT_IS_FILE, GMT_IS_SURFACE, GMT_CONTAINER_AND_DATA|GMT_NO_Z_SEARCH, NULL, Ctrl->In.file, NULL)) == NULL) {

where GMT_NO_Z_SEARCH is a (new) value in gmt_resources.h that can be consulted and fed to _gmtgdalread so it skips that loop that is usually done?

joa-quim commented 3 years ago

Yes, but perhaps we should go a bit further. There is in fact no need to inquire min/max for images. We have almost no use for it. The exception that occurs to me is if we want create a cpt for an indexed image, but even here I don't recall if we have a solution to add/replace cpts to indexed images.

Even for floats, the default in the quick method to estimate the min/max is probably the most common situation, but here we would need a mechanism to request the scanning to not fall again in the case that forced the use of the exact search.

PaulWessel commented 3 years ago

So I tend to agree with this. Perhaps we should just first no do this for images that have more than 1 band since surely for those zmin/max is meaningless. That would "fix" the current problem.

PaulWessel commented 3 years ago

What do you think of this minor change (checking if raster_count is 1):

    /* ------------------------------------------------------------------------- */
    /* Fill in the rest of the GMT header values (If ...) */
    if (raster_count > 0 && n) {
        if (z_min == 1e50 && raster_count == 1) {       /* We don't know yet the dataset Min/Max */

Also, may I replace the ±1e50 with ±DBL_MAX?

joa-quim commented 3 years ago

But that would apply equally to grids and images

Having a ±DBL_MAX in zmin/max is even uglier then ±1e50, but none of the is very good. Max datatype would be better but too much work for it.

PaulWessel commented 3 years ago

But that would apply equally to grids and images

Grids have one band so we will go in and check zmin/zmax if we get here. For multiband images (rgb) we will not. Isn't that the most basic check we can do?

Having a ±DBL_MAX in zmin/max is even uglier then ±1e50, but none of the is very good. Max datatype would be better but too much work for it.

Sure, but these are double and the test is just used to see if those values have changed:

double z_min = 1e50, z_max = -1e50;

PaulWessel commented 3 years ago

Of course DBL_MAX and 1e50 does not change anything but note that DBL_MAX is used in this way throughout GMT so it is odd to switch to another constant just in one gmt*.c file.

joa-quim commented 3 years ago

Grids have one band

I'm not sure of this at all. For example, I'm working in the satellite RemoteS package and there I create cubes of Landsat bands (extracting sub-regions) that are UInt16. Sometimes I read layers with GDAL (to preserve the Band-interleaved layout) other times I read them with GMT. I'm not sure right now but I think UInt16 are interpreted as grids in GMT. And nothing stops to have a cube of floats saved in a GeoTIFF file.

Band 1 Block=435x1 Type=UInt16, ColorInterp=Gray
  Description = Band 1 - Coastal aerosol [0.43-0.45]
Band 2 Block=435x1 Type=UInt16, ColorInterp=Undefined
  Description = Band 2 - Blue [0.45-0.51]
Band 3 Block=435x1 Type=UInt16, ColorInterp=Undefined
  Description = Band 3 - Green [0.53-0.59]
Band 4 Block=435x1 Type=UInt16, ColorInterp=Undefined
  Description = Band 4 - Red [0.64-0.67]
Band 5 Block=435x1 Type=UInt16, ColorInterp=Undefined
  Description = Band 5 - Near Infrared [0.85-0.88]
Band 6 Block=435x1 Type=UInt16, ColorInterp=Undefined
  Description = Band 6 - SWIR 1 [1.57-1.65]
Band 7 Block=435x1 Type=UInt16, ColorInterp=Undefined
  Description = Band 7 - SWIR 2 [2.11-2.29]
Band 8 Block=435x1 Type=UInt16, ColorInterp=Undefined
  Description = Band 10 - Thermal IR 1 [10.6-11.19]
joa-quim commented 3 years ago

What we want to avoid scanning are the UInt8. One (indexed) or 3 bands (RGB) files.

PaulWessel commented 3 years ago

Yes OK. Any multiband is by definition (in GMT) not a grid. And if data type is Uint8 (and one band) we can assume gray image. Isnt that a clear enough criteria for when not to do the scanning?

joa-quim commented 3 years ago

Yes, but raster_count holds the answer of a GDAL function and will say 3 if the file is an UInt8 RGB image. And we don't want to scan those neither.

PaulWessel commented 3 years ago

I must be misunderstanding you, or you me. This is the code I will be trying:

    if (raster_count > 0) {
        GDALRasterBandH hBand = GDALGetRasterBand(hDataset, 1);
        if (z_min == DBL_MAX && raster_count == 1 && GDALGetRasterDataType (hBand) != GDT_Byte) {   /* We don't know yet the dataset Min/Max */
            GDALComputeRasterMinMax(hBand, false, adfMinMax);   /* Unfortunately, cannot trust metadata min/max */

So to me this means that we ONLY do the scan if

  1. There is only one band
  2. That band is not a byte (UInt8)

Otherwise we will scale the single band for min/max. No?

PaulWessel commented 3 years ago

With that change this goes from 9 seconds to instant:

gmt grdcut -JG-73.8333/40.75/13.5c+du+z160+a210+t55+v36 -Rg  @earth_night_30s -D
270.975 302.766666667   28.15   53.4666666667   0.00833333333333    0.00833333333333
PaulWessel commented 3 years ago

Nine seconds per frame times hundreds or thousands add up!

joa-quim commented 3 years ago

Let me try to explain my point with this supposing

Suppose that when you prepared the earth_day|night images you had not quantize'd them and kept the RGB version. raster_count would = 3 and it now would take 3*9 = 27 seconds.

My point is that, the refuse to scan should be based on the data type only and not on the number of raster layers.

PaulWessel commented 3 years ago

OK, so you would prefer this then:

    if (raster_count > 0) {
        GDALRasterBandH hBand = GDALGetRasterBand(hDataset, 1);
        if (z_min == DBL_MAX &&  GDALGetRasterDataType (hBand) != GDT_Byte) {   /* We don't know yet the dataset Min/Max */
            GDALComputeRasterMinMax(hBand, false, adfMinMax);   /* Unfortunately, cannot trust metadata min/max */

where we do not care how many bands but only check the first band for type, and if GDT_Byte then we dont do it. OK?

joa-quim commented 3 years ago

Yes, that should be it. I don't think we can (or makes sense to) have different data types in one subdataset. And, hBand is already a GDALRasterBandH, I think.

PaulWessel commented 3 years ago

Closed due to #5789.