OSGeo / gdal

GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
https://gdal.org

Optimized TIFF to TIFF CreateCopy() #8449

Open latot opened 9 months ago

latot commented 9 months ago

Hi, I was using gdal_translate when I noticed that part of the process was pretty slow. While testing, I noticed this:

time gdal_translate -co COMPRESS=ZSTD -co SPARSE_OK=TRUE -co TILED=YES zstd.tif zstd2.tif
Input file size is 46451, 148644
0...10...20...30...40...50...60...70...80...90...100 - done.

real    1m1,703s
user    0m59,864s
sys 0m1,689s

OK, let me introduce the files: zstd.tif was created with exactly the same options as zstd2.tif, so what we are asking for here is to "do nothing" to the file. It still takes almost 1 minute, while copying the 1.9 GB file takes only a few seconds.

If the file had been edited, the only extra work would be the SPARSE_OK handling. Some translates take a lot of time with big rasters, maybe because of this.

But I have no idea how GDAL handles the translate here, or whether this logic lives in GDAL or in libtiff.

Thx!

jratike80 commented 9 months ago

GDAL does not think that you asked it to "do nothing"; it believes that you were serious when you asked to compress the image with ZSTD, and it triggers a decompress-compress cycle. If you want to edit something that can be edited without rewriting the pixels, check if gdal_edit can do it for you: https://gdal.org/programs/gdal_edit.html

latot commented 9 months ago

D: I think it would be great to check for this; there are several reasons.

One is that sometimes you run translate just to be sure the file has specific parameters.

Other times you need to run it to reclaim space: since libtiff cannot shrink a file in place, if ZSTD has made your data smaller you will not see it in the file size until you translate it. The same goes for the SPARSE_OK option.

Maybe it is not necessary to check and compare every step, only the ones that are expensive to write, for example whether the compression changes. Obviously the ideal would be to execute only the needed tasks, and not all of them.

rouault commented 9 months ago

you may want to use https://github.com/airbusgeo/cogger

jratike80 commented 9 months ago

I do not believe that there is generally enough metadata in the images to know if compression changes. For example ZSTD can be written by using predictors or error thresholds. Maybe it could work with lossless compression methods. However, implementing such checks into a generic gdal_translate tool feels very complicated. If the use case is well defined maybe a custom Python script could be a solution.
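
To make the "custom Python script" idea a bit more concrete, here is a minimal sketch, not a definitive implementation. The helper names `current_options` and `needs_translate` are made up for illustration. It assumes the GDAL Python bindings are installed, that the GTiff driver reports `COMPRESSION` in the `IMAGE_STRUCTURE` metadata domain, and that "block narrower than the raster" is a good enough heuristic for "tiled". Anything it cannot verify (SPARSE_OK, predictor, compression level, ...) is conservatively treated as a reason to rewrite.

```python
def current_options(path):
    """Read a few structural properties of a GeoTIFF via the GDAL bindings."""
    # Imported lazily so the comparison below can be used without GDAL.
    from osgeo import gdal

    ds = gdal.Open(path)
    block_x, _ = ds.GetRasterBand(1).GetBlockSize()
    structure = ds.GetMetadata("IMAGE_STRUCTURE")
    return {
        # The key is absent for uncompressed files.
        "COMPRESS": structure.get("COMPRESSION", "NONE"),
        # Heuristic (assumption): strips span the full raster width.
        "TILED": "YES" if block_x < ds.RasterXSize else "NO",
    }


def needs_translate(current, requested):
    """Return True unless every requested option is known and already matches."""
    for key, wanted in requested.items():
        have = current.get(key)
        # Unknown options cannot be verified, so assume a rewrite is needed.
        if have is None or have.upper() != wanted.upper():
            return True
    return False
```

For example, `needs_translate(current_options("zstd.tif"), {"COMPRESS": "ZSTD", "TILED": "YES"})` would be the check to run before calling gdal_translate. Note that asking for `SPARSE_OK` always returns True here, because that option leaves no tag in the file to compare against.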

latot commented 9 months ago

Yeah, it seems complicated. I had a chat on Matrix: if you want to know the length of the raw data of a tile, you need to use libtiff.

I don't know which one optimizes/organizes the file, GDAL or libtiff, so I can't tell what the next steps would be D:
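
For what it's worth, the per-tile lengths are stored in the file itself, in the TileByteCounts tag (325), or StripByteCounts (279) for striped files, so they can also be read without libtiff. The following is a rough sketch using only the Python standard library; the function names `payload_bytes` and `slack_estimate` are invented for this example. It assumes a well-formed classic TIFF or BigTIFF, follows only the main IFD chain (no SubIFDs), and only handles byte counts stored as SHORT, LONG or LONG8.

```python
import os
import struct

TYPE_FORMATS = {3: "H", 4: "I", 16: "Q"}  # SHORT, LONG, LONG8


def payload_bytes(path):
    """Sum TileByteCounts (325) / StripByteCounts (279) over the IFD chain."""
    total = 0
    with open(path, "rb") as f:
        header = f.read(16)
        order = {b"II": "<", b"MM": ">"}.get(header[:2])
        if order is None:
            raise ValueError("not a TIFF file")
        (magic,) = struct.unpack(order + "H", header[2:4])
        if magic == 42:  # classic TIFF: 4-byte offsets, 12-byte entries
            off_fmt, count_fmt, entry_fmt = "I", "H", "HHI4s"
            (ifd,) = struct.unpack(order + "I", header[4:8])
        elif magic == 43:  # BigTIFF: 8-byte offsets, 20-byte entries
            off_fmt, count_fmt, entry_fmt = "Q", "Q", "HHQ8s"
            (ifd,) = struct.unpack(order + "Q", header[8:16])
        else:
            raise ValueError("unknown TIFF version")
        entry_size = struct.calcsize(order + entry_fmt)
        off_size = struct.calcsize(order + off_fmt)
        while ifd:
            f.seek(ifd)
            raw_count = f.read(struct.calcsize(order + count_fmt))
            (n,) = struct.unpack(order + count_fmt, raw_count)
            entries = f.read(n * entry_size)
            (next_ifd,) = struct.unpack(order + off_fmt, f.read(off_size))
            for i in range(n):
                tag, typ, count, value = struct.unpack_from(
                    order + entry_fmt, entries, i * entry_size
                )
                if tag not in (279, 325):
                    continue
                fmt = order + str(count) + TYPE_FORMATS[typ]
                size = struct.calcsize(fmt)
                if size <= len(value):  # values stored inline in the entry
                    raw = value[:size]
                else:  # the entry holds an offset to the array
                    (pos,) = struct.unpack(order + off_fmt, value)
                    f.seek(pos)
                    raw = f.read(size)
                total += sum(struct.unpack(fmt, raw))
            ifd = next_ifd
    return total


def slack_estimate(path):
    """File size minus referenced pixel data: an upper bound on unused space."""
    return os.path.getsize(path) - payload_bytes(path)
```

The result of `slack_estimate` also includes the header, the IFDs and the offset/byte-count arrays, so it is only an upper bound on the space a rewrite could reclaim. Sparse tiles show up with a byte count of 0.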

latot commented 9 months ago

@rouault @jratike80 I know, but there is at least one possible optimization: if the compression tag is the same in the source and the destination, it would be possible to skip compression/decompression. But I have no idea whether skipping this step would still reduce the file size when there is unused space.
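
As an illustration of the check I mean (only a sketch, and not how GDAL does anything today), the Compression tag (259) can be read straight from the first IFD with the Python standard library. `read_compression_tag` and `same_compression` are names I made up. It assumes a well-formed classic TIFF or BigTIFF and only looks at the first IFD, so overviews and masks are ignored.

```python
import struct


def read_compression_tag(path):
    """Return the Compression tag (259) of the first IFD (1 = none if absent)."""
    with open(path, "rb") as f:
        header = f.read(16)
        order = {b"II": "<", b"MM": ">"}.get(header[:2])
        if order is None:
            raise ValueError("not a TIFF file")
        (magic,) = struct.unpack(order + "H", header[2:4])
        if magic == 42:  # classic TIFF
            (ifd_offset,) = struct.unpack(order + "I", header[4:8])
            count_fmt, entry_fmt = "H", "HHI4s"
        elif magic == 43:  # BigTIFF
            (ifd_offset,) = struct.unpack(order + "Q", header[8:16])
            count_fmt, entry_fmt = "Q", "HHQ8s"
        else:
            raise ValueError("unknown TIFF version")
        f.seek(ifd_offset)
        raw_count = f.read(struct.calcsize(order + count_fmt))
        (n,) = struct.unpack(order + count_fmt, raw_count)
        entry_size = struct.calcsize(order + entry_fmt)
        for _ in range(n):
            tag, _typ, _count, value = struct.unpack(
                order + entry_fmt, f.read(entry_size)
            )
            if tag == 259:
                # Compression is a single SHORT stored inline in the entry.
                return struct.unpack(order + "H", value[:2])[0]
    return 1  # TIFF default when the tag is missing: no compression


def same_compression(src, dst):
    """True when both files declare the same compression scheme."""
    return read_compression_tag(src) == read_compression_tag(dst)
```

libtiff uses the value 50000 for ZSTD. A matching tag says nothing about predictor or compression level, so this alone is not enough to prove that a rewrite would be a no-op.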

jratike80 commented 9 months ago

Even suggested having a look at cogger. If it does not do all that you want, maybe it could be enhanced or forked. The basic idea of GDAL is to convert data between different formats by going through a uniform data structure. That is not optimal for all use cases.

Cogger does not do any pixel manipulation on the provided image, it is up to you to provide an input geotiff which can be suitably transformed to a COG, namely:

- it must be internally tiled
- it should be compressed with one of the standard supported tiff compression mechanisms
- it should contain overviews