Open rouault opened 1 year ago
GTIFF: used TILED=YES by default
About this default without overviews, it can be really frustrating. It happened to me: create a big tiff (one GB is enough) with tiles and no overviews and try to open it in QGIS. It may take minutes to show something. In that time you could think that the file or the program are broken. Now I know it and I prevent that (either with overviews/cog or not opening it). I know the benefits of tiles. However I don't know how many frustrated users will blame us.
About this default without overviews, it can be really frustrating.
Is this any different than with a stripped tif also without overviews ?
When answering to users questions I have been thinking often that if we had no defaults at all then users should first think what they want. But I do not really want to write every time things like BLOCKXSIZE=256 BLOCKYSIZE=256. With the data that I play with I would say that TILED=YES could be a better default but I am not sure about COMPRESS=LZW. But LZW does make files smaller and maybe it does not make any harm in common use cases. What makes it better than DEFLATE?
Is this any different than with a stripped tif also without overviews ?
yes, scanline organized TIFFs are faster to downscale (when using nearest neighbour), as GDAL will only try to access 1 line over N if downscaling by a factor of N. For tiled datasets, unless N is at least twice the block height, you need to read all tiles.
I have been thinking often that if we had no defaults at all then users should first think what they want
There are tons of settings that should be set then: for a multiband dataset, INTERLEAVE ; for a non-Byte dataset, endianness; for compressed method, a number of compression options, etc. Think of JPEG2000 which has like 20 independent settings. For small enough datasets, in general you don't care about the settings. They become more relevant for big datasets where they have visible effects.
But LZW does make files smaller and maybe it does not make any harm in common use cases. What makes it better than DEFLATE?
Thought to be faster to compress/decompress. But just trying on one dataset, I find deflate to be actually faster (using libdeflate)
About this default without overviews, it can be really frustrating.
Is this any different than with a stripped tif also without overviews ?
Definitely. Let's say that to view the full image you have a zoom factor of 100 (So you are showing 0.01% of the total pixels), and your tiles are 256x256. With stripped it will read 1 every 100 lines. For a tiled tiff it will read every tile (just to read a few pixels on each). I/O will be 100 times more.
What makes it better than DEFLATE?
I don't know of any software that does not support LZW. I do know at least one (IDL) that does not support deflate. Aside from that I don't have any strong preference over one or another, especially since deflate has become so much faster with Even's libdeflate integration.
I have been thinking often that if we had no defaults at all then users should first think what they want
There are tons of settings that should be set then:
Yes, I was not serious. Maybe the compression is the default that astonish users most and by changing the default we could save gigantic amout of bits in the world because so many users run with the defaults. There are loads of questions like this https://gis.stackexchange.com/questions/310149/gdal-nearblack-increases-file-size-is-this-expected-behaviour.
One could say that we should just use reasonable defaults but what is reasonable for one may not be that for the others. A lossless compression that does not alter data feels like a good default for GeoTIFF, but JPEG2000 drivers use lossy compression by default and I think that it is reasonable, too. Now thinking, maybe a bit questionable because someone may have lost the smallest details from their data unintentionally. On the other hand, 30% reduction in file size with one of the best compression methods that exist could feel disappointing for the users.
When it comes to this ticket as a whole, the GeoTIFF defaults may not be the most important items on the list. But because they are so concrete it is natural that they gather many comments.
There is one I would like to propose - although I would almost certainly have to implement it:
API Behaviour change
There is an incompatibility between SWIG generated C# code and Mono AOT compilers that cause failures when there are certain types of callbacks.
See https://github.com/swig/swig/pull/1262 and related.
The particular usage scenario that I have come across is
2 When targetting the C# code to run in Mono with AOT (publishing WASM requires AOT)
Basically if the only way to do a callback from C to C# is via static methods marked with [MonoPInvokeCallback(typeof(NativeBinding.callback))], and in the case of class methods take a GCHandle to this as > the first parameter.
This affects any AOT in Mono. The main consumer of this is probably IL2CPP in Unity but it is also a problem in Xamarin on iOS.
The fixes - as discussed here https://forum.htc.com/topic/9139-unity-il2cpp-and-callbacks/ are not complicated and actually are a matter of metadata as opposed to functional changes.
I have been implementing the changes manually for the UPM Package for GDAL that I support - without any problems so far. Doing this manually is less than ideal but I have been reluctant to port this change into the main repo because of the potential for breaking something that is not being tested (i.e. the "you don't know what you don't know" problem).
I would say that 4.0 would be the time to implement the change.
What makes it better than DEFLATE?
I don't know of any software that does not support LZW. I do know at least one (IDL) that does not support deflate. Aside from that I don't have any strong preference over one or another, especially since deflate has become so much faster with Even's libdeflate integration.
Are you sure this is still true, looks like reference docs indicate IDL can now write Deflate (would be weird to not read) https://www.nv5geospatialsoftware.com/docs/WRITE_TIFF.html
@wildintellect support for deflate seems to have been added in version 8.8.2, released in march 2022, which can be considered very recent given their release velocity and adoption rate.
I'd like to suggest ditching .py
from the canonical way of calling the Python-implemented utilities (e.g. gdal_calc.py
would become gdal_calc
). This appears to make for easier implementation of Python package entry points, see conda-forge/gdal-feedstock#834, and is also more consistent with the remaining utilies.
I'd like to suggest ditching
.py
from the canonical way of calling the Python-implemented utilities (e.g.gdal_calc.py
would becomegdal_calc
). This appears to make for easier implementation of Python package entry points, see conda-forge/gdal-feedstock#834, and is also more consistent with the remaining utilies.
attempt at fixing that in https://github.com/OSGeo/gdal/pull/8718 . Apparently on Windows setuptools create .exe launchers
Adding another potential topic:
About this default without overviews, it can be really frustrating.
Is this any different than with a stripped tif also without overviews ?
Today the default TILED=YES came up again in the mailing list.
Based on his answer, I have the impression that @tbonfort was not aware of the consequences of using tiles without overview. And probably many users as well.
However it is really frustrating, even something not huge like 10k * 10k pixels. I have seen that several times at work. IMO the default should stay as it is now (it works good enough for a significant number of users). If a user creates overviews should know that it will perform better with tiles. Or even better, they can just create a COG, that has tiles without any option. But I don't think it is a good idea bothering many users just to avoid setting TILED=YES when we create a geotiff ... with overviews. Adding overviews artificially is over-complicating it just to change the default option; and very slow, that is the main problem.
I want to think there are two types of users: those who really go into the options, and configure the driver a lot; and those who just create a geotiff using the minimal configuration... because it is complicated. (at the beginning I was the latter, I want to think that now I am the former). For the first type we have all those options. It is really powerful. For the second type they rely on the defaults. It is simpler.
IMO the default should stay as it is now (it works good enough for a significant number of users)
This is not a hill I am willing to die on. Large tiffs that should have overviews but are missing them suck, and in that case stripped tiffs suck a bit less. In all other cases tiled tiffs are a better option, but my fingers have sufficient muscle memory to type -co TILED=YES
every time I type a gdal command for me not to consider ditching gdal as a whole because of that :)
When it comes to potential GDAL 4.0 changes I think that the TILED=YES or TILED=NO is not the most critical topic.
Defaults can never be optimal for all users. My opinion is that TILED=YES would be better for more users than TILED=NO, but I do not know all users. My typical use case for smaller or bigger TIFFs is to look at them at the resolution that is close to the native resolution, either on the screen or by using them as source data for WMS.
It happened to me: create a big tiff (one GB is enough) with tiles and no overviews and try to open it in QGIS. It may take minutes to show something. In that time you could think that the file or the program are broken. Now I know it and I prevent that (either with overviews/cog or not opening it). I know the benefits of tiles. However I don't know how many frustrated users will blame us.
I tried to re-produce with an image of "Size is 48000, 24000", file size on disk 3.5 GB. It takes 5 seconds to open with QGIS on Windows, (Intel64 Family 6 Model 142 Stepping 10 GenuineIntel ~1910 Mhz, 32 GB of memory, SSD drive). It does feel a bit slow but acceptable. Panning when zoomed close to the native resolution is naturally fast. Maybe a part of your frustration is caused by the hardware.
I know from gis.stackexchange that it is not uncommon to have severe troubles when trying to warp or otherwise process big striped TIFF files. In that case the only solution is to re-write the image as tiled because adding overviews does not fix the processing issue even it helps with viewing.
The biggest problem with changing the default of TILED= is the change itself. Users will continue to use versions 3.x and 4.x side by side for several years and during that time those users who rely on the defaults would not know what will happen because a) users do not usually know what GDAL version they are using and b) users should be aware about the change of the default value between v3 and v4. That would certainly frustrate some users. Maybe the majority of users would not notice anything.
to type
-co TILED=YES
every time I type a gdal command for me not to consider ditching gdal as a whole because of that :)
making me wondering if a (non-breaking) improvement could not be to extend the GDAL configuration file to have a new section where could could add:
[default-creation-options-for-cli]
GTiff=TILED=YES
GTiff=COMPRESS=DEFLATE
that would be used for command line utilities. I believe they should display a message in the console in non quiet mode to recall those options
$ gdal_translate in.tif out.tif Adding default creation options from /home/even/.gdal/gdalrc: -co TILED=YES -co COMPRESS=DEFLATE 0...10....20....30...40...50...60...70...80...90...100
That said it might perhaps only be desirable to use them for a command directly started from an interactive shell, and not from a script, and looking a bit, it seems tricky/impossible to detect reliably in which case we are, and even if we can, it might be too confusing.
Thanks Even for looking into it, but I think that the issues arising from the variability between environments with different configuration files would be more problematic than having to repeatedly type a few creation options.
Other ideas that have been floated around in the past:
Make port/ CPL library a standalone shared object that other projects could link against, without linking GDAL itself. The main benefit for them would be to get all the /vsi file systems. Although there are some GDAL specific hacks leaking into CPL, but it should still remain largely usable outside of it. Or perhaps just the /vsi/ part of CPL ? (as nobody really needs a CPLMalloc() or a CPLString nowadays :-))
Move GDAL installed headers to a gdal/ subdirectory . (and potentially cpl's one into a dedicated one too)
Can we drop some of the deprecated functions like GDALGetDefaultHistogram
? I just saw someone using it in new code.
Other candidate: removing support for direct calls to python bindings's setup.py script, and rather use "pip install" or other "modern"/recommended way of packaging Python ? (expert with 20 years of Python packaging experience required). Cf https://github.com/OSGeo/gdal/pull/8926
Here's a sorta breaking change that shouldn't be too controversial: make /vsicurl/
the default HTTP handler. This is the case now for Rasterio, and users appreciate it.
My opinion is that the option to fully download the resource at a URI and open it from a temporary location should exist at a level above GDALOpenEx, not within GDALOpenEx or driver code.
gdaladdo / GDALBuildOverviews does not take creation options, but instead needs to pass these as configuration options. This leads to a de-synchronization between what options are available for main datasets vs. what is available for overview levels, and a clunky api/documentation. Last example to date: #8976 Breaking change for 4.0: make GDALBuildOverviews take creation options instead of relying on configuration options.
Another potential change is get rid of the wkbPoint25D, wkbLineString25D, etc. constants of the wkbGeometryType enumeration that pre-date ISO WKB, and that have "funny" values that are wkbPoint, wkbLineString or'ed with 0x8000000, which leads to unusual enumeration values not fitting into a signed int (apparently C23 makes it legal... cf https://github.com/OSGeo/gdal/issues/2322#issuecomment-1904970402). It could be best to have wkbPointZ = wkbPoint + 1000 to be ISO WKB compliant. That would impact the C & C++ API & ABI.
Remove unused OFTWideString and OFTWideStringList from OGRFieldType enumeration
Installing headers in ${CMAKE_INSTALL_INCLUDEDIR}/gdal : https://github.com/OSGeo/gdal/issues/9276
Interesting GTiff discussion here. I've been looking at GeoTIFF read performance lately and some things I've noticed (from C# using MaxRev.Gdal) which might be of interest to GDAL 4.0 planning are
RasterIO()
could just populate buffers provided by individual threads directly. Users could maybe alleviate this by upping cache allocations in GDAL config. However, doing so's disadvantageous whenever the next command needs the memory for something else.GTIFF_DIRECT_IO=YES
has consistently profiled slower than the default of NO
, has had no effect on IO request sizes, and increases DDR bandwidth demand. This happens for entire .tif reads (Dataset.ReadRaster()
), whole band reads, block reads, and single row reads (all Band.ReadRaster()
). This surprises me as it seems reasonable to expect bypassing the block cache to be (marginally) faster, avoid allocating duplicate memory (once for the cache and once for the destination buffer, presumably), and reduce demand for DDR bandwidth.Gdal.SetConfigOption()
provides something of an alternative but is awkward whenever different threads are doing different things with different files. Seems like it'd be easier if there was a way to directly configure a particular instance of the GTiff driver or IO operations on a particular Dataset
.RasterIO()
(via C# SWIG) not to deinterleave pixels. For a lot of the processing I do it'd probably be fastest (minimum memory, lowest DDR bandwidth, greatest cache locality) just to yank interleaved pixels into a buffer.It's not entirely clear to me what GDAL expects of its callers for efficient IO but none of the numerous combinations I've tried yields uncompressed GeoTIFF read throughput much above 1 GB/s per thread (tested with a 5950X and PCIe 4.0 x4 NVMe). Multithreaded scaling could likely also be increased as DDR bandwidth demand is fairly high (7–10 GB/s per GB/s read) compared to analogous non-GDAL cases.
Also, kind of an as aside, SWIG isn't making use of .NET types such as ReadOnlySpan<T>
, Span<T>
, INumber<T>
and ReadRaster()
and WriteRaster()
overloads haven't kept up with DataType
. Not a big deal but, for example, it feels a little odd not to be able to do things like stackalloc double[6]
for Dataset.GetGeoTransform()
.
It would be good if 'gdal_translate -projwin' didn't act like 'gdalwarp -te ' for some resample algs, applying the projwin to the output rather than aligning to input pixels. I don't get why this was mixed up, having gdal_translate always be in alignment seems like a good default and let warp do prescriptive target extent. 🙏
It would be good if 'gdal_translate -projwin' didn't act like 'gdalwarp -te ' for some resample algs, applying the projwin to the output rather than aligning to input pixels.
I am curious, could you show an understandable example about what is wrong, maybe with some images?
for sure, if you request a window that is not aligned to the source pixels, it snaps to the source when resample is 'near'
gdal_translate vrt://gcore/data/rgbsmall.tif?a_ullr=0,50,50,0 near.vrt -projwin 5.5 15.5 10.3 0
gdalinfo near.vrt
...
Origin = (5.000000000000000,16.000000000000000)
Pixel Size = (1.000000000000000,-1.000000000000000)
...
When you use 'bilinear', it uncouples from the source grid and gives the provided extent (like warp always does).
gdal_translate vrt://gcore/data/rgbsmall.tif?a_ullr=0,50,50,0 bilinear.vrt -projwin 5.5 15.5 10.3 0 -r bilinear
gdalinfo bilinear.vrt
...
Origin = (5.500000000000000,15.500000000000000)
Pixel Size = (0.960000000000000,-0.968750000000000)
...
So, to get alignment you have to your own calcs and provide the right snapped projwin.
I was gobsmacked when this was pointed out to me ... but, it's in a Note here, and I understand the concern about the shift but I think it was the wrong fix: https://gdal.org/en/latest/programs/gdal_translate.html#cmdoption-gdal_translate-projwin
Inconsistent use of String(JSON) type in the GeoJSON driver:
if there's a mix of data types, a String(JSON) field is reported to mean that.
The only annoying thing is that for backward compatibility with past behaviour of GDAL 3.5 where we silently homogenized to a string, we didn't go to the point to actually quoting strings, so this isn't fully JSON compliant unfortunately
I mean if we have out.json with:
{
"type": "FeatureCollection",
"features": [
{ "type": "Feature", "properties": { "foo": "str" }, "geometry": null },
{ "type": "Feature", "properties": { "foo": 0 }, "geometry": null },
{ "type": "Feature", "properties": { "foo": ["a", "b"] }, "geometry": null }
]
}
$ ogrinfo -al out.geojson -q
Layer name: out
OGRFeature(out):0
foo (String(JSON)) = str
OGRFeature(out):1
foo (String(JSON)) = 0
OGRFeature(out):2
foo (String(JSON)) = [ "a", "b" ]
In theory, we should report "str", not just str.
Deprecate use of term "dateline"
List of tickets tagged with 4.0 milestone: https://github.com/OSGeo/gdal/milestone/33
Below are just proposed ideas. Nothing decided
API breakage:
C and C++ API:
[ ] #8288, #8396, #8399
[ ] Remove GDALRasterIOExtraArg structure. This was introduced as a backwards compatible way to express sub-pixel windows and non-nearest resampling, but it turns out to be error prone within the code base as it easy is to forget updating values in that structure (cf https://github.com/OSGeo/gdal/pull/8797). We could potentially modify IRasterIO() signature to take floating point values for the coordinates of the window of interest, the resampling algorithm and the progress function.
[ ] Removing C, C++, SWIG API for OGRSFDriver, and OGRDataSource ? Might be too much as they are still used in the wild... So we should have a long deprecation period before so people can migrate to GDALDriver and GDALDataset Cf https://github.com/OSGeo/gdal/issues/10783#issuecomment-2347144100 for an example where maintaining those old things are a bit annoying
C++ API (mostly):
[ ] OGRLayer::GetNextFeature() and GetFeature() to return a
std::unique_ptr<OGRFeature>
: involves lots of internal changes, affect C++ users[ ] or change OGRLayer::GetNextFeature() and GetFeature() to accept a
OGRFeature&
that they update. May help a bit performance in bulk reading operations, but probably not spectacular. EDIT: might not be doable, as some drivers return an instance of a subclass of OGRFeature[ ] Remove reference count mechanism from OGRSpatialReference class itself and use
std::shared_ptr<OGRSpatialReference>
(orstd::shared_ptr<const OGRSpatialReference>
?) instead : involves lots of internal changes, affect C++ users of the geometry and layer classes, although we might have to propagate it everywhere since OGRSpatialReferenceH would be an opaque type forstd::shared_ptr<OGRSpatialReference>
[ ] Remove reference count mechanism from OGRFeatureDefn class itself and use
std::shared_ptr<OGRFeatureDefn>
instead: involves lots of internal changes, affect C++ users, may impact slightly the C API (OGR_FD_GetReferenceCount)[ ] Remove reference count mechanism from GDALDataset class itself & remove OF_SHARED? GDALOpenShared used to be mostly used in the VRT driver but a more sophisticated mechanism has replaced it when opening from XML. But when building a VRT dynamically shared/ref_counting is used. And there are GDALOpenShared() users in external code though
C ABI:
[ ] #8158 : caution on existing external Windows code implementing the GDALProgressFunc typedef that would use manual stdcall. That said stdcall is a no-op on x64 and arm64, so the impact nowadays would be only for legacy x86 code
API behaviour changes:
Driver changes:
OGR SQL:
Driver removals?
Others :