[meta-ticket] GDAL 4.0 potential changes related to API or behaviour breaks

rouault commented 1 year ago

List of tickets tagged with 4.0 milestone: https://github.com/OSGeo/gdal/milestone/33

Below are just proposed ideas. Nothing decided

API breakage:

C and C++ API:
[ ] #8288, #8396, #8399
[ ] Remove GDALRasterIOExtraArg structure. This was introduced as a backwards compatible way to express sub-pixel windows and non-nearest resampling, but it turns out to be error prone within the code base as it easy is to forget updating values in that structure (cf https://github.com/OSGeo/gdal/pull/8797). We could potentially modify IRasterIO() signature to take floating point values for the coordinates of the window of interest, the resampling algorithm and the progress function.
[ ] Removing C, C++, SWIG API for OGRSFDriver, and OGRDataSource ? Might be too much as they are still used in the wild... So we should have a long deprecation period before so people can migrate to GDALDriver and GDALDataset Cf https://github.com/OSGeo/gdal/issues/10783#issuecomment-2347144100 for an example where maintaining those old things are a bit annoying
C++ API (mostly):
[ ] OGRLayer::GetNextFeature() and GetFeature() to return a std::unique_ptr<OGRFeature>: involves lots of internal changes, affect C++ users
[ ] or change OGRLayer::GetNextFeature() and GetFeature() to accept a OGRFeature& that they update. May help a bit performance in bulk reading operations, but probably not spectacular. EDIT: might not be doable, as some drivers return an instance of a subclass of OGRFeature
[ ] Remove reference count mechanism from OGRSpatialReference class itself and use std::shared_ptr<OGRSpatialReference> (or std::shared_ptr<const OGRSpatialReference> ?) instead : involves lots of internal changes, affect C++ users of the geometry and layer classes, although we might have to propagate it everywhere since OGRSpatialReferenceH would be an opaque type for std::shared_ptr<OGRSpatialReference>
[ ] Remove reference count mechanism from OGRFeatureDefn class itself and use std::shared_ptr<OGRFeatureDefn> instead: involves lots of internal changes, affect C++ users, may impact slightly the C API (OGR_FD_GetReferenceCount)
[ ] Remove reference count mechanism from GDALDataset class itself & remove OF_SHARED? GDALOpenShared used to be mostly used in the VRT driver but a more sophisticated mechanism has replaced it when opening from XML. But when building a VRT dynamically shared/ref_counting is used. And there are GDALOpenShared() users in external code though
C ABI:
[ ] #8158 : caution on existing external Windows code implementing the GDALProgressFunc typedef that would use manual stdcall. That said stdcall is a no-op on x64 and arm64, so the impact nowadays would be only for legacy x86 code

API behaviour changes:

[ ] #4681
[ ] #7452
[ ] Make c# API compatible with MONO AOT compilers e.g. Xamarin or IL2CPP

Driver changes:

[ ] #4712
[ ] GTIFF: used TILED=YES by default --> #10750
[ ] GTIFF: use COMPRESS=LZW by default

OGR SQL:

[ ] Require geometry field to be explicitly mentioned in SELECT ?

Driver removals?

Others :

[ ] Changes in error handling mechanism (to be defined what changes would be desirable) ? Cf https://lists.osgeo.org/pipermail/gdal-dev/2022-July/056066.html
[ ] Design and implement a unified way to describe a datasource with open options and subdatasets with a string, see the discussion here: https://lists.osgeo.org/pipermail/gdal-dev/2023-September/057641.html. I'm not sure it would be an API break though.

tbonfort commented 1 year ago

[ ] GTIFF: used TILED=YES by default
[ ] GTIFF: use COMPRESS=LZW by default

elpaso commented 1 year ago

[ ] Design and implement a unified way to describe a datasource with open options and subdatasets with a string, see the discussion here: https://lists.osgeo.org/pipermail/gdal-dev/2023-September/057641.html. I'm not sure it would be an API break though.

jjimenezshaw commented 1 year ago

GTIFF: used TILED=YES by default

About this default without overviews, it can be really frustrating. It happened to me: create a big tiff (one GB is enough) with tiles and no overviews and try to open it in QGIS. It may take minutes to show something. In that time you could think that the file or the program are broken. Now I know it and I prevent that (either with overviews/cog or not opening it). I know the benefits of tiles. However I don't know how many frustrated users will blame us.

tbonfort commented 1 year ago

About this default without overviews, it can be really frustrating.

Is this any different than with a stripped tif also without overviews ?

jratike80 commented 1 year ago

When answering to users questions I have been thinking often that if we had no defaults at all then users should first think what they want. But I do not really want to write every time things like BLOCKXSIZE=256 BLOCKYSIZE=256. With the data that I play with I would say that TILED=YES could be a better default but I am not sure about COMPRESS=LZW. But LZW does make files smaller and maybe it does not make any harm in common use cases. What makes it better than DEFLATE?

rouault commented 1 year ago

Is this any different than with a stripped tif also without overviews ?

yes, scanline organized TIFFs are faster to downscale (when using nearest neighbour), as GDAL will only try to access 1 line over N if downscaling by a factor of N. For tiled datasets, unless N is at least twice the block height, you need to read all tiles.

rouault commented 1 year ago

I have been thinking often that if we had no defaults at all then users should first think what they want

There are tons of settings that should be set then: for a multiband dataset, INTERLEAVE ; for a non-Byte dataset, endianness; for compressed method, a number of compression options, etc. Think of JPEG2000 which has like 20 independent settings. For small enough datasets, in general you don't care about the settings. They become more relevant for big datasets where they have visible effects.

But LZW does make files smaller and maybe it does not make any harm in common use cases. What makes it better than DEFLATE?

Thought to be faster to compress/decompress. But just trying on one dataset, I find deflate to be actually faster (using libdeflate)

jjimenezshaw commented 1 year ago

About this default without overviews, it can be really frustrating.

Is this any different than with a stripped tif also without overviews ?

Definitely. Let's say that to view the full image you have a zoom factor of 100 (So you are showing 0.01% of the total pixels), and your tiles are 256x256. With stripped it will read 1 every 100 lines. For a tiled tiff it will read every tile (just to read a few pixels on each). I/O will be 100 times more.

tbonfort commented 1 year ago

What makes it better than DEFLATE?

I don't know of any software that does not support LZW. I do know at least one (IDL) that does not support deflate. Aside from that I don't have any strong preference over one or another, especially since deflate has become so much faster with Even's libdeflate integration.

jratike80 commented 1 year ago

I have been thinking often that if we had no defaults at all then users should first think what they want

There are tons of settings that should be set then:

Yes, I was not serious. Maybe the compression is the default that astonish users most and by changing the default we could save gigantic amout of bits in the world because so many users run with the defaults. There are loads of questions like this https://gis.stackexchange.com/questions/310149/gdal-nearblack-increases-file-size-is-this-expected-behaviour.

One could say that we should just use reasonable defaults but what is reasonable for one may not be that for the others. A lossless compression that does not alter data feels like a good default for GeoTIFF, but JPEG2000 drivers use lossy compression by default and I think that it is reasonable, too. Now thinking, maybe a bit questionable because someone may have lost the smallest details from their data unintentionally. On the other hand, 30% reduction in file size with one of the best compression methods that exist could feel disappointing for the users.

When it comes to this ticket as a whole, the GeoTIFF defaults may not be the most important items on the list. But because they are so concrete it is natural that they gather many comments.

runette commented 1 year ago

There is one I would like to propose - although I would almost certainly have to implement it:

API Behaviour change

[ ] Make c# API compatible with MONO AOT compilers e.g. Xamarin or IL2CPP

Discussion

There is an incompatibility between SWIG generated C# code and Mono AOT compilers that cause failures when there are certain types of callbacks.

See https://github.com/swig/swig/pull/1262 and related.

The particular usage scenario that I have come across is

2 When targetting the C# code to run in Mono with AOT (publishing WASM requires AOT)

Basically if the only way to do a callback from C to C# is via static methods marked with [MonoPInvokeCallback(typeof(NativeBinding.callback))], and in the case of class methods take a GCHandle to this as > the first parameter.

This affects any AOT in Mono. The main consumer of this is probably IL2CPP in Unity but it is also a problem in Xamarin on iOS.

The fixes - as discussed here https://forum.htc.com/topic/9139-unity-il2cpp-and-callbacks/ are not complicated and actually are a matter of metadata as opposed to functional changes.

I have been implementing the changes manually for the UPM Package for GDAL that I support - without any problems so far. Doing this manually is less than ideal but I have been reluctant to port this change into the main repo because of the potential for breaking something that is not being tested (i.e. the "you don't know what you don't know" problem).

I would say that 4.0 would be the time to implement the change.

wildintellect commented 1 year ago

What makes it better than DEFLATE?

I don't know of any software that does not support LZW. I do know at least one (IDL) that does not support deflate. Aside from that I don't have any strong preference over one or another, especially since deflate has become so much faster with Even's libdeflate integration.

Are you sure this is still true, looks like reference docs indicate IDL can now write Deflate (would be weird to not read) https://www.nv5geospatialsoftware.com/docs/WRITE_TIFF.html

tbonfort commented 1 year ago

@wildintellect support for deflate seems to have been added in version 8.8.2, released in march 2022, which can be considered very recent given their release velocity and adoption rate.

plimkilde commented 11 months ago

I'd like to suggest ditching .py from the canonical way of calling the Python-implemented utilities (e.g. gdal_calc.py would become gdal_calc). This appears to make for easier implementation of Python package entry points, see conda-forge/gdal-feedstock#834, and is also more consistent with the remaining utilies.

rouault commented 11 months ago

I'd like to suggest ditching .py from the canonical way of calling the Python-implemented utilities (e.g. gdal_calc.py would become gdal_calc). This appears to make for easier implementation of Python package entry points, see conda-forge/gdal-feedstock#834, and is also more consistent with the remaining utilies.

attempt at fixing that in https://github.com/OSGeo/gdal/pull/8718 . Apparently on Windows setuptools create .exe launchers

rouault commented 11 months ago

Adding another potential topic:

[ ] Remove GDALRasterIOExtraArg structure. This was introduced as a backwards compatible way to express sub-pixel windows and non-nearest resampling, but it turns out to be error prone within the code base as it is easy to forget updating values in that structure (cf https://github.com/OSGeo/gdal/pull/8797). We could potentially modify IRasterIO() signature to take floating point values for the coordinates of the window of interest, the resampling algorithm and the progress function.

jjimenezshaw commented 11 months ago

About this default without overviews, it can be really frustrating.

Is this any different than with a stripped tif also without overviews ?

Today the default TILED=YES came up again in the mailing list.

Based on his answer, I have the impression that @tbonfort was not aware of the consequences of using tiles without overview. And probably many users as well.

However it is really frustrating, even something not huge like 10k * 10k pixels. I have seen that several times at work. IMO the default should stay as it is now (it works good enough for a significant number of users). If a user creates overviews should know that it will perform better with tiles. Or even better, they can just create a COG, that has tiles without any option. But I don't think it is a good idea bothering many users just to avoid setting TILED=YES when we create a geotiff ... with overviews. Adding overviews artificially is over-complicating it just to change the default option; and very slow, that is the main problem.

I want to think there are two types of users: those who really go into the options, and configure the driver a lot; and those who just create a geotiff using the minimal configuration... because it is complicated. (at the beginning I was the latter, I want to think that now I am the former). For the first type we have all those options. It is really powerful. For the second type they rely on the defaults. It is simpler.

tbonfort commented 11 months ago

IMO the default should stay as it is now (it works good enough for a significant number of users)

This is not a hill I am willing to die on. Large tiffs that should have overviews but are missing them suck, and in that case stripped tiffs suck a bit less. In all other cases tiled tiffs are a better option, but my fingers have sufficient muscle memory to type -co TILED=YES every time I type a gdal command for me not to consider ditching gdal as a whole because of that :)

jratike80 commented 11 months ago

When it comes to potential GDAL 4.0 changes I think that the TILED=YES or TILED=NO is not the most critical topic.

Defaults can never be optimal for all users. My opinion is that TILED=YES would be better for more users than TILED=NO, but I do not know all users. My typical use case for smaller or bigger TIFFs is to look at them at the resolution that is close to the native resolution, either on the screen or by using them as source data for WMS.

It happened to me: create a big tiff (one GB is enough) with tiles and no overviews and try to open it in QGIS. It may take minutes to show something. In that time you could think that the file or the program are broken. Now I know it and I prevent that (either with overviews/cog or not opening it). I know the benefits of tiles. However I don't know how many frustrated users will blame us.

I tried to re-produce with an image of "Size is 48000, 24000", file size on disk 3.5 GB. It takes 5 seconds to open with QGIS on Windows, (Intel64 Family 6 Model 142 Stepping 10 GenuineIntel ~1910 Mhz, 32 GB of memory, SSD drive). It does feel a bit slow but acceptable. Panning when zoomed close to the native resolution is naturally fast. Maybe a part of your frustration is caused by the hardware.

I know from gis.stackexchange that it is not uncommon to have severe troubles when trying to warp or otherwise process big striped TIFF files. In that case the only solution is to re-write the image as tiled because adding overviews does not fix the processing issue even it helps with viewing.

The biggest problem with changing the default of TILED= is the change itself. Users will continue to use versions 3.x and 4.x side by side for several years and during that time those users who rely on the defaults would not know what will happen because a) users do not usually know what GDAL version they are using and b) users should be aware about the change of the default value between v3 and v4. That would certainly frustrate some users. Maybe the majority of users would not notice anything.

rouault commented 11 months ago

to type -co TILED=YES every time I type a gdal command for me not to consider ditching gdal as a whole because of that :)

making me wondering if a (non-breaking) improvement could not be to extend the GDAL configuration file to have a new section where could could add:

[default-creation-options-for-cli]
GTiff=TILED=YES
GTiff=COMPRESS=DEFLATE

that would be used for command line utilities. I believe they should display a message in the console in non quiet mode to recall those options

$ gdal_translate in.tif out.tif Adding default creation options from /home/even/.gdal/gdalrc: -co TILED=YES -co COMPRESS=DEFLATE 0...10....20....30...40...50...60...70...80...90...100

That said it might perhaps only be desirable to use them for a command directly started from an interactive shell, and not from a script, and looking a bit, it seems tricky/impossible to detect reliably in which case we are, and even if we can, it might be too confusing.

tbonfort commented 11 months ago

Thanks Even for looking into it, but I think that the issues arising from the variability between environments with different configuration files would be more problematic than having to repeatedly type a few creation options.

rouault commented 11 months ago

Other ideas that have been floated around in the past:

Make port/ CPL library a standalone shared object that other projects could link against, without linking GDAL itself. The main benefit for them would be to get all the /vsi file systems. Although there are some GDAL specific hacks leaking into CPL, but it should still remain largely usable outside of it. Or perhaps just the /vsi/ part of CPL ? (as nobody really needs a CPLMalloc() or a CPLString nowadays :-))
Move GDAL installed headers to a gdal/ subdirectory . (and potentially cpl's one into a dedicated one too)

lnicola commented 11 months ago

Can we drop some of the deprecated functions like GDALGetDefaultHistogram? I just saw someone using it in new code.

rouault commented 11 months ago

Other candidate: removing support for direct calls to python bindings's setup.py script, and rather use "pip install" or other "modern"/recommended way of packaging Python ? (expert with 20 years of Python packaging experience required). Cf https://github.com/OSGeo/gdal/pull/8926

sgillies commented 10 months ago

Here's a sorta breaking change that shouldn't be too controversial: make /vsicurl/ the default HTTP handler. This is the case now for Rasterio, and users appreciate it.

My opinion is that the option to fully download the resource at a URI and open it from a temporary location should exist at a level above GDALOpenEx, not within GDALOpenEx or driver code.

tbonfort commented 10 months ago

gdaladdo / GDALBuildOverviews does not take creation options, but instead needs to pass these as configuration options. This leads to a de-synchronization between what options are available for main datasets vs. what is available for overview levels, and a clunky api/documentation. Last example to date: #8976 Breaking change for 4.0: make GDALBuildOverviews take creation options instead of relying on configuration options.

rouault commented 9 months ago

Another potential change is get rid of the wkbPoint25D, wkbLineString25D, etc. constants of the wkbGeometryType enumeration that pre-date ISO WKB, and that have "funny" values that are wkbPoint, wkbLineString or'ed with 0x8000000, which leads to unusual enumeration values not fitting into a signed int (apparently C23 makes it legal... cf https://github.com/OSGeo/gdal/issues/2322#issuecomment-1904970402). It could be best to have wkbPointZ = wkbPoint + 1000 to be ISO WKB compliant. That would impact the C & C++ API & ABI.

rouault commented 9 months ago

Remove unused OFTWideString and OFTWideStringList from OGRFieldType enumeration

rouault commented 8 months ago

Installing headers in ${CMAKE_INSTALL_INCLUDEDIR}/gdal : https://github.com/OSGeo/gdal/issues/9276

twest820 commented 3 months ago

Interesting GTiff discussion here. I've been looking at GeoTIFF read performance lately and some things I've noticed (from C# using MaxRev.Gdal) which might be of interest to GDAL 4.0 planning are

GTiff seems to have a strong preference for small IO requests. With many TIFFs, and likely most virtual raster tile sizes, transfers are a few kB to a few tens of kB, well under the megabyte or so needed to fully utilize drives' sequential read abilities. There doesn't seem to be a way to tell the GTiff driver to make larger requests but, from experience of other similar IO patterns moving data a few times faster than GDAL 3 seems to be able to manage, I'd guess this is probably both the easiest and largest performance limitation to remove.
The raster tile cache appears useful with band level reads and TIFF's default pixel interleave. However, GTiff sometimes compounds the small IO limitation by issuing 40 or 50% duplicate IO requests. There don't seem to be APIs to track what the cache is doing but this seems probably caused by premature eviction, suggesting it'd be advantageous if RasterIO() could just populate buffers provided by individual threads directly. Users could maybe alleviate this by upping cache allocations in GDAL config. However, doing so's disadvantageous whenever the next command needs the memory for something else.
In what I've tried so far, GTIFF_DIRECT_IO=YES has consistently profiled slower than the default of NO, has had no effect on IO request sizes, and increases DDR bandwidth demand. This happens for entire .tif reads (Dataset.ReadRaster()), whole band reads, block reads, and single row reads (all Band.ReadRaster()). This surprises me as it seems reasonable to expect bypassing the block cache to be (marginally) faster, avoid allocating duplicate memory (once for the cache and once for the destination buffer, presumably), and reduce demand for DDR bandwidth.
The idea above of multiple GDAL configuration sections for different things is interesting and potentially helpful but, as @jjimenezshaw kind of touched on, doesn't really compose with C# use cases where users expect to grab a package and have it just work. Gdal.SetConfigOption() provides something of an alternative but is awkward whenever different threads are doing different things with different files. Seems like it'd be easier if there was a way to directly configure a particular instance of the GTiff driver or IO operations on a particular Dataset.
Maybe I'm missing it, but there doesn't seem to be a way to query a TIFF's interleave or tell RasterIO() (via C# SWIG) not to deinterleave pixels. For a lot of the processing I do it'd probably be fastest (minimum memory, lowest DDR bandwidth, greatest cache locality) just to yank interleaved pixels into a buffer.

It's not entirely clear to me what GDAL expects of its callers for efficient IO but none of the numerous combinations I've tried yields uncompressed GeoTIFF read throughput much above 1 GB/s per thread (tested with a 5950X and PCIe 4.0 x4 NVMe). Multithreaded scaling could likely also be increased as DDR bandwidth demand is fairly high (7–10 GB/s per GB/s read) compared to analogous non-GDAL cases.

Also, kind of an as aside, SWIG isn't making use of .NET types such as ReadOnlySpan<T>, Span<T>, INumber<T> and ReadRaster() and WriteRaster() overloads haven't kept up with DataType. Not a big deal but, for example, it feels a little odd not to be able to do things like stackalloc double[6] for Dataset.GetGeoTransform().

mdsumner commented 1 month ago

It would be good if 'gdal_translate -projwin' didn't act like 'gdalwarp -te ' for some resample algs, applying the projwin to the output rather than aligning to input pixels. I don't get why this was mixed up, having gdal_translate always be in alignment seems like a good default and let warp do prescriptive target extent. 🙏

jratike80 commented 1 month ago

It would be good if 'gdal_translate -projwin' didn't act like 'gdalwarp -te ' for some resample algs, applying the projwin to the output rather than aligning to input pixels.

I am curious, could you show an understandable example about what is wrong, maybe with some images?

mdsumner commented 1 month ago

for sure, if you request a window that is not aligned to the source pixels, it snaps to the source when resample is 'near'

gdal_translate  vrt://gcore/data/rgbsmall.tif?a_ullr=0,50,50,0 near.vrt -projwin 5.5 15.5 10.3  0

gdalinfo near.vrt

...
Origin = (5.000000000000000,16.000000000000000)
Pixel Size = (1.000000000000000,-1.000000000000000)
...

When you use 'bilinear', it uncouples from the source grid and gives the provided extent (like warp always does).

gdal_translate  vrt://gcore/data/rgbsmall.tif?a_ullr=0,50,50,0 bilinear.vrt -projwin 5.5 15.5 10.3  0 -r bilinear
gdalinfo bilinear.vrt

...
Origin = (5.500000000000000,15.500000000000000)
Pixel Size = (0.960000000000000,-0.968750000000000)
...

So, to get alignment you have to your own calcs and provide the right snapped projwin.

I was gobsmacked when this was pointed out to me ... but, it's in a Note here, and I understand the concern about the shift but I think it was the wrong fix: https://gdal.org/en/latest/programs/gdal_translate.html#cmdoption-gdal_translate-projwin

rouault commented 1 month ago

Inconsistent use of String(JSON) type in the GeoJSON driver:

if there's a mix of data types, a String(JSON) field is reported to mean that.

The only annoying thing is that for backward compatibility with past behaviour of GDAL 3.5 where we silently homogenized to a string, we didn't go to the point to actually quoting strings, so this isn't fully JSON compliant unfortunately

I mean if we have out.json with:

{
"type": "FeatureCollection",
"features": [
{ "type": "Feature", "properties": { "foo": "str" }, "geometry": null },
{ "type": "Feature", "properties": { "foo": 0 }, "geometry": null },
{ "type": "Feature", "properties": { "foo": ["a", "b"] }, "geometry": null }
]
}

$ ogrinfo -al out.geojson -q

Layer name: out
OGRFeature(out):0
  foo (String(JSON)) = str

OGRFeature(out):1
  foo (String(JSON)) = 0

OGRFeature(out):2
  foo (String(JSON)) = [ "a", "b" ]

In theory, we should report "str", not just str.

mdsumner commented 1 week ago

Deprecate use of term "dateline"

OSGeo / gdal

[meta-ticket] GDAL 4.0 potential changes related to API or behaviour breaks #8440

Discussion