georust / gdal

Rust bindings for GDAL
https://crates.io/crates/gdal
MIT License
339 stars 92 forks source link

Provide a bundled flag for `gdal-sys` #517

Open weiznich opened 5 months ago

weiznich commented 5 months ago

This commit introduces a new gdal-src crate which bundles gdal and builds a minimal version from source via build.rs

Fixes #465

weiznich commented 5 months ago

Marked as draft as I need to perform a final set of tests on MacOS and Windows.

lnicola commented 5 months ago

This looks all right, but as I mentioned in the related issue, I'm worried it's only going to be useful to you.

GDAL is pretty large, and its users might have a lot of expectations, but a version that only supports GPKG is going to be very surprising. Even your use case might be better served by https://docs.rs/geozero.

CC @rouault long-term, how do you feel about ExternalProject (SuperBuild)?

weiznich commented 5 months ago

This looks all right, but as I mentioned in the related issue, I'm worried it's only going to be useful to you.

GDAL is pretty large, and its users might have a lot of expectations, but a version that only supports GPKG is going to be very surprising. Even your use case might be better served by https://docs.rs/geozero.

I agree with the point that users might have expectations that are not covered (yet) by the proposed bundling support, but I disagree with with the point that it only supports GPKG and therefore is not useful. See that comment for the list of supported drives. That list contains a few more useful drivers like that one for shapefiles, geojson, geotiff or kml. In addition gdal is more than just a tool that provides support for certain file formats. It also exposes other functionality that is helpful in certain situations, like support for spatial projections.

As for other drivers: It is certainly possible to enable other drivers as well. That mostly requires going through the list of supported drivers and classify them if they need external dependencies or not. The later ones can easily be enabled, while for the former ones more work is required. Depending on the required external dependency that might require adding another sys crate as dependency and enabling the bundling support there (like for example for netcdf, which already has bundling support in rust) or it might require writing that bundling support for that other sys crate first. I figured out to start with a minimal subset to have something working first.

Even your use case might be better served by https://docs.rs/geozero.

Trust me if I say that no geozero won't solve my use-case as it has to much assumptions around which geometry types exist and how they should be handled for my use-case. (It's specifically not about translating from one geometry format to another).

lnicola commented 5 months ago

Yes, I think it's all right. We should probably document our features anyway, and that's a good place where we can explain the limitations of the bundled GDAL.

weiznich commented 5 months ago

I just checked the documentation. GDAL already provides a list of dependencies for each driver:

https://gdal.org/drivers/vector/index.html https://gdal.org/drivers/raster/index.html

Anything that lists "Built-in by default" as dependency can probably just be enabled without problems.

I also know that bundling support exists for the following other dependencies:

So yes, it's probably possible to just enable most of the expected drives, although I personally would prefer having all the native dependencies behind separate feature flags.

rouault commented 5 months ago

CC @rouault long-term, how do you feel about ExternalProject (SuperBuild)?

I miss some contextual elements to understand your question

lnicola commented 5 months ago

If you're not familiar with it, ExternalProject is a CMake component that can download libraries, apply patches, then link your project with them. So it can be useful for people who want a custom build but don't want to build every dependency manually.

If GDAL adds that in the future, we can integrate with it without creating Rust libraries for that bundle the source code of the GDAL dependencies.

If it doesn't, that's fine, people can still use GeoTIFF, GPKG, SHP and a a few other formats with the approach here.

rouault commented 5 months ago

ok, I see. That's something I've considered, but I'd be hesitant to go into that business, as it equates to creating yet another packaging system, and for GDAL, with all its transitive dependencies, that can mean ~ 100 libraries for a full build... That said I do recognize that might be a recurring need for people in "unstandard" environments (Android, WASM, etc.) to have to rebuild the whole world. I'm not sure if that would belong to the OSGeo/GDAL repo itself, or if it might be a side repository where people from different teams can collaborate, and be independent of GDAL development itself. Might be worth raising the idea on gdal-dev.

lnicola commented 5 months ago

Yeah, that makes sense. You've already got a full plate with GDAL and the other libraries you're maintaining, you don't need to get into a whole new packaging system unless you really want to.

It would only make a difference for this PR if you were already planning to add it.

weiznich commented 5 months ago

I've pushed a new version of the build script that adds feature flags for a lot of the built in drivers. If I enable all of them DriverManager::all reports support for the following 162 drivers:

List with 162 driver names ``` Virtual Raster Derived datasets using VRT pixel functions GeoTIFF Cloud optimized GeoTIFF generator Erdas Imagine Images (.img) CEOS SAR Image CEOS Image JAXA PALSAR Product Reader (Level 1.1/1.5) Ground-based SAR Applications Testbed File Format (.gff) ELAS Arc/Info Binary Grid Arc/Info ASCII Grid GRASS ASCII Grid International Service for the Geoid SDTS Raster DTED Elevation Raster Portable Network Graphics JPEG JFIF In Memory Raster Japanese DEM (.mem) Graphics Interchange Format (.gif) Graphics Interchange Format (.gif) Envisat Image Format Maptech BSB Nautical Charts X11 PixMap Format MS Windows Device Independent Bitmap SPOT DIMAP AirSAR Polarimetric Image RadarSat 2 XML Product Sentinel-1 SAR SAFE Product PCIDSK Database File PCRaster Raster File ILWIS Raster Map SGI Image File Format 1.0 SRTMHGT File Format Leveller heightfield Terragen heightfield Network Common Data Format EarthWatch .TIL ERMapper .ers Labelled NOAA Polar Orbiter Level 1b Data Set FIT Image GRIdded Binary (.grb, .grb2) Raster Matrix Format OGC Web Coverage Service OGC Web Map Service EUMETSAT Archive native (.nat) Idrisi Raster A.1 Golden Software ASCII Grid (.grd) Golden Software Binary Grid (.grd) Golden Software 7 Binary Grid (.grd) COSAR Annotated Binary Matrix (TerraSAR-X) TerraSAR-X Product DRDC COASP SAR Processor Raster R Object Data Store OziExplorer .MAP Kml Super Overlay Planet Labs Mosaics API CALS (Type 1) OGC Web Map Tile Service Sentinel 2 Meta Raster Format Portable Pixmap Format (netpbm) USGS DOQ (Old Style) USGS DOQ (New Style) PCI .aux Labelled Vexcel MFF Raster Vexcel MFF2 (HKV) Raster GSC Geogrid EOSAT FAST Format VTP .bt (Binary Terrain) 1.3 Format Erdas .LAN/.GIS Convair PolGASP NLAPS Data Format Erdas Imagine Raw DIPEx FARSITE v.4 Landscape File (.lcp) NOAA Vertical Datum .GTX NADCON .los/.las Datum Grid Shift NTv2 Datum Grid Shift CTable2 Datum Grid Shift ACE2 Snow Data Assimilation System KOLOR Raw ROI_PAC raster R Raster Natural Resources Canada's Geoid NOAA GEOCON/NADCON5 .b format NSIDC Sea Ice Concentrations binary (.bin) Swedish Grid RIK (.rik) USGS Optional ASCII DEM (and CDED) GeoSoft Grid Exchange Format Bathymetry Attributed Grid S-102 Bathymetric Surface Product Hierarchical Data Format Release 5 HDF5 Dataset Northwood Numeric Grid Format .grd/.tab Northwood Classified Grid Format .grc/.tab ARC Digitized Raster Graphics Standard Raster Product (ASRP/USRP) Magellan topo (.blx) SAGA GIS Binary Grid (.sdat, .sg-grd-z) ASCII Gridded XYZ HF2/HFZ heightfield raster OziExplorer Image File USGS LULC Composite Theme Grid ZMap Plus Grid NOAA NGS Geoid Height Grids IRIS data (.PPI, .CAPPi etc) Racurs PHOTOMOD PRF Scaled Integer Gridded DEM .sigdem TGA/TARGA Image File Format OGCAPI Spatio-Temporal Asset Catalog Tiled Assets Spatio-Temporal Asset Catalog Items Geographic Network generic file based model Geographic Network generic DB based model ESRI Shapefile MapInfo File UK .NTF SDTS IHO S-57 (ENC) Microstation DGN VRT - Virtual Datasource Memory Comma Separated Value (.csv) Keyhole Markup Language (KML) GeoJSON GeoJSON Sequence ESRIJSON TopoJSON GMT ASCII Vectors (.gmt) GeoPackage SQLite / Spatialite WAsP .map format PostgreSQL/PostGIS ESRI FileGDB AutoCAD DXF AutoCAD Driver FlatGeobuf Geoconcept Czech Cadastral Exchange Data Format PostgreSQL SQL dump French EDIGEO exchange format Idrisi Vector (.vct) Elastic Search Carto AmigoCloud Storage and eXchange Format Selafin VDV-451/VDV-452/INTREST Data Format NextGIS Web MapML General Transit Feed Specification OGC Features and Geometries JSON U.S. Census TIGER/Line Arc/Info Binary Coverage Arc/Info E00 (ASCII) Coverage Generic Binary (.hdr Labelled) ENVI .hdr Labelled ESRI .hdr Labelled ISCE raster ```

Notably this is still missing support for anything that depends on libexpat or libmysqlclient and likely a few other more specific dependencies. For libexpat there does not seem to be a crate with up to date rust bindings at all, the libmysqlclient-sys crate still misses bundling support. (With my diesel hat on, the last thing is something I want to address at some point).

It's up to discussion which of these drivers should be enabled by default.

weiznich commented 5 months ago

I confirmed that this works on linux (x86_86), windows (msvc, x86_64) and macos (aarch64).

weiznich commented 5 months ago

Points that should be likely discussed (and addressed) before merging:

Especially the later 3 points are things that I believe are project specific so I would like get some guidance from the maintainers on that.

weiznich commented 3 months ago

Is there any interest in moving this PR forward from the relevant maintainers? I'm still waiting on a full review

lnicola commented 3 months ago

Yeah, I even realize you've added features for the default drivers. I want to take a better look, but it would be very nice to have this, even with the format limitations.

lnicola commented 3 months ago

r? @metasim, r? @jdroenner

metasim commented 3 months ago

PS: We'll need documentation on how to update the git submodule whenever a new point release of GDAL appears.

weiznich commented 3 months ago

Thanks for the reviews.

a CI job for the bundled GDAL, with no and with all drivers enabled

Which platforms should be covered by that job? All that are supported by github actions, so windows, ubuntu, mac os (x64) and mac os (aarch64)?

some instructions for upgrading gdal-src to a newer version of GDAL

Assuming that there are no fundamental changes to the gdal build system and that they did not add new dependencies it should be as simple as:

cd gdal/gdal-src/source
git fetch origin
git checkout v{whatever-is-the-latest-gdal-version}
cd ../..
git add gdal-src/source
git commit -m "Update gdal to the {whatver-is-the-latest-gdal-version}

I'm happy to write that down somewhere in the repository. Should I just put a Upgrading.md into the gdal-src folder, or do you have any other preferences for this?

I'm sympathetic to the need for a fully statically linked version of GDAL, but gosh, this feels like a lot of new maintenance for a small team of maintainers to take on, especially without additional documentation on the design decisions made, liabilities, assumptions, etc. Maybe I'm being too quick to react, or being myopic in my assessment, but it feels like a significant amount of complexity for an unquantified subset of GDAL users who need it.

It's quite hard to me to understand on which topic you would like to have additional documentation. The only actively made design decision I made is that different drivers can be enabled via feature flags. There are certainly other solutions there, I just went with the first thing that came into my mind. Otherwise, at least I feel that most of the code is actually a straight forward rust port of what you would do to build gdal locally from source + the required rust linker flags for linking gdal and all required dependencies.

So I must ask: Will the PR author be committed to support this capability moving forward for a period of time, or is it a "throw it over the fence" type of PR?

Given that I have more than enough work on my table with diesel, I cannot promise any sort of active maintenance. Given that I plan to actively use at least a subset of this feature (only certain drivers), I will likely submit a PR from time to time to update to a new version (as required by my needs) or to bugs that I encounter.

Maybe I'd feel better about it if we compared this against other crates "vendoring" for the purposes of static compilation, to ensure ourselves this is the optimal build architecture.

See the following examples:

Is gdal-src/source a git submodule? Will all users end up triggering a clone of GDAL whether or not the users need a statically linked version? I know from experience that git submodules cause a lot of people problems because they are not widely understood, and we don't want to add any additional friction to new users of the crate.

Yes it's a submodule. I really depends on what you understand by "user" here.

If that refers to someone that gets gdal-src from crates.io and builds it locally: No they wouldn't need to care about the submodule at all. cargo publish puts the source code into an archive which is hosted at crates.io. It doesn't interact with your git repository at all for using a published version. (As side note: The current setup would include all of the gdal source code into the gdal-src package. By using the include/exclude Cargo.toml fields, it would likely be possible to reduce the package size quite a bit.

If "user" refers to someone that uses this git repository as direct dependency (gdal = { git = "https://github.com/georust/gdal"} ) via cargo: They don't need to handle the subrepository. Cargo does automatically handle that for you. (They will need to wait some time for the download to finish, as the gdal repo is quite large)

If "user" refers to a potential contributor: Yes, they might need to care about the submodule, but only if the actually want to build gdal-src locally. If they contribute to a different part of the code base it might be fine.

lnicola commented 3 months ago

Which platforms should be covered by that job? All that are supported by github actions, so windows, ubuntu, mac os (x64) and mac os (aarch64)?

Linux x64 since that's what we're testing today. Windows might be nice, but there are already 4 different old PRs for that, so we probably shouldn't block on it.

Should I just put a Upgrading.md into the gdal-src folder, or do you have any other preferences for this?

I think that's usually for telling users how to upgrade. Let's call it DEVELOPMENT.md or something like that and put it in the repo root. We (I) also need to document how to generate new bindings, so that will be a good place for it.

it should be as simple as [...]

LGTM, but let's have it written down.

if we compared this against other crates "vendoring"

I'm comfortable with the repo structure, less so with the build.rs changes. But I don't think there's any way to simplify it.

weiznich commented 3 months ago

Linux x64 since that's what we're testing today. Windows might be nice, but there are already 4 different old PRs for that, so we probably shouldn't block on it.

I've added a CI job for linux (x64), windows (x64) and macos (x64 and aarch64) for the bundled build. It takes quite a bit of time to build everything there but it seems to work. These jobs will try to build with the minimal feature set enabled first and then run tests with all drivers enabled. Running tests with the minimal feature set enabled seems to be not possible as that is missing quite a lot of drivers to execute all tests.

weiznich commented 3 months ago

It might sometimes be not possible to statically link both libpq and libnetcdf because both include a vendored copy of strlcat. There is the following upstream issue for this https://github.com/Unidata/netcdf-c/issues/927

metasim commented 3 months ago

@lnicola Before this PR is merged, could we consider cutting a 0.17 Release?. I say we go with what we have. (I'm not going to have time to wrap up #508 before then. :/) It's been in the wild for a while, and it would be nice to close that chapter before a big change like this one.