Open weiznich opened 5 months ago
Marked as draft as I need to perform a final set of tests on MacOS and Windows.
This looks all right, but as I mentioned in the related issue, I'm worried it's only going to be useful to you.
GDAL is pretty large, and its users might have a lot of expectations, but a version that only supports GPKG is going to be very surprising. Even your use case might be better served by https://docs.rs/geozero.
CC @rouault long-term, how do you feel about ExternalProject (SuperBuild)?
This looks all right, but as I mentioned in the related issue, I'm worried it's only going to be useful to you.
GDAL is pretty large, and its users might have a lot of expectations, but a version that only supports GPKG is going to be very surprising. Even your use case might be better served by https://docs.rs/geozero.
I agree with the point that users might have expectations that are not covered (yet) by the proposed bundling support, but I disagree with with the point that it only supports GPKG
and therefore is not useful. See that comment for the list of supported drives. That list contains a few more useful drivers like that one for shapefiles, geojson, geotiff or kml. In addition gdal is more than just a tool that provides support for certain file formats. It also exposes other functionality that is helpful in certain situations, like support for spatial projections.
As for other drivers: It is certainly possible to enable other drivers as well. That mostly requires going through the list of supported drivers and classify them if they need external dependencies or not. The later ones can easily be enabled, while for the former ones more work is required. Depending on the required external dependency that might require adding another sys crate as dependency and enabling the bundling support there (like for example for netcdf
, which already has bundling support in rust) or it might require writing that bundling support for that other sys crate first. I figured out to start with a minimal subset to have something working first.
Even your use case might be better served by https://docs.rs/geozero.
Trust me if I say that no geozero won't solve my use-case as it has to much assumptions around which geometry types exist and how they should be handled for my use-case. (It's specifically not about translating from one geometry format to another).
Yes, I think it's all right. We should probably document our features anyway, and that's a good place where we can explain the limitations of the bundled GDAL.
I just checked the documentation. GDAL already provides a list of dependencies for each driver:
https://gdal.org/drivers/vector/index.html https://gdal.org/drivers/raster/index.html
Anything that lists "Built-in by default" as dependency can probably just be enabled without problems.
I also know that bundling support exists for the following other dependencies:
So yes, it's probably possible to just enable most of the expected drives, although I personally would prefer having all the native dependencies behind separate feature flags.
CC @rouault long-term, how do you feel about ExternalProject (SuperBuild)?
I miss some contextual elements to understand your question
If you're not familiar with it, ExternalProject is a CMake component that can download libraries, apply patches, then link your project with them. So it can be useful for people who want a custom build but don't want to build every dependency manually.
If GDAL adds that in the future, we can integrate with it without creating Rust libraries for that bundle the source code of the GDAL dependencies.
If it doesn't, that's fine, people can still use GeoTIFF, GPKG, SHP and a a few other formats with the approach here.
ok, I see. That's something I've considered, but I'd be hesitant to go into that business, as it equates to creating yet another packaging system, and for GDAL, with all its transitive dependencies, that can mean ~ 100 libraries for a full build... That said I do recognize that might be a recurring need for people in "unstandard" environments (Android, WASM, etc.) to have to rebuild the whole world. I'm not sure if that would belong to the OSGeo/GDAL repo itself, or if it might be a side repository where people from different teams can collaborate, and be independent of GDAL development itself. Might be worth raising the idea on gdal-dev.
Yeah, that makes sense. You've already got a full plate with GDAL and the other libraries you're maintaining, you don't need to get into a whole new packaging system unless you really want to.
It would only make a difference for this PR if you were already planning to add it.
I've pushed a new version of the build script that adds feature flags for a lot of the built in drivers. If I enable all of them DriverManager::all
reports support for the following 162 drivers:
Notably this is still missing support for anything that depends on libexpat
or libmysqlclient
and likely a few other more specific dependencies. For libexpat
there does not seem to be a crate with up to date rust bindings at all, the libmysqlclient-sys
crate still misses bundling support. (With my diesel hat on, the last thing is something I want to address at some point).
It's up to discussion which of these drivers should be enabled by default.
I confirmed that this works on linux (x86_86), windows (msvc, x86_64) and macos (aarch64).
Points that should be likely discussed (and addressed) before merging:
Especially the later 3 points are things that I believe are project specific so I would like get some guidance from the maintainers on that.
Is there any interest in moving this PR forward from the relevant maintainers? I'm still waiting on a full review
Yeah, I even realize you've added features for the default drivers. I want to take a better look, but it would be very nice to have this, even with the format limitations.
r? @metasim, r? @jdroenner
PS: We'll need documentation on how to update the git submodule whenever a new point release of GDAL appears.
Thanks for the reviews.
a CI job for the bundled GDAL, with no and with all drivers enabled
Which platforms should be covered by that job? All that are supported by github actions, so windows, ubuntu, mac os (x64) and mac os (aarch64)?
some instructions for upgrading gdal-src to a newer version of GDAL
Assuming that there are no fundamental changes to the gdal build system and that they did not add new dependencies it should be as simple as:
cd gdal/gdal-src/source
git fetch origin
git checkout v{whatever-is-the-latest-gdal-version}
cd ../..
git add gdal-src/source
git commit -m "Update gdal to the {whatver-is-the-latest-gdal-version}
I'm happy to write that down somewhere in the repository. Should I just put a Upgrading.md
into the gdal-src
folder, or do you have any other preferences for this?
I'm sympathetic to the need for a fully statically linked version of GDAL, but gosh, this feels like a lot of new maintenance for a small team of maintainers to take on, especially without additional documentation on the design decisions made, liabilities, assumptions, etc. Maybe I'm being too quick to react, or being myopic in my assessment, but it feels like a significant amount of complexity for an unquantified subset of GDAL users who need it.
It's quite hard to me to understand on which topic you would like to have additional documentation. The only actively made design decision I made is that different drivers can be enabled via feature flags. There are certainly other solutions there, I just went with the first thing that came into my mind. Otherwise, at least I feel that most of the code is actually a straight forward rust port of what you would do to build gdal locally from source + the required rust linker flags for linking gdal and all required dependencies.
So I must ask: Will the PR author be committed to support this capability moving forward for a period of time, or is it a "throw it over the fence" type of PR?
Given that I have more than enough work on my table with diesel, I cannot promise any sort of active maintenance. Given that I plan to actively use at least a subset of this feature (only certain drivers), I will likely submit a PR from time to time to update to a new version (as required by my needs) or to bugs that I encounter.
Maybe I'd feel better about it if we compared this against other crates "vendoring" for the purposes of static compilation, to ensure ourselves this is the optimal build architecture.
See the following examples:
Is gdal-src/source a git submodule? Will all users end up triggering a clone of GDAL whether or not the users need a statically linked version? I know from experience that git submodules cause a lot of people problems because they are not widely understood, and we don't want to add any additional friction to new users of the crate.
Yes it's a submodule. I really depends on what you understand by "user" here.
If that refers to someone that gets gdal-src
from crates.io and builds it locally: No they wouldn't need to care about the submodule at all. cargo publish
puts the source code into an archive which is hosted at crates.io
. It doesn't interact with your git repository at all for using a published version. (As side note: The current setup would include all of the gdal
source code into the gdal-src
package. By using the include
/exclude
Cargo.toml
fields, it would likely be possible to reduce the package size quite a bit.
If "user" refers to someone that uses this git repository as direct dependency (gdal = { git = "https://github.com/georust/gdal"}
) via cargo: They don't need to handle the subrepository. Cargo does automatically handle that for you. (They will need to wait some time for the download to finish, as the gdal
repo is quite large)
If "user" refers to a potential contributor: Yes, they might need to care about the submodule, but only if the actually want to build gdal-src
locally. If they contribute to a different part of the code base it might be fine.
Which platforms should be covered by that job? All that are supported by github actions, so windows, ubuntu, mac os (x64) and mac os (aarch64)?
Linux x64 since that's what we're testing today. Windows might be nice, but there are already 4 different old PRs for that, so we probably shouldn't block on it.
Should I just put a Upgrading.md into the gdal-src folder, or do you have any other preferences for this?
I think that's usually for telling users how to upgrade. Let's call it DEVELOPMENT.md
or something like that and put it in the repo root. We (I) also need to document how to generate new bindings, so that will be a good place for it.
it should be as simple as [...]
LGTM, but let's have it written down.
if we compared this against other crates "vendoring"
I'm comfortable with the repo structure, less so with the build.rs
changes. But I don't think there's any way to simplify it.
Linux x64 since that's what we're testing today. Windows might be nice, but there are already 4 different old PRs for that, so we probably shouldn't block on it.
I've added a CI job for linux (x64), windows (x64) and macos (x64 and aarch64) for the bundled build. It takes quite a bit of time to build everything there but it seems to work. These jobs will try to build with the minimal feature set enabled first and then run tests with all drivers enabled. Running tests with the minimal feature set enabled seems to be not possible as that is missing quite a lot of drivers to execute all tests.
It might sometimes be not possible to statically link both libpq and libnetcdf because both include a vendored copy of strlcat
. There is the following upstream issue for this https://github.com/Unidata/netcdf-c/issues/927
@lnicola Before this PR is merged, could we consider cutting a 0.17 Release?. I say we go with what we have. (I'm not going to have time to wrap up #508 before then. :/) It's been in the wild for a while, and it would be nice to close that chapter before a big change like this one.
This commit introduces a new
gdal-src
crate which bundles gdal and builds a minimal version from source viabuild.rs
Fixes #465
[x] I added an entry to
CHANGES.md
if knowledge of this change could be valuable to users.