georust / gdal

Rust bindings for GDAL
https://crates.io/crates/gdal
MIT License
339 stars 92 forks source link

GDAL DEM Processing Routines #456

Closed metasim closed 7 months ago

metasim commented 8 months ago
metasim commented 8 months ago

@lnicola Thank you for the early feedback. Course corrections earlier rather than laters are definitely helpful.

metasim commented 8 months ago

@lnicola Two questions:

  1. Do you think this set of features should be behind a dem feature flag?
  2. Are you adverse to exposing the capability like this:

In raster/processing/dem/slope.rs


impl Dataset {
    fn slope(&self, options: &SlopeOptions) -> Result<Dataset> { ... }
    ...
}

In raster/processing/dem/aspect.rs


impl Dataset {
    fn aspect(&self, options: &AspectOptions) -> Result<Dataset> { ... }
    ...
}

etc.

I like this approach as it makes it more discoverable in IDE/rust-analyzer.

lnicola commented 8 months ago

No to both

lnicola commented 8 months ago

Sorry, I was on my phone earlier and didn't pay enough attention to the code snippets (I was looking at the paths).

  1. I don't think a feature is necessary, since it only gates some pretty thin bindings to the GDAL functionality, not the whole implementation
  2. I'm not sure :grimacing:. I disliked it initially, but perhaps it's not such a bad idea. Maybe we should ask for another opinion.
lnicola commented 8 months ago

By the way, if it's in a trait you also get completions for it. Not that I'm proposing this, just wanted to point it out.

metasim commented 8 months ago
  1. I'm not sure 😬. I disliked it initially, but perhaps it's not such a bad idea. Maybe we should ask for another opinion.

FWIW: We do have precedence for it (and I like the code-organization benefits):

❯ rg 'impl Dataset'
src/dataset.rs
37:impl Dataset {

src/vector/sql.rs
60:impl Dataset {

src/vector/layer.rs
593:impl Dataset {

src/gcp.rs
109:impl Dataset {

src/vector/transaction.rs
100:impl Dataset {

src/raster/rasterband.rs
23:impl Dataset {

src/raster/mdarray.rs
800:impl Dataset {

src/raster/processing/dem/mod.rs
54:impl Dataset {
lnicola commented 8 months ago

The organization is fine, which is why I initially said I had nothing against it.

What I'm not sure I like is exposing the algorithms as inherent methods on the datasets. Most of the stuff we have there maps directly to GDALDatasetXXX. The DEM and warper are separate in the GDAL API.

metasim commented 8 months ago

What I'm not sure I like is exposing the algorithms as inherent methods on the datasets. Most of the stuff we have there maps directly to GDALDatasetXXX. The DEM and warper are separate in the GDAL API.

Aaaaah. Got it. A nuance I didn't catch. It's hard for me to tell what in the C API design is well-considered, and what's accidental.

metasim commented 8 months ago

I flip-flopped between impl Dataset in each sub-module, vs in the dem module. Settled on the latter for coherence sake in the generated docs.

https://github.com/georust/gdal/pull/456/files#diff-c3e410b73d29819318373ad9149f478156e2060050ec127f3a234f245f862843R53-R305


Aside...

I wish you could do this in Rust:

pub trait DemRoutines where Self: Dataset { ... }
impl DemRoutines for Dataset {}

Then if you import the module, you get the methods, but not otherwise, and would have half the boilerplate (default impls in trait, not just declarations).


~I can switch this to a trait DemRoutines with a single impl for Dataset if you think the separation is important.~ Looking ahead to other processing capabilities (warp, contour, etc.), we probably want to get this pattern correct. So I think I'll see what switching to an extension trait looks like.


Update: Change to extension trait.

lnicola commented 8 months ago

This looks pretty nice, except:

metasim commented 8 months ago

DemCommonOptions::compute_edges should probably be bool, not Option; same for additional_options

The reason for compute_edges being Option is so we don't have to keep in sync with what the GDAL defaults are, should they change. IOW, If the caller doesn't explicitly set it, we do whatever the GDAl default is. It probably won't change, so I'll just set it to the documented default, but wanted to point out there was some rationale to it.

lnicola commented 8 months ago

so we don't have to keep in sync with what the GDAL defaults are

That's what I suspected, but as a user, it makes for a pretty confusing API.

DemProcessing trait

I'm really sorry for the back and forth on this, but I now think they should be free functions. Some of the reasons are:

metasim commented 8 months ago

putting them on the Dataset feels very arbitrary -- why not on Rasterband?

Because the primary input to the all the DEM functions is the Dataset.

I'll convert them to free functions, but to me it seems just as arbitrary as any other decisions to make something inherent on dataset. From where I sit, in the C API, if the first parameter is Dataset, it might as well be treated as the receiver.

We'll also loose discoverability, but maybe that's good for DEM routines. Most people won't use them.

I just wish we had a heuristic we could apply to determine objectively if something should be a free function verses a method. It seems more a matter of taste (and my preference is methods).

lnicola commented 8 months ago

The interface I see for Dataset is basically https://gdal.org/doxygen/classGDALDataset.

I've used GDAL mostly from Python, but when I'm looking for an API, most of the time my process is:

metasim commented 8 months ago

@lnicola I got rid of all the DemCommonOptions constructs. I somewhat understand your reasoning (but not really, as the user experience is the same), but need to be on the record that I'm not happy with it. Doing so added > 2000 lines of code, which seems like we're violating DRY principles in a big way. Without using traits or macros, I don't know to get rid of all the copy-and-pasting.

https://github.com/georust/gdal/pull/456/commits/ff8a2b614a0c64a2085eedf504116d8df03e44db

Question: Do I need to remove all the situations where the options structs have Option fields, and explicitly default to what the GDAL documentation says are the internal defaults?

metasim commented 8 months ago

@lnicola I believe I've applied all the changes requested here.

lnicola commented 8 months ago

Doing so added > 2000 lines of code, which seems like we're violating DRY principles in a big way. Without using traits or macros, I don't know to get rid of all the copy-and-pasting.

The third way is include!, but a declarative macro is what I would use here.

remove all the situations where the options structs

I'll have to check. Let's start with the common options (that bool and the option string list which can be empty) and iterate in another PR if really needed.

lnicola commented 8 months ago

And one final question is where to put these. They're currently under raster::processing::dem, but we also have programs::raster. I think I like programs more, since it's a little weird to have raster::processing::translate and vector::processing::vector_translate.

Another aspect is that I think the Rust guidelines encourage tall/deep module hierarchies, but with re-exports so that the user only sees shallow ones.

metasim commented 8 months ago

And one final question is where to put these. They're currently under raster::processing::dem, but we also have programs::raster. I think I like programs more, since it's a little weird to have raster::processing::translate and vector::processing::vector_translate.

The rationale for this is two fold:

  1. I plan to work on the Warp subsystem, and slotted it to live in raster::processing:warp. But that's separate from the export structure. See below.
  2. I don't think programs makes any sense in a library. If I'm looking at a crate structure, I'd expect programs be the same as bins: a place to put source for executable targets. From the outside I wouldn't know how to semantically interpret programs. Heck, even from the inside I don't know how to interpret programs. Long term, I think the contents of the programs module should be migrated to something more logical.

On your point about raster::processing vs programs::raster, I'd argue the latter breaks the convention in the main library by reversing scope. We train users to think "domain first, capability second", but with programs::raster it's "if I don't find it in raster::, I should also look in programs::raster. But because I find programs confusing, perhaps I'm missing a larger point.

I can see arguments against raster::processing, but to fix its weaknesses I think we need to think more holistically about module structure and what heuristics we use to determine the proper destination for a capability. If the answer is "do exactly what is done in C++" (which would be valid) then we have even more of a gap to address. But I think we can do better than that. To me, raster::processing is a place for advanced capabilities beyond opening and querying Datasets.

Another aspect is that I think the Rust guidelines encourage tall/deep module hierarchies, but with re-exports so that the user only sees shallow ones.

My intent was the use would use gdal::raster::processing::dem::*, since DEM stuff is kinda niche. But I see an argument that this is too deep. I don't really think it should be exported at the gdal::raster::* level. Again, programs::raster doesn't makes sense to me as "what's a 'program' here?".

How about this: we close the PR as it is (or tweak the pub use structure), and open a new ticket to discuss a design around the organization of all constituent parts?

lnicola commented 8 months ago

I don't think programs makes any sense in a library. If I'm looking at a crate structure, I'd expect programs be the same as bins: a place to put source for executable targets. From the outside I wouldn't know how to semantically interpret programs. Heck, even from the inside I don't know how to interpret programs. Long term, I think the contents of the programs module should be migrated to something more logical.

I agree it's unusual, but many of the GDAL "apps" are already exposed as a library (everything that ends up taking an argv as an input, including GDALDEMProcessing. I'm not that familiar with them, but I think they're split between gdal_alg.h and gdal_utils.h in the GDAL headers.

But because I find programs confusing, perhaps I'm missing a larger point.

My thinking is "I want gdalwarp, that's a "program". I want (to give a random example) ExecuteSQL, that's not a program". Even today, raster is a bit of a kitchen sink: https://docs.rs/gdal/latest/gdal/raster/index.html.

How about this: we close the PR as it is (or tweak the pub use structure), and open a new ticket to discuss a design around the organization of all constituent parts?

That will never get anywhere, so please no :stuck_out_tongue:.

metasim commented 8 months ago

"I want gdalwarp, that's a "program".

gdalwarp barely exposes the potential of the Warp subsystem. After reading through a lot of the GDAL Warp code over the last year, I think it's potential is barely tapped by the gdalwarp executable, and limited by the lack of examples and complex options model. It may be wishful/idealistic thinking on my part, but with Rust, we could expose a lot of the processing potential in a much less intimidating way than it it is in C++. Rust enabled pixel functions alone could be very powerful. That's my dream anyway.

metasim commented 7 months ago

bors r=lnicola