busstoptaktik / geodesy

Rust geodesy
Apache License 2.0
63 stars 6 forks source link

Support Geodetic TIFF grids #86

Open Rennzie opened 7 months ago

Rennzie commented 7 months ago

As a question. Should we consider supporting TIFF grids alongside Gravsoft and Ntv2?

busstoptaktik commented 7 months ago

That would be awesome, @Rennzie - I just never imagined it realistical.

GTG is an utterly sensible design (although I think an adjustable and smaller block size than 256x256 would be meaningful). And being able to work seamlessly with PROJ's grid distribution would make so much sense.

But I seriously believe it will be a huge undertaking to bring it to life in full. Of course, ignoring the blocking and caching (which essentially builds a tiff-grid-specific virtual memory subsystem) and just read an entire file at a time, will make it much easier, but then the GTG virtues are gone.

I once had an idea of just plopping all grids into one continuous file, with metadata in a sidecar file, then read-only memory mapping the continuous file, and just reading along as if it was a huge array. That way, we make the OS handle the page swapping, which it can probably do better than a user space process. I did some implementation attempts, but other parts of RG were not sufficiently mature that I could imagine the most reasonable implementation, so after some false starts, I ripped it out again.

(I obviously realize that this approach would probably not work in a Wasm setting)

Rennzie commented 7 months ago

Apologies for the delay, I've been traveling. But It's given me some time to read the full RFC. It does sounds like a big piece of work but would be awesome to attempt.

Given your past attempts do you think it would be feasible to first build a full file reading version and then add the COG support afterwards? I'm imagining a TiffGrid (or GeoTiffGrid) struct that handles header reading and implements the Grid. We could separate header and grid reading in such a way that makes it easier to incrementally add tiles to the grid later down the line. There is a lot of surface area to cover, but if we broke it down bit by bit we could have something workable in the short term that we build on over time? WDYT?

Rennzie commented 7 months ago

The image-tiff repo seems to be the most active tiff library I can find.

busstoptaktik commented 7 months ago

WDYT?

I think the idea is awesome, but the undertaking massive - so perhaps let it live inside Geodesy (perhaps behind a feature flag), while it (and its API) matures, and then extracting it as a separately maintained library, because it's generally useful, down the road? (it would probably fit very well under the Georust umbrella).

Having it start up internally to Geodesy lets us avoid too many needless API considerations, and concentrate on functionality up front, and making (internal) API iterations costless.

Rennzie commented 7 months ago

Great, that sounds sensible. Shall we keep this issue as a way to discuss how we evolve the API?

In the mean time I'm going to try read in a simple file and grab its grid as a way to explore and learn about the format. Do you have any guidance on crates that could be of use? The one I linked above seems the most promising but you said you'd attempted this before so was curious to know what you'd tried.

busstoptaktik commented 7 months ago

Once you have a full grid, I think the GeoRust geotiff will do most of the work. The "download and cache a tile at a time" stuff is probably much harder...

busstoptaktik commented 7 months ago

but you said you'd attempted this before so was curious to know what you'd tried

I didn't try anything like this: My thought was that tile-splitting all grids and storing them in one gargantuan read-only-mmap'ed file would give the OS a chance to do what it's best at. The annoying thing about the tile-caching GTG is that essentially it introduces a user-space memory management system.

Fundamentally, the idea is right. But it would require less code if just behaving "as if" everything was in one huge continuous slab of memory, and let the OS handle the swapping disk-memory-level1/2/3-cache. That was why I was thinking of the resource file style jamming-everything-together-and-maintain-an-index-over-file-starts.

pka commented 7 months ago

I don't know how grids are stored in GeoTIFFs, but generally image-tiff does support far more tiff variants than georust/geotiff or other geotiff implementations. So I would suggest to implement geo-specific stuff on top of image-tiff, like I did in https://github.com/pka/georaster

Rennzie commented 7 months ago

Hit the first blocker. The GTG spec uses the GDAL_Metadata tag which isn't implemented by image-tiff by the looks of things. @pka did you need have any need for that tag in georaster?

pka commented 7 months ago

georaster doens't read this tag yet: https://github.com/pka/georaster/blob/main/src/geotiff.rs#L87-L94

PRs to image-tiff are welcome in my perception, releases are very infrequent, though.

busstoptaktik commented 7 months ago

There are some fairly strict rules for GTG on the PROJ CDN:

It does, however, support subdatasets, so your NTv2 infrastructure will be handy, @Rennzie

EDIT: I now see that Georaster supports subdatasets as well. So do we actually need anything but georaster? Perhaps better, @pka, to integrate @Rennzie's NTv2 support into georaster - it fits well under the "Rust library for accessing geospatial raster images" motto

Otherwise, for rudimentary support, I do not think we need to read anything but the Origin and Pixel Size elements

EDIT: And Size, obviously

Rennzie commented 6 months ago

Apologies for the lengthy silence. I'm travelling and on holiday until mid Jan so will only get a chance to get going again then.

busstoptaktik commented 3 months ago

It seems @weiji14 is making huge progress on this over at cog3pio

weiji14 commented 3 months ago

Hi :wave:! Yes I'm building another GeoTIFF reader on top of the image-tiff crate, with the array data parsed into ndarray. I'm at a stage where I'd like to upstream the core parts to georust/geotiff (see https://github.com/georust/geotiff/issues/7), maybe combine efforts with @pka's georaster crate.

Otherwise, for rudimentary support, I do not think we need to read anything but the Origin and Pixel Size elements

EDIT: And Size, obviously

I've implemented this recently in https://github.com/weiji14/cog3pio/pull/8 by storing the origin and pixel size in an AffineTransform struct. I know georaster has implemented this via some getter methods at https://github.com/pka/georaster/blob/191ebb6e89912688c7b0ef6264ad197d9dc45e4b/src/geotiff.rs#L140-L152

The GTG spec uses the GDAL_Metadata tag which isn't implemented by image-tiff by the looks of things.

It's relatively straightforward to add a tag to image-tiff, see e.g. https://github.com/image-rs/image-tiff/commit/0a4b51055e6d9357eedee1c62abdfece7fcee69d where the GdalNodata tag was added with a one-line change.

pka commented 3 months ago

:+1:

maybe combine efforts with @pka's georaster crate.

I'm lurking and open for any form of collaboration...

busstoptaktik commented 3 months ago

@weiji14 said:

Yes I'm building another GeoTIFF reader on top of the image-tiff

That sounds awesome! I took a look at image-tiff recently, and it didn't seem to support in any straightforward way, the 2 channel files which provide the bread-and-butter for geodetic datum shifts, so if you are getting that working, image-tiff/cog2pio will really be much more useful for geodesy (and Geodesy).

It would be ideal to be able to use PROJ's infrastructure for CDN delivery of geodetic tiff grids, using whichever combination of georaster and cog2pio you and @pka may be able to conjure up for public consumption :-)

The vista of referring all tiff access from Geodesy to an external dependency, letting Geodesy concentrate on the geodetic intricacies (which are "tuff enuff"), and leave the file format intricacies to a dedicated library seems like the better way, so I'm sure Geodesy will be an eager early adopter once you signal "ready for testing".

busstoptaktik commented 3 months ago

@Rennzie as you're the thread starter, I believe you'll get notifications even if not mentioned directly, but a @mention here just to be sure

weiji14 commented 3 months ago

took a look at image-tiff recently, and it didn't seem to support in any straightforward way, the 2 channel files which provide the bread-and-butter for geodetic datum shifts, so if you are getting that working, image-tiff/cog2pio will really be much more useful for geodesy (and Geodesy).

Actually, my PR to support reading multi-band GeoTIFFs in image-tiff got merged last month - https://github.com/image-rs/image-tiff/pull/224 (though they haven't made a release yet). I've included this in https://github.com/weiji14/cog3pio/pull/13, though note that cog3pio only works with float32 dtype right now, I still need to get my head around generic types (help is welcome though).

busstoptaktik commented 3 months ago

note that cog3pio only works with float32 dtype

That's totally fine: All classical geodetic correction grid formats are f32 anyway (giving corrections in seconds of arc (i.e. 30 m), an f32 resolution of 1e-7 is at the micrometer scale, i.e. much higher resolution than the correction estimation accuracy.

f64 may be necessary if one wants to implement projections directly in the grids: Here the corrections will reach megameter scale, hence only leaving decimeter resolution for the ulp in the f32 case

Rennzie commented 3 months ago

Thanks for the prompt @busstoptaktik. This all sounds really exciting. I'm unfortunately not able to spend much time on it at the moment (due to huge upheaval at work), but I'm keen to see it get going.

@pka and @weiji14 what are your feelings about combining your efforts into georaster vs georust/geotiff? Or would georaster depend on georust/geotiff in the future?

EDIT: I now see that Georaster supports subdatasets as well. So do we actually need anything but georaster? Perhaps better, @pka, to integrate @Rennzie's NTv2 support into georaster - it fits well under the "Rust library for accessing geospatial raster images" motto

I guess this depends if we want to combine everything into georust/geotiff or not. If we like that option best then perhaps the right thing to do is create a georust/ntv2 crate? Personally I like the idea of georust being the convergence point for our efforts.

Lastly, my long term aim is for geodesy-wasm to support reading GTGs so being able to download them from a URL (via COG [preferable] or in full) is a key implementation point.

weiji14 commented 2 months ago

@pka and @weiji14 what are your feelings about combining your efforts into georaster vs georust/geotiff? Or would georaster depend on georust/geotiff in the future?

EDIT: I now see that Georaster supports subdatasets as well. So do we actually need anything but georaster? Perhaps better, @pka, to integrate @Rennzie's NTv2 support into georaster - it fits well under the "Rust library for accessing geospatial raster images" motto

I guess this depends if we want to combine everything into georust/geotiff or not. If we like that option best then perhaps the right thing to do is create a georust/ntv2 crate? Personally I like the idea of georust being the convergence point for our efforts.

I'd prefer putting in effort into georust/geotiff and possibly create georust/ntv2, and then potentially we (or @pka) can decide if georaster will act as the 'high-level' library to wrap different image types, much like how image-rs/image wraps tiff, png and other formats now. Though this does sound like we are re-inventing GDAL somewhat :laughing:

pka commented 2 months ago

@pka and @weiji14 what are your feelings about combining your efforts into georaster vs georust/geotiff? Or would georaster depend on georust/geotiff in the future?

The reason for creating georaster was reading DTMs from Cloud Optimized GeoTIFFs but also PNGs. For the former, partial read access is essential, for other formats it's not so important, because they are usually served as tiles. So it's difficult to answer the question. I have no immediate plans with georaster and I'm fine if its functionality goes into a new geotiff library. Since PNG support isn't implemented yet, there is not much lost.