Open Rennzie opened 11 months ago
That would be awesome, @Rennzie - I just never imagined it realistical.
GTG is an utterly sensible design (although I think an adjustable and smaller block size than 256x256 would be meaningful). And being able to work seamlessly with PROJ's grid distribution would make so much sense.
But I seriously believe it will be a huge undertaking to bring it to life in full. Of course, ignoring the blocking and caching (which essentially builds a tiff-grid-specific virtual memory subsystem) and just read an entire file at a time, will make it much easier, but then the GTG virtues are gone.
I once had an idea of just plopping all grids into one continuous file, with metadata in a sidecar file, then read-only memory mapping the continuous file, and just reading along as if it was a huge array. That way, we make the OS handle the page swapping, which it can probably do better than a user space process. I did some implementation attempts, but other parts of RG were not sufficiently mature that I could imagine the most reasonable implementation, so after some false starts, I ripped it out again.
(I obviously realize that this approach would probably not work in a Wasm setting)
Apologies for the delay, I've been traveling. But It's given me some time to read the full RFC. It does sounds like a big piece of work but would be awesome to attempt.
Given your past attempts do you think it would be feasible to first build a full file reading version and then add the COG support afterwards? I'm imagining a TiffGrid
(or GeoTiffGrid
) struct that handles header reading and implements the Grid
. We could separate header and grid reading in such a way that makes it easier to incrementally add tiles to the grid later down the line. There is a lot of surface area to cover, but if we broke it down bit by bit we could have something workable in the short term that we build on over time? WDYT?
The image-tiff repo seems to be the most active tiff library I can find.
WDYT?
I think the idea is awesome, but the undertaking massive - so perhaps let it live inside Geodesy (perhaps behind a feature flag), while it (and its API) matures, and then extracting it as a separately maintained library, because it's generally useful, down the road? (it would probably fit very well under the Georust umbrella).
Having it start up internally to Geodesy lets us avoid too many needless API considerations, and concentrate on functionality up front, and making (internal) API iterations costless.
Great, that sounds sensible. Shall we keep this issue as a way to discuss how we evolve the API?
In the mean time I'm going to try read in a simple file and grab its grid as a way to explore and learn about the format. Do you have any guidance on crates that could be of use? The one I linked above seems the most promising but you said you'd attempted this before so was curious to know what you'd tried.
Once you have a full grid, I think the GeoRust geotiff will do most of the work. The "download and cache a tile at a time" stuff is probably much harder...
but you said you'd attempted this before so was curious to know what you'd tried
I didn't try anything like this: My thought was that tile-splitting all grids and storing them in one gargantuan read-only-mmap'ed file would give the OS a chance to do what it's best at. The annoying thing about the tile-caching GTG is that essentially it introduces a user-space memory management system.
Fundamentally, the idea is right. But it would require less code if just behaving "as if" everything was in one huge continuous slab of memory, and let the OS handle the swapping disk-memory-level1/2/3-cache. That was why I was thinking of the resource file style jamming-everything-together-and-maintain-an-index-over-file-starts.
I don't know how grids are stored in GeoTIFFs, but generally image-tiff does support far more tiff variants than georust/geotiff or other geotiff implementations. So I would suggest to implement geo-specific stuff on top of image-tiff, like I did in https://github.com/pka/georaster
Hit the first blocker. The GTG spec uses the GDAL_Metadata tag which isn't implemented by image-tiff
by the looks of things. @pka did you need have any need for that tag in georaster
?
georaster doens't read this tag yet: https://github.com/pka/georaster/blob/main/src/geotiff.rs#L87-L94
PRs to image-tiff
are welcome in my perception, releases are very infrequent, though.
There are some fairly strict rules for GTG on the PROJ CDN:
It does, however, support subdatasets, so your NTv2 infrastructure will be handy, @Rennzie
EDIT: I now see that Georaster supports subdatasets as well. So do we actually need anything but georaster? Perhaps better, @pka, to integrate @Rennzie's NTv2 support into georaster - it fits well under the "Rust library for accessing geospatial raster images" motto
Otherwise, for rudimentary support, I do not think we need to read anything but the Origin
and Pixel Size
elements
EDIT: And Size
, obviously
Apologies for the lengthy silence. I'm travelling and on holiday until mid Jan so will only get a chance to get going again then.
It seems @weiji14 is making huge progress on this over at cog3pio
Hi :wave:! Yes I'm building another GeoTIFF reader on top of the image-tiff crate, with the array data parsed into ndarray
. I'm at a stage where I'd like to upstream the core parts to georust/geotiff
(see https://github.com/georust/geotiff/issues/7), maybe combine efforts with @pka's georaster crate.
Otherwise, for rudimentary support, I do not think we need to read anything but the
Origin
andPixel Size
elementsEDIT: And
Size
, obviously
I've implemented this recently in https://github.com/weiji14/cog3pio/pull/8 by storing the origin and pixel size in an AffineTransform struct. I know georaster has implemented this via some getter methods at https://github.com/pka/georaster/blob/191ebb6e89912688c7b0ef6264ad197d9dc45e4b/src/geotiff.rs#L140-L152
The GTG spec uses the GDAL_Metadata tag which isn't implemented by
image-tiff
by the looks of things.
It's relatively straightforward to add a tag to image-tiff
, see e.g. https://github.com/image-rs/image-tiff/commit/0a4b51055e6d9357eedee1c62abdfece7fcee69d where the GdalNodata
tag was added with a one-line change.
:+1:
maybe combine efforts with @pka's georaster crate.
I'm lurking and open for any form of collaboration...
@weiji14 said:
Yes I'm building another GeoTIFF reader on top of the image-tiff
That sounds awesome! I took a look at image-tiff recently, and it didn't seem to support in any straightforward way, the 2 channel files which provide the bread-and-butter for geodetic datum shifts, so if you are getting that working, image-tiff/cog2pio will really be much more useful for geodesy (and Geodesy).
It would be ideal to be able to use PROJ's infrastructure for CDN delivery of geodetic tiff grids, using whichever combination of georaster and cog2pio you and @pka may be able to conjure up for public consumption :-)
The vista of referring all tiff access from Geodesy to an external dependency, letting Geodesy concentrate on the geodetic intricacies (which are "tuff enuff"), and leave the file format intricacies to a dedicated library seems like the better way, so I'm sure Geodesy will be an eager early adopter once you signal "ready for testing".
@Rennzie as you're the thread starter, I believe you'll get notifications even if not mentioned directly, but a @mention here just to be sure
took a look at image-tiff recently, and it didn't seem to support in any straightforward way, the 2 channel files which provide the bread-and-butter for geodetic datum shifts, so if you are getting that working, image-tiff/cog2pio will really be much more useful for geodesy (and Geodesy).
Actually, my PR to support reading multi-band GeoTIFFs in image-tiff got merged last month - https://github.com/image-rs/image-tiff/pull/224 (though they haven't made a release yet). I've included this in https://github.com/weiji14/cog3pio/pull/13, though note that cog3pio only works with float32 dtype right now, I still need to get my head around generic types (help is welcome though).
note that cog3pio only works with float32 dtype
That's totally fine: All classical geodetic correction grid formats are f32 anyway (giving corrections in seconds of arc (i.e. 30 m), an f32 resolution of 1e-7 is at the micrometer scale, i.e. much higher resolution than the correction estimation accuracy.
f64 may be necessary if one wants to implement projections directly in the grids: Here the corrections will reach megameter scale, hence only leaving decimeter resolution for the ulp in the f32 case
Thanks for the prompt @busstoptaktik. This all sounds really exciting. I'm unfortunately not able to spend much time on it at the moment (due to huge upheaval at work), but I'm keen to see it get going.
@pka and @weiji14 what are your feelings about combining your efforts into georaster
vs georust/geotiff
? Or would georaster
depend on georust/geotiff
in the future?
EDIT: I now see that Georaster supports subdatasets as well. So do we actually need anything but georaster? Perhaps better, @pka, to integrate @Rennzie's NTv2 support into georaster - it fits well under the "Rust library for accessing geospatial raster images" motto
I guess this depends if we want to combine everything into georust/geotiff
or not. If we like that option best then perhaps the right thing to do is create a georust/ntv2
crate? Personally I like the idea of georust
being the convergence point for our efforts.
Lastly, my long term aim is for geodesy-wasm
to support reading GTGs so being able to download them from a URL (via COG [preferable] or in full) is a key implementation point.
@pka and @weiji14 what are your feelings about combining your efforts into
georaster
vsgeorust/geotiff
? Or wouldgeoraster
depend ongeorust/geotiff
in the future?EDIT: I now see that Georaster supports subdatasets as well. So do we actually need anything but georaster? Perhaps better, @pka, to integrate @Rennzie's NTv2 support into georaster - it fits well under the "Rust library for accessing geospatial raster images" motto
I guess this depends if we want to combine everything into
georust/geotiff
or not. If we like that option best then perhaps the right thing to do is create ageorust/ntv2
crate? Personally I like the idea ofgeorust
being the convergence point for our efforts.
I'd prefer putting in effort into georust/geotiff
and possibly create georust/ntv2
, and then potentially we (or @pka) can decide if georaster
will act as the 'high-level' library to wrap different image types, much like how image-rs/image
wraps tiff
, png
and other formats now. Though this does sound like we are re-inventing GDAL somewhat :laughing:
@pka and @weiji14 what are your feelings about combining your efforts into
georaster
vsgeorust/geotiff
? Or wouldgeoraster
depend ongeorust/geotiff
in the future?
The reason for creating georaster
was reading DTMs from Cloud Optimized GeoTIFFs but also PNGs. For the former, partial read access is essential, for other formats it's not so important, because they are usually served as tiles.
So it's difficult to answer the question. I have no immediate plans with georaster and I'm fine if its functionality goes into a new geotiff library. Since PNG support isn't implemented yet, there is not much lost.
As a question. Should we consider supporting TIFF grids alongside Gravsoft and Ntv2?