Discussion: Compression and Splines

jlaura commented 1 month ago

With #604 and #605 compression is being added for ISDs. This issue is to discuss how this might propagate across the stack.

Here are some nominal requirements:

[ ] Ability to compress/decompress ISDs (done in #604 and #605).
[ ] Ability to load the compressed ISD straight into memory (no need to decompress to disk).
[ ] Ability to write an updated compressed ISD to disk after bundle adjustment.
[ ] Ability to perform adjustment tasks at scale using the CSM. Can one load 1000 ISDs efficiently? 10,000? 100,000?
- [ ] This potentially relates to storing fewer ephemeris points inside the ISD.
[ ] An API for consumers of ale to use in their library for working with compressed ISDs.
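
One way the "fewer ephemeris points" item could play out, sketched here as an assumption rather than anything ale does today: store sparse samples of position and velocity and reconstruct intermediate values with cubic Hermite interpolation. The function name `hermite` and its arguments are hypothetical, and the sketch handles a single coordinate.

```python
from bisect import bisect_right


def hermite(times, positions, velocities, t):
    """Cubic Hermite interpolation of one coordinate at time t.

    Hypothetical helper: times must be sorted and hold at least two samples.
    """
    # Pick the segment [t0, t1] that contains t, clamped to the stored range.
    i = min(max(bisect_right(times, t) - 1, 0), len(times) - 2)
    t0, t1 = times[i], times[i + 1]
    h = t1 - t0
    s = (t - t0) / h  # normalized position inside the segment

    # Standard cubic Hermite basis functions.
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2

    return (h00 * positions[i] + h10 * h * velocities[i]
            + h01 * positions[i + 1] + h11 * h * velocities[i + 1])
```

Because each segment uses both endpoint positions and velocities, the interpolant is exact for cubic motion, which is why a coarse sampling may be enough for smooth trajectories. How coarse is acceptable for a given sensor would still need to be measured.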

Kelvinrr commented 4 weeks ago

Reading through the CSM standard, I don't see any mention of the file format. I haven't read the whole thing but here is a part of the text (CSM 3.1 TRD page 22) on ISDs:

Providing image support data to the sensor model selection and construction functions. An Isd class object is provided when processing an image from native file format (or when a sensor model state is not available). Note that the following convention should be observed by the Application when constructing Isd objects. The Application should create ISD standard forms such as NITF 2.0 or 2.1, if possible. The next preferred form is BYTESTREAM, followed by FILENAME. Some plug-ins may not support file access operations.

Page 63 continues to talk about "Filename ISD" support but says nothing about its format.

Considering file reading isn't a requirement, we might be able to get away with a novel ISD format. I wasn't involved in early convos for the ISD format and why it's JSON. But I imagine that is not enforced in the standard and we just chose one? That is to say, if we wanted to create a second compressed ISD format, it seems we could.

Kelvinrr commented 4 weeks ago

On the topics:

Ability to load the compressed ISD straight into memory (no need to decompress to disk).

I think the way to do this is with a memory-mapped file, as that would give us the fastest read time. There are libraries out there that handle the nuances of binary compatibility across OSes and architectures. I had success with this in SpiceQL, reducing a kernel-loading query from 20,000 ms (straight JSON) to 5 ms (mmapped tables). The downside is that you don't get compression. But if there is a C++ interface to Brotli (or whatever) that lets us decompress bytes in memory and avoid extra copies, we could read in the raw bytes and decompress them to something in memory.

Theoretically, I think a novel file format (compressed bytes -> mmapped on IO into a byte array -> decompressed in memory) could still be faster than reading a large mmapped file directly 🤔 This all hinges on off-the-shelf libraries that support decompressing bytes. Edit: potential options? https://github.com/NewYaroslav/brotli-hpp and https://github.com/vimpunk/mio
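
The pipeline described above (compressed bytes -> mmap -> decompress in memory) can be sketched in Python with the standard library. This is only an illustration of the idea: `zlib` stands in for Brotli, which is not in the standard library, and `read_compressed` is a hypothetical name, not an ale function.

```python
import mmap
import zlib  # stand-in for Brotli, which is not in the standard library


def read_compressed(path):
    """Map a compressed file and decompress it straight into memory.

    Hypothetical helper; the file is assumed to be non-empty zlib data.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # zlib.decompress accepts any bytes-like object, so it reads the
            # mapped pages directly; no decompressed copy is written to disk.
            return zlib.decompress(mapped)
```

Whether this beats reading an uncompressed mapped file depends on the compression ratio and decompression speed, so it would need benchmarking on real ISDs.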

Ability to write an updated compressed ISD to disk after bundle adjustment. An API for consumers of ale to use in their library for working with compressed ISDs.

Whatever format we use above would have to unpack to something other than JSON, e.g. some kind of efficient hash map. It shouldn't be the STL's, since that one is notoriously slow for what it is; header-only implementations are out there. We could then hide the implementation under a basic object that others can use and that allows updates, and expose that in Python.
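
A minimal sketch of what such a wrapper object might look like from the Python side. Everything here is hypothetical: the class name `CompressedIsd` and its methods are made up for illustration, and `zlib` plus JSON are used only as stand-ins for whatever compression and in-memory layout is eventually chosen.

```python
import json
import zlib


class CompressedIsd:
    """Hypothetical wrapper: callers never see the on-disk encoding."""

    def __init__(self, data):
        # A plain dict here; the real backing store could be any fast map.
        self._data = data

    @classmethod
    def from_bytes(cls, blob):
        """Build the object from compressed bytes held in memory."""
        return cls(json.loads(zlib.decompress(blob)))

    def to_bytes(self):
        """Serialize the current state back to compressed bytes."""
        return zlib.compress(json.dumps(self._data).encode("utf-8"))

    def get(self, key):
        return self._data[key]

    def update(self, key, value):
        """Change a value, e.g. after bundle adjustment."""
        self._data[key] = value
```

Because reads and writes go through `get`, `update`, `from_bytes`, and `to_bytes`, the storage format could change later without touching callers.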

thareUSGS commented 4 weeks ago

Page 63 continues to talk about "Filename ISD" support but says nothing about its format.

There is no standard ISD format. This was done on purpose so as not to limit the camera type or the metadata needed. That said, most Earth-centric implementations combine the ISD and image pixels into the National Imagery Transmission Format (NITF). While NITF was researched for the usgscsm library and planetary data, it was quickly discovered that most applications that support the NITF format assume an Earth-based WGS84 reference ellipsoid, so it was not used for our planetary use case. For the Earth side and NITF, a little more information is here (including RPC support in NITF).

Kelvinrr commented 4 weeks ago

@thareUSGS I saw how the standard seemed to suggest preferring the NITF format over others, but it didn't seem to be something we would be supporting.

DOI-USGS / ale

Discussion: Compression and Splines #609