Caltech-IPAC / kete

Kete Solar System Survey tools
https://Caltech-IPAC.github.io/kete
BSD 3-Clause "New" or "Revised" License

Handling File Formats #139

Open dahlend opened 3 days ago

dahlend commented 3 days ago

This is a summary of my thoughts about saving files, written down mostly to help me organize them.

I am not considering any changes to kete for this at the moment, merely "thinking out loud", as the current implementation is something I am not completely happy with.

Currently kete uses a common binary serialization library in Rust to write data out to files. This process depends heavily on the low-level data structures, and any change to the structs can potentially break file loading. An automated test keeps a small copy of a known-correct file and verifies that kete can still read it, to ensure that backwards compatibility has not been broken at any point.

This setup, while easy to implement, is quite brittle.
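To make the brittleness concrete, here is a minimal sketch using Python's `struct` module as a stand-in for a layout-dependent binary serializer. The layouts, field names, and values are hypothetical and are not kete's actual file format; the point is only that appending one field to a struct is enough to make previously written bytes unreadable, and that a stored "golden" file can catch this in a test.

```python
import struct

# Hypothetical "old" layout: epoch (f64) followed by position x, y, z (3 x f64).
LAYOUT_OLD = struct.Struct("<d3d")
# Hypothetical "new" layout: the same fields plus a 32-bit center ID appended.
LAYOUT_NEW = struct.Struct("<d3di")


def write_old(epoch, pos):
    """Serialize a record using the old struct layout."""
    return LAYOUT_OLD.pack(epoch, *pos)


def read_new(buf):
    """Deserialize a record assuming the new struct layout."""
    epoch, x, y, z, center = LAYOUT_NEW.unpack(buf)
    return epoch, (x, y, z), center


# Bytes written before the struct changed, playing the role of the
# known-correct file kept by the regression test.
golden_file = write_old(2460000.5, (1.0, 0.0, 0.0))

# The regression check: can the current reader still load the old bytes?
try:
    read_new(golden_file)
    still_readable = True
except struct.error:
    # The buffer is 32 bytes but the new layout expects 36.
    still_readable = False

print("old file still readable:", still_readable)
```

A self-describing format (field names or tags stored alongside the values) avoids this particular failure, at the cost of larger files.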

There are several possible file structures which kete could switch to, which broadly come in 2 flavors:

There are pros and cons to each of these, broadly summarized as: Rigidity vs flexibility

An additional concern is handling file versioning. There are a plethora of ways of handling version control on data files, especially in the age of AI training.
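One common way of handling file versioning is to prefix every file with a small header containing magic bytes and a format version, then dispatch to a loader per version. The sketch below illustrates that idea only; the `KETE` magic bytes, the version numbers, the `frame` default, and the JSON body are all assumptions made for the example, not anything kete currently does.

```python
import json
import struct

MAGIC = b"KETE"                 # hypothetical magic bytes
HEADER = struct.Struct("<4sH")  # magic + unsigned 16-bit format version


def save(payload, version=2):
    """Write a header followed by a JSON body (JSON chosen only for brevity)."""
    body = json.dumps(payload).encode("utf-8")
    return HEADER.pack(MAGIC, version) + body


def _load_v1(body):
    data = json.loads(body)
    # Assumption for the example: version 1 files stored no frame,
    # so the loader upgrades them with an explicit default.
    data.setdefault("frame", "ecliptic")
    return data


def _load_v2(body):
    return json.loads(body)


LOADERS = {1: _load_v1, 2: _load_v2}


def load(buf):
    """Read the header, then hand the body to the loader for that version."""
    magic, version = HEADER.unpack_from(buf)
    if magic != MAGIC:
        raise ValueError("not a recognized file")
    if version not in LOADERS:
        raise ValueError(f"unsupported format version {version}")
    return LOADERS[version](buf[HEADER.size:])


old = save({"desig": "Ceres"}, version=1)
print(load(old))
```

With this layout, old files keep loading through their original code path, and a file written by a newer release fails with a clear error instead of being misread.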

Options

Required Wikipedia Reading

Not Recommended:

I think these are poor fits, but they are included here for completeness.

Of the options, the ones I believe make the most sense are:

dahlend commented 3 days ago

After looking at Parquet a bit more, I have realized there are some difficult choices which need to be made. Because Parquet is basically a table of information, more complicated data structures become impossible. Kete has the concept of a SimultaneousState, which is a collection of state vectors along with a Field of View. Conceptually, there is no clean way of representing this with Parquet. As a result of this research, my opinion is shifting toward msgpack or BSON, as they will work with the complex data structures which kete produces.

Another option is to make Parquet an optional feature and output a limited subset of files which are supported "long term".
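The mismatch between a SimultaneousState and a flat table can be sketched as follows. The class and field names here are simplified, hypothetical stand-ins for kete's real types, and JSON is used in place of msgpack or BSON only because it ships with the Python standard library; all three share the same nested document model.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class State:
    desig: str
    pos: tuple
    vel: tuple


@dataclass
class FieldOfView:
    # Hypothetical fields; a real field of view carries more information.
    observer_pos: tuple
    pointing: tuple


@dataclass
class SimultaneousState:
    epoch: float
    fov: FieldOfView
    states: list


sim = SimultaneousState(
    epoch=2460000.5,
    fov=FieldOfView((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
    states=[
        State("Ceres", (2.7, 0.1, 0.0), (0.0, 0.01, 0.0)),
        State("Vesta", (2.3, -0.4, 0.1), (0.0, 0.012, 0.0)),
    ],
)

# Nested encoding: one document, the field of view is stored exactly once.
nested = json.dumps(asdict(sim))

# Flat table: one row per state, so the field of view must be repeated on
# every row (or moved to a second table and joined back by a key).
rows = [
    {
        "epoch": sim.epoch,
        "desig": s.desig,
        "pos": s.pos,
        "vel": s.vel,
        "fov_observer_pos": sim.fov.observer_pos,
        "fov_pointing": sim.fov.pointing,
    }
    for s in sim.states
]

print(len(rows), "rows, each carrying a copy of the field of view")
```

Either workaround for the flat table (duplicated columns or a second joined table) puts the burden of keeping the pieces consistent on whoever reads the file.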

fmasci commented 3 days ago

I’m open to whatever you find efficient, provided there’s an open source package/lib to read and parse it (of course). You could also write your own (homegrown) packed binary format, customized for this application. Also, as mentioned, do maintain the ability to generate ASCII dumps under a debug switch.

Regards, Frank

dahlend commented 3 days ago

Parquet Feature:

Add a feature which supports reading/writing a subset of the kete files as Parquet tables.

Tables which could be supported:

  1. States - Vector of states, no fields of view. Columns:

    • Designation
    • Epoch
    • Position
    • Velocity
    • Frame
    • Center NAIF ID
  2. NEATM Properties - Physical properties of an object, to be used for optical modeling. Columns:

    • Designation
    • H Mag
    • G Parameter
    • V Albedo
    • IR Albedos
    • Beaming
    • Diameter
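As a rough illustration of the proposed States table, the sketch below converts a list of state records into a column-oriented layout of the kind Parquet stores. The column names, the choice to split position and velocity into three scalar columns each, and the input record shape are all assumptions made for the example; a Parquet library would then write these columns to disk.

```python
# Hypothetical column names for the States table.
STATE_COLUMNS = (
    "desig", "epoch",
    "x", "y", "z",
    "vx", "vy", "vz",
    "frame", "center_id",
)


def states_to_columns(states):
    """Convert a list of state dicts into a dict of equal-length columns."""
    columns = {name: [] for name in STATE_COLUMNS}
    for s in states:
        x, y, z = s["pos"]
        vx, vy, vz = s["vel"]
        row = {
            "desig": s["desig"], "epoch": s["epoch"],
            "x": x, "y": y, "z": z,
            "vx": vx, "vy": vy, "vz": vz,
            "frame": s["frame"], "center_id": s["center_id"],
        }
        for name in STATE_COLUMNS:
            columns[name].append(row[name])
    return columns


states = [
    {"desig": "Ceres", "epoch": 2460000.5, "pos": (2.7, 0.1, 0.0),
     "vel": (0.0, 0.01, 0.0), "frame": "Ecliptic", "center_id": 10},
    {"desig": "Vesta", "epoch": 2460000.5, "pos": (2.3, -0.4, 0.1),
     "vel": (0.0, 0.012, 0.0), "frame": "Ecliptic", "center_id": 10},
]

table = states_to_columns(states)
print(table["desig"], table["x"])
```

Because every row has the same fixed set of scalar columns, this table maps onto Parquet directly, which is what makes it a reasonable candidate for the long-term supported subset.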