ASPRSorg / LAS

LAS Specification
https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html
137 stars 16 forks source link

Avoid assuming binary representation of C structs #116

Open plimkilde opened 2 years ago

plimkilde commented 2 years ago

The spec uses C struct declarations in a few places to describe certain data structures in LAS files. This may be problematic since it could seem implied that the structs as given are directly serializable/deserializable. The binary representation of such C structs is ABI-dependent, with variable endianness and padding bytes.

While the use of C structs would be fine for documenting a LAS-handling C library, that is not what the LAS specification describes. I think it would be more appropriate to replace them with simple tables of fields instead.

The data structures currently described using C structs are:

hobu commented 2 years ago

I would continue to argue for C-style struct definitions. The rationale for this is the language and meaning of them is very precise, and it is clear to software implementors what the definitions intent and interpretation is. The GeoKeyDirectoryTag is taken directly from the GeoTIFF specification, for example. It makes little sense to change it.

It is my understanding that the original pre-1.0 LAS specification, the file format was intended to be slurped up as mmap'd data structures in C. An old timer could correct us on that. It doesn't seem worth the trouble to change this.

esilvia commented 2 years ago

I'm inclined to agree with @hobu on this, but I confess I'm biased as I'm primarily a C++ developer myself and I found the C-style structs helpful when writing my own code.

That said, if you'd like to draft/propose an alternative method of documentation, I'd be happy to entertain it.

plimkilde commented 2 years ago

One approach that retains the C struct notation is used in the current COPC spec draft (kudos @hobu), which adds this caveat:

Some of the file format is described using C-language fixed width integer types. Groups of entities are denoted with a C-language struct, though all data is packed on byte boundaries and encoded as little-endian values, which may not be the case for a C program that uses the same notation.

My issue with the C struct notation is not that it is C-centric, but rather that C or C++ programmers reading it should not assume that they can generally cast or mmap a bag of LAS bytes into a valid struct. I do get that it's a convenient way of communicating the data model, however.

esilvia commented 2 years ago

Thanks @plimkilde that does help.

If you'd like to propose that clarification to the data types section, you can find it here: https://github.com/ASPRSorg/LAS/blob/draft-1.4-R16/source/02.03_datatypes.sub

Alternatively, if you think it makes more sense to include it with the C-style structs, those can be found here: