ASPRSorg / LAS

LAS Specification
https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html
137 stars 16 forks source link

Correctness of data type definitions #115

Closed plimkilde closed 2 years ago

plimkilde commented 2 years ago

The spec currently refers to the C99 standard for the definition of the data types used (char, short, long etc.). As far as I can tell, taking this at face value may be self-contradictory and not reflect how LAS data should actually be parsed.

The "classic" C integer data types are permitted to have any of three different representations for signed integers (sign-and-magnitude, two's complement, one's complement) and, rather than having exact specified widths, merely have minimum requirements for the ranges they can hold (e.g. [-2147483647, +2147483647] for long). This could be a real problem since long is commonly 64 bits wide these days.

I suppose the desired behavior is actually that of C99's exact-width types, which are also guaranteed to be two's complement for the signed types. Thus, the integer types should be handled like this:

Type name in LAS spec Intended C99 type (stdint.h)
char int8_t
unsigned char uint8_t
short int16_t
unsigned short uint16_t
long int32_t
unsigned long uint32_t
long long int64_t
unsigned long long uint64_t

Note also that char in C may be either of signed char and unsigned char.

The float and double may technically not map to the single/double types in IEEE 754/IEC 60559. However, the C99 spec has extensive provisions for the common case when they do (they should map properly if __STDC_IEC_559__ is defined) as described in Annex F. Thus, it should suffice with a remark that these two types are 32- and 64-bit IEEE types, respectively (as is currently the case).

I'm not sure how all this should be addressed in the spec, but directly referring to C99's definitions for the integer type names seems incorrect.

esilvia commented 2 years ago

Sorry I missed this post.

I'm 100% on board with clarifying the data types here, but I confess I'm out of my depth in the computer engineering world when we talk about IEEE data types. I must defer to a more informed individual if we're to clarify the data types more precisely.

If I remember correctly, the primary desire in that particular section (§2.3 in the current revision) was to ensure that float and double are 4 and 8 bytes, respectively, and char and unsigned char are each 1 byte, as the sizes of these variables can vary by implementation, as you've noted. Notice also that the spec has endeavored to avoid the int data type altogether, for the same reason.

plimkilde commented 2 years ago

Sorry if the wording was a bit convoluted, but my point regarding the IEEE types was that I think the LAS spec is fine as-is for those ;-)

As far as I'm aware, IEEE 754 representation has been universally adopted in practice, with real-world variations only in endianness and in the interpretation of quiet-NaN/signaling-NaN bit flags.

Strictly speaking, though, the wording "4/8 byte IEEE floating point format" could also refer to decimal32/decimal64. To avoid ambiguity, one could refer specifically to the "binary32" and "binary64" formats. This is also how e.g. Rust documents its f32/f64 types.

esilvia commented 2 years ago

@plimkilde Thanks for clarifying! I took a first stab at it on a new branch in this commit: https://github.com/ASPRSorg/LAS/commit/1b2d74ca160f77b9a8125288b4605f638325c66f

Feel free to edit without worrying about messing anything up.

esilvia commented 2 years ago

@plimkilde I just refreshed the example PDF with your proposed change that can be downloaded here for the next 30 days: https://github.com/ASPRSorg/LAS/suites/5525142074/artifacts/177388595

Can you take a look and let me know if that addresses your concern? If not, I'll just close this ticket as inactive.

plimkilde commented 2 years ago

Sorry about the long wait, I finally got around to making a PR now.

esilvia commented 2 years ago

Here's a PDF version with all changes incorporated, plus adjustments of column widths to account for the more compact format.

LAS_#115.pdf

I did a quick read-through and fixed the one missing reference I saw.

There's a few references to UINT32_MAX and similar. Should those also be adjusted?

plimkilde commented 2 years ago

Thanks! I don't know how the ushort reference slipped through, but great that it's caught.

The UINT32_MAX macro is defined in C99 as the upper limit of the uint32_t type, so that should be all good.

esilvia commented 2 years ago

Sounds good, thanks! This is now queued for inclusion in R16. Thanks for the contribution @plimkilde !