ASPRSorg / LAS

LAS Specification
https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html

Add String ExtraByte type #56

Closed by esilvia 6 years ago

esilvia commented 6 years ago

At ILMF I received a request that we add a String ExtraByte as data_type=31 for things like source file name, descriptors, etc. This would basically be a char array of some length. I see three possible ways to do this:

  1. Variable-length arrays, length-prefixed. The first two bytes would hold the string length N as an unsigned short, followed by the N char values that make up the string itself.
  2. Variable-length arrays, null-terminated. Simply null-terminate the string and assume all non-zero data before the terminator is part of the string.
  3. Fixed-length arrays. We could store the string length as a length attribute in the ExtraByte definition, such as in two of the unused bytes in the EXTRA_BYTES struct. Unused chars would be set to zero.
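
For illustration only, the three layouts above could be sketched as follows. This is a minimal sketch, not anything from the LAS specification: the function names are hypothetical, ASCII is an assumed encoding (the proposal does not name one), and little-endian byte order is used because LAS stores its multi-byte values that way.

```python
import struct


def encode_length_prefixed(text: str) -> bytes:
    """Option 1: unsigned short length N, followed by N chars."""
    data = text.encode("ascii")  # assumed encoding
    return struct.pack("<H", len(data)) + data


def encode_null_terminated(text: str) -> bytes:
    """Option 2: the chars followed by a single zero byte."""
    return text.encode("ascii") + b"\x00"


def encode_fixed_length(text: str, width: int) -> bytes:
    """Option 3: exactly `width` bytes, unused chars set to zero."""
    data = text.encode("ascii")
    if len(data) > width:
        raise ValueError("string does not fit in the fixed-width field")
    return data.ljust(width, b"\x00")


def decode_fixed_length(field: bytes) -> str:
    """Read a fixed-width field back, dropping the zero padding."""
    return field.split(b"\x00", 1)[0].decode("ascii")
```

Only the third layout yields the same number of bytes for every point, whatever the string content.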

There's potential for this to cause LAS files of tens of millions of points to explode, so I'm generally against the idea. Every use case I can think of is better served using one of the int data types and a lookup table, but since it came up at ILMF I think it's worth discussing.
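
To put rough numbers on the size concern, here is a small sketch comparing a per-point string with an integer index plus a shared lookup table. The point count, the 64-byte string width, and the 256-entry table are made-up figures chosen only for illustration, and the function names are hypothetical.

```python
def per_point_string_bytes(num_points: int, string_width: int) -> int:
    """Extra storage when every point carries a fixed-width string."""
    return num_points * string_width


def lookup_table_bytes(num_points: int, index_width: int, table_size: int) -> int:
    """Extra storage when every point carries an integer index and the
    strings are stored once in a shared table."""
    return num_points * index_width + table_size


num_points = 50_000_000  # illustrative point count
print(per_point_string_bytes(num_points, 64))       # 3200000000
print(lookup_table_bytes(num_points, 1, 256 * 64))  # 50016384
```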

rapidlasso commented 6 years ago

Variable-length arrays (options 1 and 2) would prohibit seeking in the LAS file (as done by any spatial indexing structure such as the LAX file), because points would no longer have a fixed record size. Such files would break a number of implementations.
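
A rough sketch of why the fixed record size matters: with fixed-size records, the byte offset of point i follows directly from two header values (the offset to point data and the point data record length), whereas a length-prefixed string forces a reader to walk every earlier record. The function names and the simplified record layout in `scan_length_prefixed` (a fixed base part followed by option 1's length-prefixed string) are assumptions for illustration.

```python
import struct


def point_offset(offset_to_point_data: int, record_length: int, index: int) -> int:
    """Fixed-size records: the offset of point `index` is plain arithmetic."""
    return offset_to_point_data + index * record_length


def scan_length_prefixed(buf: bytes, base_size: int, index: int) -> int:
    """Variable-size records: find the start of point `index` by reading
    the length prefix of every record that precedes it."""
    pos = 0
    for _ in range(index):
        (n,) = struct.unpack_from("<H", buf, pos + base_size)
        pos += base_size + 2 + n  # base fields + length prefix + string
    return pos
```

The first function is constant-time; the second touches every preceding record, which is what an index that stores or computes point positions cannot afford.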

For strings up to 255 bytes in length this could already be done using data_type=0 which sees little to no use at the moment.

We should not forget to get rid of all tuple and triple data types.

hobu commented 6 years ago

> At ILMF I received a request that we add a String ExtraByte as data_type=31 for things like source file name, descriptors, etc.

Per-point strings? We don't really want to encourage that, do we? I think I would rather codify some VLR types to provide indexing keys than encourage people to store per-point strings. I'm definitely 👎 on variable-length strings too.

mikec-bmg commented 6 years ago

I concur that per-point arbitrary strings don't make sense. I suspect the use case is really a limited set of arbitrary strings, so a string table, with the extra bytes value referring to the start of the NULL-terminated string, would make sense. A size of 1, 2, or 4 bytes, depending on the total size of the string table for that extra byte, would be good. The text encoding would likely also need to be stored so we know how to interpret the text for display (or we could mandate UTF-8).
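
One way the string-table idea might look, as a hedged sketch: the table is a single blob of NULL-terminated strings, each point stores the byte offset at which its string starts, and the offset width is chosen from the table size. Nothing here is defined by the LAS specification; the function names are hypothetical and UTF-8 is assumed as the encoding.

```python
def build_string_table(strings):
    """Pack the distinct strings into one blob of NULL-terminated entries.

    Returns the blob and a dict mapping each string to its start offset.
    """
    table = bytearray()
    offsets = {}
    for s in strings:
        if s not in offsets:
            offsets[s] = len(table)
            table += s.encode("utf-8") + b"\x00"  # assumed encoding
    return bytes(table), offsets


def index_width(table_size: int) -> int:
    """Smallest of 1, 2, or 4 bytes able to address every table offset."""
    if table_size <= 0x100:
        return 1
    if table_size <= 0x10000:
        return 2
    if table_size <= 0x100000000:
        return 4
    raise ValueError("string table too large for a 4-byte offset")


def lookup(table: bytes, offset: int) -> str:
    """Read the NULL-terminated string that starts at `offset`."""
    end = table.index(b"\x00", offset)
    return table[offset:end].decode("utf-8")


table, offsets = build_string_table(["a.las", "b.las", "a.las"])
print(offsets["b.las"], lookup(table, offsets["b.las"]))  # 6 b.las
```

Repeated strings are stored once, so the per-point cost stays at the width of the offset regardless of how long the strings are.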

esilvia commented 6 years ago

@rapidlasso Thanks for the reminder that point record sizes are fixed. That nixes the variable-length idea altogether. The tuple/triple issue (#1) is already scheduled for deprecation in R14.

I like the idea of codifying a generic "string table" VLR and encouraging people in that direction. Maybe I'll create a separate issue for that.