Clarification serialization of Waveform Data Packets

AlienRenders commented 3 years ago

In section 5.1, it is unclear how waveform data that is not a multiple of 8 or 16 bits is to be stored.

How is data to be stored? If it's 2 bits, do we store 2 bits per byte? Or 4 samples per byte with all 8 bits? And for 3 bits, it would still use all bits and wrap around to the next byte.

Is it OK to just specify 8 bits and only store 2 bits per byte?

Basically, I have 10-bit data. Do I just store the low 10 bits in pairs of bytes and specify 16 bits in the packet descriptor? Or is a reader supposed to round up to the next multiple of 8? Or am I supposed to store 10 in the packet descriptor and "bit pack" the waveform data? In that case, the first sample would fill the first byte plus the low 2 bits of the second byte, the high 6 bits of the second byte would hold the low bits of the second sample, the high bits of that sample would go into the low 4 bits of the next byte, and so on?

Which of these is correct:

1. Packet descriptor has 10 for bits per sample. Stored as 10 bits in each pair of bytes.
2. Packet descriptor has 10 for bits per sample. Stored as packed bits. Every 40 bits (5 bytes) stores 4 samples.
3. Packet descriptor has 16 for bits per sample. Stored as 10 bits in each pair of bytes.
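To make the packed layout concrete, here is a minimal Python sketch of packing and unpacking. It assumes samples are unsigned and packed least-significant-bit first, with each sample's low bits landing in the lowest free bits of the current byte, which matches the layout described in the question. The spec text quoted in this thread does not confirm that bit order, so treat it as an assumption; the function names `pack_samples` and `unpack_samples` are hypothetical.

```python
def pack_samples(samples, bits_per_sample):
    """Pack unsigned samples LSB-first (assumed order), zero-padding the last byte."""
    mask = (1 << bits_per_sample) - 1
    acc = 0      # bit accumulator; new samples go above the bits already held
    nbits = 0    # number of valid bits currently in acc
    out = bytearray()
    for s in samples:
        if not 0 <= s <= mask:
            raise ValueError(f"sample {s} does not fit in {bits_per_sample} bits")
        acc |= s << nbits
        nbits += bits_per_sample
        while nbits >= 8:          # flush every complete byte
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:                      # leftover bits: pad the high bits with zeros
        out.append(acc & 0xFF)
    return bytes(out)


def unpack_samples(data, bits_per_sample, count):
    """Inverse of pack_samples; trailing padding bits are ignored."""
    mask = (1 << bits_per_sample) - 1
    acc = 0
    nbits = 0
    out = []
    for b in data:
        acc |= b << nbits
        nbits += 8
        while nbits >= bits_per_sample and len(out) < count:
            out.append(acc & mask)
            acc >>= bits_per_sample
            nbits -= bits_per_sample
    if len(out) < count:
        raise ValueError("not enough data for the requested sample count")
    return out


packed = pack_samples([0x3FF, 0, 0x3FF, 0], 10)
print(packed.hex())  # four 10-bit samples occupy exactly 5 bytes
```

Under this assumed bit order, four 10-bit samples fill exactly 40 bits, so no padding is needed until the end of the packet.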

Some software gives warnings if not using 8 or 16 bits per sample.

I know option 3 works. Readers don't care if the high bits are zero. Everything goes through the gain and offset anyway. So this is likely what we're going to do. Bit packing would make the file smaller, but many software packages don't like bit depths that aren't multiples of 8.
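For comparison, option 3 can be sketched in a few lines of Python: each 10-bit value is stored in a 16-bit word with the high 6 bits left at zero, and the descriptor would declare 16 bits per sample. The sketch assumes unsigned samples and little-endian byte order (LAS data is little-endian); the helper names `write_padded_16` and `read_padded_16` are hypothetical.

```python
import struct


def write_padded_16(samples, significant_bits=10):
    """Store each sample in a little-endian 16-bit word; unused high bits stay zero."""
    limit = 1 << significant_bits
    for s in samples:
        if not 0 <= s < limit:
            raise ValueError(f"sample {s} does not fit in {significant_bits} bits")
    return struct.pack(f"<{len(samples)}H", *samples)


def read_padded_16(data):
    """Read back 16-bit little-endian words; no masking is needed if high bits are zero."""
    count = len(data) // 2
    return list(struct.unpack(f"<{count}H", data[:count * 2]))


raw = write_padded_16([1, 1023])
print(raw.hex(), read_padded_16(raw))
```

A reader that only knows about 16-bit samples handles this data unchanged, which is why this layout is the most widely compatible of the three.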

Just some clarification and maybe a small example in section 5.1 would be nice.

Also, while I know using option 3 will work, I need to know what the specs actually say because we have our own reader/writer that is used by many people and we'd like to support the LAS specs fully.

esilvia commented 3 years ago

What a great question! §4.5 of the 1.4R15 spec explicitly says the following:

Since it explicitly says that 2 bits is supported, I interpret this to mean that you have the option to pack the bits for sample sizes that aren't an exact multiple of 8, which I believe is Option 2 that you described. I don't see anything contradicting that in the spec or the wiki.

There is one caveat, however. The Waveform Packet Size in Bytes for PDRFs 4-5 and 9-10 (§2.6.5, 2.6.6, 2.6.10, 2.6.11) is a whole number of bytes. This implies that the total size of each WDP record will need to be rounded up to a multiple of 8 bits.

For example, with 10 bits per sample and 255 samples per point, that would be 2550 bits to store one record in the WDP. Trailing zero bits will need to be appended to each record so that the next packet can start on a byte boundary, resulting in rounding up to 2552 bits (319 bytes) for the full packet.

This approach is still significantly smaller than 16 bits per sample, which comes to 4080 bits (510 bytes) for 255 samples per packet if one were to pursue Option 3 as you described, although, as you noted, bit packing might be more difficult for a developer to implement. Personally, I have yet to see a LAS file with a Bits Per Sample value that is not a multiple of 8.
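The size arithmetic above can be checked with a short Python sketch. The function name `packed_packet_bytes` is hypothetical; it simply rounds the total bit count up to the next whole byte.

```python
def packed_packet_bytes(bits_per_sample, num_samples):
    """Bytes needed for one bit-packed record, rounded up to a whole byte."""
    total_bits = bits_per_sample * num_samples
    return (total_bits + 7) // 8   # integer ceiling of total_bits / 8


packed = packed_packet_bytes(10, 255)        # bit-packed 10-bit samples
padded = packed_packet_bytes(16, 255)        # one 16-bit word per sample
padding_bits = packed * 8 - 10 * 255         # zero bits appended to the record

print(packed, padded, padding_bits)
```

For 255 samples this gives 319 bytes packed versus 510 bytes at 16 bits per sample, with 2 padding bits at the end of the packed record.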

esilvia commented 2 years ago

Closing due to lack of input. Thanks for the great question and discussion!

esilvia commented 2 years ago

I forgot to mention that I updated the Storage Size section of the waveform wiki to reflect the conclusion of this discussion.

ASPRSorg / LAS

Clarification serialization of Waveform Data Packets #114