ASPRSorg / LAS

LAS Specification
https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html

Improve readability by adding a "byte offset" column to header block and record format descriptions #55

Closed m-schuetz closed 3 years ago

m-schuetz commented 6 years ago

Currently, the spec only gives the byte size of each element in the header and in the different point data records. Additionally providing the byte offset of each attribute would be a huge help to developers writing LAS readers.

e.g. current version

| Item | Format | Size | Required |
|---|---|---|---|
| File Signature (“LASF”) | char[4] | 4 bytes | * |
| File Source ID | unsigned short | 2 bytes | * |
| Global Encoding | unsigned short | 2 bytes | * |
| Project ID - GUID data 1 | unsigned long | 4 bytes | * |
| Project ID - GUID data 2 | unsigned short | 2 bytes | * |
| Project ID - GUID data 3 | unsigned short | 2 bytes | * |
| Project ID - GUID data 4 | unsigned char[8] | 8 bytes | * |

With offsets:

| Item | Format | Size | Offset | Required |
|---|---|---|---|---|
| File Signature (“LASF”) | char[4] | 4 bytes | 0 | * |
| File Source ID | unsigned short | 2 bytes | 4 | * |
| Global Encoding | unsigned short | 2 bytes | 6 | * |
| Project ID - GUID data 1 | unsigned long | 4 bytes | 8 | * |
| Project ID - GUID data 2 | unsigned short | 2 bytes | 12 | * |
| Project ID - GUID data 3 | unsigned short | 2 bytes | 14 | * |
| Project ID - GUID data 4 | unsigned char[8] | 8 bytes | 16 | * |

hobu commented 6 years ago

My concern about also inserting the offset is that it is duplicate information, and it can easily get out of sync with the rest of the header structure. The spec has had many issues like that.

rapidlasso commented 6 years ago

I think adding the offset is a good idea. I have (re-)done this offset calculation again and again over the past 11 years of LAStools/LASlib/LASzip whenever I needed it, and having it (correctly) pre-calculated as additional information in the specification would have been nice. The header hardly ever changes, so keeping it in sync should require minimal / no effort. Come on, @hobu, we can do this. (-;

hobu commented 6 years ago

> Come on, @hobu, we can do this. (-;

It's something that can quickly and very easily be miscalculated or get out of sync. I was just voicing my concern, not 👎 the idea.
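
One way to limit the sync risk raised here is to derive the offset column from the size column instead of typing it by hand. A minimal Python sketch, using the first seven header fields quoted at the top of this thread; `with_offsets` is a hypothetical helper name, not part of any LAS library:

```python
from itertools import accumulate

# (item, size in bytes) for the first header fields quoted in this thread
FIELDS = [
    ("File Signature", 4),
    ("File Source ID", 2),
    ("Global Encoding", 2),
    ("Project ID - GUID data 1", 4),
    ("Project ID - GUID data 2", 2),
    ("Project ID - GUID data 3", 2),
    ("Project ID - GUID data 4", 8),
]

def with_offsets(fields):
    """Return (rows, total), where each row is (item, offset, size)."""
    sizes = [size for _, size in fields]
    # The offset of a field is the sum of the sizes of all fields before it.
    offsets = [0, *accumulate(sizes)]
    rows = [(name, off, size) for (name, size), off in zip(fields, offsets)]
    return rows, offsets[-1]

rows, total = with_offsets(FIELDS)
for name, off, size in rows:
    print(f"{name:<26} {off:>3} {size:>2}")
print("Total", total)
```

Generating the column this way means a changed size automatically shifts every later offset, so only the sizes need reviewing.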

esilvia commented 6 years ago

I've gone back and forth on this idea a few times, so I'm actually quite excited to see others that want it! I was thinking we'd also add a "total bytes" row at the very end of each table, too, as that's very handy information that's otherwise not included in the spec.

esilvia commented 6 years ago

@hobu Definitely a valid concern. I'll keep an eye out for sync issues, and hopefully I can count on one of you to watch for it, too, and especially check my math during the Pull Request reviews.

m-schuetz commented 5 years ago

Since I'm currently writing another LAS reader, I'll drop some of the tables in this thread as a semi-persistent place to look them up in the future.

Rearranged the columns because offset reads better if it comes before the size, and the format reads better if it isn't squeezed in between offset and size.

Header (taken from 1.2 spec)

| Item | Offset | Size | Format | Required |
|---|---|---|---|---|
| File Signature ("LASF") | 0 | 4 bytes | char[4] | |
| File Source ID | 4 | 2 bytes | unsigned short | |
| Global Encoding | 6 | 2 bytes | unsigned short | * |
| Project ID - GUID data 1 | 8 | 4 bytes | unsigned long | |
| Project ID - GUID data 2 | 12 | 2 bytes | unsigned short | |
| Project ID - GUID data 3 | 14 | 2 bytes | unsigned short | |
| Project ID - GUID data 4 | 16 | 8 bytes | unsigned char[8] | |
| Version Major | 24 | 1 byte | unsigned char | * |
| Version Minor | 25 | 1 byte | unsigned char | * |
| System Identifier | 26 | 32 bytes | char[32] | * |
| Generating Software | 58 | 32 bytes | char[32] | * |
| File Creation Day of Year | 90 | 2 bytes | unsigned short | |
| File Creation Year | 92 | 2 bytes | unsigned short | |
| Header Size | 94 | 2 bytes | unsigned short | * |
| Offset to point data | 96 | 4 bytes | unsigned long | * |
| Number of Variable Length Records | 100 | 4 bytes | unsigned long | * |
| Point Data Format ID (0-99 for spec) | 104 | 1 byte | unsigned char | * |
| Point Data Record Length | 105 | 2 bytes | unsigned short | * |
| Number of point records | 107 | 4 bytes | unsigned long | * |
| Number of points by return | 111 | 20 bytes | unsigned long[5] | * |
| X scale factor | 131 | 8 bytes | double | * |
| Y scale factor | 139 | 8 bytes | double | * |
| Z scale factor | 147 | 8 bytes | double | * |
| X offset | 155 | 8 bytes | double | * |
| Y offset | 163 | 8 bytes | double | * |
| Z offset | 171 | 8 bytes | double | * |
| Max X | 179 | 8 bytes | double | * |
| Min X | 187 | 8 bytes | double | * |
| Max Y | 195 | 8 bytes | double | * |
| Min Y | 203 | 8 bytes | double | * |
| Max Z | 211 | 8 bytes | double | * |
| Min Z | 219 | 8 bytes | double | * |
| Total | | 227 bytes | | |
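
As a sketch of how a reader might use these offsets, the snippet below encodes the 1.2 header layout as a Python `struct` format string (little-endian, no padding) and reads a few fields directly at the offsets from the table. The header bytes are synthetic and built only for this demonstration, and `read_basic_fields` is a hypothetical helper name:

```python
import struct

# Little-endian, no padding; field order follows the 1.2 header table.
LAS12_HEADER = struct.Struct(
    "<4s H H I H H 8s B B 32s 32s H H H I I B H I 5I 12d"
)

def read_basic_fields(buf):
    """Pick a few fields straight from their byte offsets."""
    return {
        "signature": bytes(buf[0:4]),
        "version": struct.unpack_from("<BB", buf, 24),
        "header_size": struct.unpack_from("<H", buf, 94)[0],
        "offset_to_points": struct.unpack_from("<I", buf, 96)[0],
        "point_format": struct.unpack_from("<B", buf, 104)[0],
        "record_length": struct.unpack_from("<H", buf, 105)[0],
        "point_count": struct.unpack_from("<I", buf, 107)[0],
    }

# Synthetic header for demonstration only (not taken from a real file).
values = [
    b"LASF", 0, 0, 0, 0, 0, bytes(8), 1, 2,   # signature .. version 1.2
    b"", b"", 1, 2020,                        # identifiers, creation date
    LAS12_HEADER.size, 227, 0,                # header size, point offset, VLRs
    1, 28, 1000,                              # format 1, 28-byte records
    *[0] * 5, *[0.0] * 12,                    # returns, scale/offset/bounds
]
header = LAS12_HEADER.pack(*values)
fields = read_basic_fields(header)
print(LAS12_HEADER.size, fields)
```

Checking that `LAS12_HEADER.size` equals the "Total" row is a cheap way to catch a mistyped size or a missing field.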

Header (taken from 1.4 spec)

| Item | Offset | Size | Format | Required |
|---|---|---|---|---|
| File Signature ("LASF") | 0 | 4 bytes | char[4] | |
| File Source ID | 4 | 2 bytes | unsigned short | |
| Global Encoding | 6 | 2 bytes | unsigned short | * |
| Project ID - GUID data 1 | 8 | 4 bytes | unsigned long | |
| Project ID - GUID data 2 | 12 | 2 bytes | unsigned short | |
| Project ID - GUID data 3 | 14 | 2 bytes | unsigned short | |
| Project ID - GUID data 4 | 16 | 8 bytes | unsigned char[8] | |
| Version Major | 24 | 1 byte | unsigned char | * |
| Version Minor | 25 | 1 byte | unsigned char | * |
| System Identifier | 26 | 32 bytes | char[32] | * |
| Generating Software | 58 | 32 bytes | char[32] | * |
| File Creation Day of Year | 90 | 2 bytes | unsigned short | |
| File Creation Year | 92 | 2 bytes | unsigned short | |
| Header Size | 94 | 2 bytes | unsigned short | * |
| Offset to point data | 96 | 4 bytes | unsigned long | * |
| Number of Variable Length Records | 100 | 4 bytes | unsigned long | * |
| Point Data Format ID (0-99 for spec) | 104 | 1 byte | unsigned char | * |
| Point Data Record Length | 105 | 2 bytes | unsigned short | * |
| Legacy Number of point records | 107 | 4 bytes | unsigned long | * |
| Legacy Number of points by return | 111 | 20 bytes | unsigned long[5] | * |
| X scale factor | 131 | 8 bytes | double | * |
| Y scale factor | 139 | 8 bytes | double | * |
| Z scale factor | 147 | 8 bytes | double | * |
| X offset | 155 | 8 bytes | double | * |
| Y offset | 163 | 8 bytes | double | * |
| Z offset | 171 | 8 bytes | double | * |
| Max X | 179 | 8 bytes | double | * |
| Min X | 187 | 8 bytes | double | * |
| Max Y | 195 | 8 bytes | double | * |
| Min Y | 203 | 8 bytes | double | * |
| Max Z | 211 | 8 bytes | double | * |
| Min Z | 219 | 8 bytes | double | * |
| Start of Waveform Data Packet Record | 227 | 8 bytes | unsigned long long | * |
| Start of first Extended Variable Length Record | 235 | 8 bytes | unsigned long long | * |
| Number of Extended Variable Length Records | 243 | 4 bytes | unsigned long | * |
| Number of point records | 247 | 8 bytes | unsigned long long | * |
| Number of points by return | 255 | 120 bytes | unsigned long long[15] | * |
| Total | | 375 bytes | | |
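
The 1.4 header is the 1.2 layout plus five fields starting at offset 227. The sketch below checks that total with `struct` and reads the point count from whichever field applies. Choosing between the 64-bit and the legacy count from the Header Size field alone is a simplifying assumption made for this example, and `point_count` is a hypothetical helper name; a real reader would likely also look at the version fields.

```python
import struct

LAS12_CORE = "<4sHHIHH8sBB32s32sHHHIIBHI5I12d"  # 227 bytes, as in the 1.2 table
LAS14_TAIL = "QQIQ15Q"                          # fields added from offset 227 on
LAS14_HEADER = struct.Struct(LAS12_CORE + LAS14_TAIL)

def point_count(buf):
    """Read the 64-bit count at offset 247 if the header is long enough,
    otherwise fall back to the 32-bit count at offset 107."""
    header_size = struct.unpack_from("<H", buf, 94)[0]
    if header_size >= 375:
        return struct.unpack_from("<Q", buf, 247)[0]
    return struct.unpack_from("<I", buf, 107)[0]

# Two synthetic headers (all zero except the fields touched here).
new_header = bytearray(375)
struct.pack_into("<H", new_header, 94, 375)
struct.pack_into("<Q", new_header, 247, 5_000_000_000)  # exceeds 32 bits

old_header = bytearray(227)
struct.pack_into("<H", old_header, 94, 227)
struct.pack_into("<I", old_header, 107, 42)

print(LAS14_HEADER.size, point_count(new_header), point_count(old_header))
```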

Format 0 (taken from 1.4 spec)

| Item | Offset | Size | Format | Required |
|---|---|---|---|---|
| X | 0 | 4 bytes | long | * |
| Y | 4 | 4 bytes | long | * |
| Z | 8 | 4 bytes | long | * |
| Intensity | 12 | 2 bytes | unsigned short | |
| Return Number | 14 | 3 bits | 3 bits (bits 0, 1, 2) | * |
| Number of Returns | 14 | 3 bits | 3 bits (bits 3, 4, 5) | * |
| Scan Direction Flag | 14 | 1 bit | 1 bit (bit 6) | * |
| Edge of Flight Line | 14 | 1 bit | 1 bit (bit 7) | * |
| Classification | 15 | 1 byte | unsigned char | * |
| Scan Angle Rank (-90 to +90) - Left Side | 16 | 1 byte | char | * |
| User Data | 17 | 1 byte | unsigned char | |
| Point Source ID | 18 | 2 bytes | unsigned short | * |
| Total | | 20 bytes | | |
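
The four bit fields share the single byte at offset 14, which is why only the first of them changes the byte offset. A minimal sketch of decoding that byte, assuming bit 0 is the least significant bit; the record is synthetic, `decode_flags` is a hypothetical helper name, and the scan angle rank is read as a signed byte here because its range is -90 to +90:

```python
import struct

# Format 0 record: X, Y, Z, intensity, flag byte, classification,
# scan angle rank (signed), user data, point source ID.
POINT0 = struct.Struct("<iiiHBBbBH")

def decode_flags(flag_byte):
    """Split the byte at offset 14 into its four bit fields."""
    return {
        "return_number": flag_byte & 0b111,             # bits 0-2
        "number_of_returns": (flag_byte >> 3) & 0b111,  # bits 3-5
        "scan_direction_flag": (flag_byte >> 6) & 1,    # bit 6
        "edge_of_flight_line": (flag_byte >> 7) & 1,    # bit 7
    }

# Synthetic record: return 2 of 3, scan direction flag set, not an edge point.
flag_byte = 2 | (3 << 3) | (1 << 6)
record = POINT0.pack(100, 200, 300, 1234, flag_byte, 2, -15, 0, 7)

flags = decode_flags(record[14])
scan_angle_rank = struct.unpack_from("<b", record, 16)[0]
print(POINT0.size, flags, scan_angle_rank)
```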

Format 1 (taken from 1.4 spec)

| Item | Offset | Size | Format | Required |
|---|---|---|---|---|
| X | 0 | 4 bytes | long | * |
| Y | 4 | 4 bytes | long | * |
| Z | 8 | 4 bytes | long | * |
| Intensity | 12 | 2 bytes | unsigned short | |
| Return Number | 14 | 3 bits | 3 bits (bits 0, 1, 2) | * |
| Number of Returns | 14 | 3 bits | 3 bits (bits 3, 4, 5) | * |
| Scan Direction Flag | 14 | 1 bit | 1 bit (bit 6) | * |
| Edge of Flight Line | 14 | 1 bit | 1 bit (bit 7) | * |
| Classification | 15 | 1 byte | unsigned char | * |
| Scan Angle Rank (-90 to +90) - Left Side | 16 | 1 byte | char | * |
| User Data | 17 | 1 byte | unsigned char | |
| Point Source ID | 18 | 2 bytes | unsigned short | * |
| GPS Time | 20 | 8 bytes | double | * |
| Total | | 28 bytes | | |

Format 2 (taken from 1.2 spec)

| Item | Offset | Size | Format | Required |
|---|---|---|---|---|
| X | 0 | 4 bytes | long | * |
| Y | 4 | 4 bytes | long | * |
| Z | 8 | 4 bytes | long | * |
| Intensity | 12 | 2 bytes | unsigned short | |
| Return Number | 14 | 3 bits | 3 bits (bits 0, 1, 2) | * |
| Number of Returns | 14 | 3 bits | 3 bits (bits 3, 4, 5) | * |
| Scan Direction Flag | 14 | 1 bit | 1 bit (bit 6) | * |
| Edge of Flight Line | 14 | 1 bit | 1 bit (bit 7) | * |
| Classification | 15 | 1 byte | unsigned char | * |
| Scan Angle Rank (-90 to +90) - Left Side | 16 | 1 byte | char | * |
| User Data | 17 | 1 byte | unsigned char | |
| Point Source ID | 18 | 2 bytes | unsigned short | * |
| Red | 20 | 2 bytes | unsigned short | * |
| Green | 22 | 2 bytes | unsigned short | * |
| Blue | 24 | 2 bytes | unsigned short | * |
| Total | | 26 bytes | | |

Format 3 (taken from 1.4 spec)

| Item | Offset | Size | Format | Required |
|---|---|---|---|---|
| X | 0 | 4 bytes | long | * |
| Y | 4 | 4 bytes | long | * |
| Z | 8 | 4 bytes | long | * |
| Intensity | 12 | 2 bytes | unsigned short | |
| Return Number | 14 | 3 bits | 3 bits (bits 0, 1, 2) | * |
| Number of Returns | 14 | 3 bits | 3 bits (bits 3, 4, 5) | * |
| Scan Direction Flag | 14 | 1 bit | 1 bit (bit 6) | * |
| Edge of Flight Line | 14 | 1 bit | 1 bit (bit 7) | * |
| Classification | 15 | 1 byte | unsigned char | * |
| Scan Angle Rank (-90 to +90) - Left Side | 16 | 1 byte | char | * |
| User Data | 17 | 1 byte | unsigned char | |
| Point Source ID | 18 | 2 bytes | unsigned short | * |
| GPS Time | 20 | 8 bytes | double | * |
| Red | 28 | 2 bytes | unsigned short | * |
| Green | 30 | 2 bytes | unsigned short | * |
| Blue | 32 | 2 bytes | unsigned short | * |
| Total | | 34 bytes | | |

Format 4 (taken from 1.4 spec)

| Item | Offset | Size | Format | Required |
|---|---|---|---|---|
| X | 0 | 4 bytes | long | * |
| Y | 4 | 4 bytes | long | * |
| Z | 8 | 4 bytes | long | * |
| Intensity | 12 | 2 bytes | unsigned short | |
| Return Number | 14 | 3 bits | 3 bits (bits 0, 1, 2) | * |
| Number of Returns | 14 | 3 bits | 3 bits (bits 3, 4, 5) | * |
| Scan Direction Flag | 14 | 1 bit | 1 bit (bit 6) | * |
| Edge of Flight Line | 14 | 1 bit | 1 bit (bit 7) | * |
| Classification | 15 | 1 byte | unsigned char | * |
| Scan Angle Rank (-90 to +90) - Left Side | 16 | 1 byte | char | * |
| User Data | 17 | 1 byte | unsigned char | |
| Point Source ID | 18 | 2 bytes | unsigned short | * |
| GPS Time | 20 | 8 bytes | double | * |
| Wave Packet Descriptor Index | 28 | 1 byte | unsigned char | * |
| Byte offset to waveform data | 29 | 8 bytes | unsigned long long | * |
| Waveform packet size in bytes | 37 | 4 bytes | unsigned long | * |
| Return Point Waveform Location | 41 | 4 bytes | float | * |
| X(t) | 45 | 4 bytes | float | * |
| Y(t) | 49 | 4 bytes | float | * |
| Z(t) | 53 | 4 bytes | float | * |
| Total | | 57 bytes | | |

Format 5 (taken from 1.4 spec)

| Item | Offset | Size | Format | Required |
|---|---|---|---|---|
| X | 0 | 4 bytes | long | * |
| Y | 4 | 4 bytes | long | * |
| Z | 8 | 4 bytes | long | * |
| Intensity | 12 | 2 bytes | unsigned short | |
| Return Number | 14 | 3 bits | 3 bits (bits 0, 1, 2) | * |
| Number of Returns | 14 | 3 bits | 3 bits (bits 3, 4, 5) | * |
| Scan Direction Flag | 14 | 1 bit | 1 bit (bit 6) | * |
| Edge of Flight Line | 14 | 1 bit | 1 bit (bit 7) | * |
| Classification | 15 | 1 byte | unsigned char | * |
| Scan Angle Rank (-90 to +90) - Left Side | 16 | 1 byte | char | * |
| User Data | 17 | 1 byte | unsigned char | |
| Point Source ID | 18 | 2 bytes | unsigned short | * |
| GPS Time | 20 | 8 bytes | double | * |
| Red | 28 | 2 bytes | unsigned short | * |
| Green | 30 | 2 bytes | unsigned short | * |
| Blue | 32 | 2 bytes | unsigned short | * |
| Wave Packet Descriptor Index | 34 | 1 byte | unsigned char | * |
| Byte offset to waveform data | 35 | 8 bytes | unsigned long long | * |
| Waveform packet size in bytes | 43 | 4 bytes | unsigned long | * |
| Return Point Waveform Location | 47 | 4 bytes | float | * |
| X(t) | 51 | 4 bytes | float | * |
| Y(t) | 55 | 4 bytes | float | * |
| Z(t) | 59 | 4 bytes | float | * |
| Total | | 63 bytes | | |

Format 6 (taken from 1.4 spec)

| Item | Offset | Size | Format | Required |
|---|---|---|---|---|
| X | 0 | 4 bytes | long | * |
| Y | 4 | 4 bytes | long | * |
| Z | 8 | 4 bytes | long | * |
| Intensity | 12 | 2 bytes | unsigned short | |
| Return Number | 14 | 4 bits | 4 bits (bits 0 - 3) | * |
| Number of Returns (given Pulse) | 14 | 4 bits | 4 bits (bits 4 - 7) | * |
| Classification Flags | 15 | 4 bits | 4 bits (bits 0 - 3) | |
| Scanner Channel | 15 | 2 bits | 2 bits (bits 4 - 5) | * |
| Scan Direction Flag | 15 | 1 bit | 1 bit (bit 6) | * |
| Edge of Flight Line | 15 | 1 bit | 1 bit (bit 7) | * |
| Classification | 16 | 1 byte | unsigned char | * |
| User Data | 17 | 1 byte | unsigned char | |
| Scan Angle | 18 | 2 bytes | short | * |
| Point Source ID | 20 | 2 bytes | unsigned short | * |
| GPS Time | 22 | 8 bytes | double | * |
| Total | | 30 bytes | | |
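
In format 6 the bit fields span two bytes (offsets 14 and 15) and the first two widen to 4 bits each. A minimal sketch of decoding them, again assuming bit 0 is the least significant bit; the record is synthetic and `decode_format6_flags` is a hypothetical helper name:

```python
import struct

# Format 6 record: X, Y, Z, intensity, two flag bytes, classification,
# user data, scan angle (signed short), point source ID, GPS time.
POINT6 = struct.Struct("<iiiHBBBBhHd")

def decode_format6_flags(byte14, byte15):
    """Split the bytes at offsets 14 and 15 into their bit fields."""
    return {
        "return_number": byte14 & 0x0F,            # byte 14, bits 0-3
        "number_of_returns": byte14 >> 4,          # byte 14, bits 4-7
        "classification_flags": byte15 & 0x0F,     # byte 15, bits 0-3
        "scanner_channel": (byte15 >> 4) & 0b11,   # byte 15, bits 4-5
        "scan_direction_flag": (byte15 >> 6) & 1,  # byte 15, bit 6
        "edge_of_flight_line": (byte15 >> 7) & 1,  # byte 15, bit 7
    }

# Synthetic record: return 5 of 7, one classification flag set,
# scanner channel 2, edge of flight line.
byte14 = 5 | (7 << 4)
byte15 = 0b0001 | (2 << 4) | (1 << 7)
record = POINT6.pack(1, 2, 3, 500, byte14, byte15, 6, 0, -1200, 42, 123456.5)

flags = decode_format6_flags(record[14], record[15])
gps_time = struct.unpack_from("<d", record, 22)[0]
print(POINT6.size, flags, gps_time)
```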

Format 7 (taken from 1.4 spec)

| Item | Offset | Size | Format | Required |
|---|---|---|---|---|
| X | 0 | 4 bytes | long | * |
| Y | 4 | 4 bytes | long | * |
| Z | 8 | 4 bytes | long | * |
| Intensity | 12 | 2 bytes | unsigned short | |
| Return Number | 14 | 4 bits | 4 bits (bits 0 - 3) | * |
| Number of Returns (given Pulse) | 14 | 4 bits | 4 bits (bits 4 - 7) | * |
| Classification Flags | 15 | 4 bits | 4 bits (bits 0 - 3) | |
| Scanner Channel | 15 | 2 bits | 2 bits (bits 4 - 5) | * |
| Scan Direction Flag | 15 | 1 bit | 1 bit (bit 6) | * |
| Edge of Flight Line | 15 | 1 bit | 1 bit (bit 7) | * |
| Classification | 16 | 1 byte | unsigned char | * |
| User Data | 17 | 1 byte | unsigned char | |
| Scan Angle | 18 | 2 bytes | short | * |
| Point Source ID | 20 | 2 bytes | unsigned short | * |
| GPS Time | 22 | 8 bytes | double | * |
| Red | 30 | 2 bytes | unsigned short | * |
| Green | 32 | 2 bytes | unsigned short | * |
| Blue | 34 | 2 bytes | unsigned short | * |
| Total | | 36 bytes | | |
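
The "Total" rows for formats 0 to 7 can be cross-checked mechanically. The sketch below rebuilds each record layout from shared pieces as `struct` format strings and compares the computed sizes with the totals listed in this thread; the constant names and `record_sizes` are hypothetical, chosen for this example:

```python
import struct

CORE_0_5 = "iiiHBBbBH"   # 20 bytes shared by formats 0-5
CORE_6_7 = "iiiHBBBBhH"  # 22 bytes shared by formats 6 and 7, before GPS time
GPS, RGB, WAVE = "d", "HHH", "BQIffff"

LAYOUTS = {
    0: CORE_0_5,
    1: CORE_0_5 + GPS,
    2: CORE_0_5 + RGB,
    3: CORE_0_5 + GPS + RGB,
    4: CORE_0_5 + GPS + WAVE,
    5: CORE_0_5 + GPS + RGB + WAVE,
    6: CORE_6_7 + GPS,
    7: CORE_6_7 + GPS + RGB,
}

# "Total" rows as listed in the tables of this thread.
EXPECTED = {0: 20, 1: 28, 2: 26, 3: 34, 4: 57, 5: 63, 6: 30, 7: 36}

def record_sizes(layouts):
    """Size in bytes of each layout, little-endian and unpadded."""
    return {fmt: struct.calcsize("<" + layout) for fmt, layout in layouts.items()}

sizes = record_sizes(LAYOUTS)
mismatches = {f: (sizes[f], EXPECTED[f]) for f in EXPECTED if sizes[f] != EXPECTED[f]}
print(mismatches or "all totals match")
```

A check like this could run alongside a pull request review, so a changed size that is not reflected in the totals shows up immediately.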

ErzhuoChe commented 3 years ago

Should we call it byte offset so that people don't get confused with the term "offset"?

esilvia commented 3 years ago

Just finished.

@m-schuetz can you review the tables in the attached PDF to verify that I did the math correctly and that it meets your expectations?

LAS.pdf

m-schuetz commented 3 years ago

@esilvia Just checked, 0-7 match with my own calculations above and 8-10 also look good. Perhaps you could also append a "total" row to the header block?

esilvia commented 3 years ago

> @esilvia Just checked, 0-7 match with my own calculations above and 8-10 also look good. Perhaps you could also append a "total" row to the header block?

@m-schuetz Thanks for confirming! I went back and forth on that and ended up not including that since the "header size" is an actual field in the header itself. Seemed weird to include it. Maybe not though?