ASPRSorg / LAS

LAS Specification
https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html
144 stars 16 forks source link

Deprecate "tuples" and "triples" in ExtraByte implementation #1

Closed esilvia closed 5 years ago

esilvia commented 7 years ago

Current ExtraByte implementation allows "tuple" and "triple" data types, although actual support for this appears to be practically nonexistent and inclusion of it in the specification complicates implementation by a significant margin.

This change would deprecate data_type values 11-30 (Table 24), leaving them unassigned.

Additionally, the no_data, min, max, scale, and offset triples should be converted to singles, leaving the space for doubles and triples as Reserved. Explicitly recommend that these values be set to zero.

Instead, add a recommendation that associated ExtraBytes be assigned an appropriate suffix, such as [1], [2], and [3]. For example, the topobathy domain profile should be altered to have three identical `sigma xyz' fields named 'sigma xyz[1]', 'sigma xyz[2]', and 'sigma xyz[3]'.

rapidlasso commented 7 years ago

Sounds good. Although all our other arrays start counting from zero so that we should remain consistent and state "'sigma xyz[0]', 'sigma xyz[1]', and 'sigma xyz[2]'" ... ?

esilvia commented 7 years ago

As a C++ programmer, 0-indexed arrays work for me. :+1:

HKHeidemann commented 7 years ago

As an IDL and Python scripting hack, I concur. Array[0] - Array[n]. !Consistency!

rapidlasso commented 6 years ago

Should we allow a "leaner" payload for the [user_id / record_ID] ["LAS_Spec" / 4] (maybe a [user_id / record_ID] ["LAS_Spec" / 5]) description for these additional "extra bytes" attributes that only contains the storage needed for a single (non-tuple or non-triple) value for the additional "extra bytes" attributes? It's not too hard to "run out of payload space" with 65535 bytes and 192 byte for each struct. I know at least one forestry software that makes excessive use of the additional "extra bytes" attributes feature available in LAS 1.4.

esilvia commented 6 years ago

@rapidlasso This sounds interesting, but is it related to the duple/tuple topic? It doesn't sound like it to me, in which case could you create a new Issue?

rapidlasso commented 6 years ago

Obviously it is related to the duple/tuple topic. Once we disallow duples/tuples then there are lots of unused bytes in the current "Extra Bytes" VLR as there is only one populated entry in each of the 3 entry long min / max / scale / offset / nodata arrays. But first we need to get this deprecation (that we've been talking about for years) into the specification. A leaner VLR could be introduced at the same time or later.

esilvia commented 5 years ago

@rapidlasso Now that I've written it out myself, you're quite right about the number of unused bytes being quite substantial. My concern is we run the risk of breaking compatibility if we redo the extrabytes VLR within 1.4, so we might have to do a "lean" extrabytes VLR in 1.5.

rapidlasso commented 5 years ago

Turns out FUGRO implemented the triples for their RAMMS system. But it seems to be a very recent thing and should not be a show stopper.

C:\software\LAStools\bin>lasinfo matlab.las WARNING: data type 21 of attribute 1 ('sigma xyz') is deprecated lasinfo (190127) report for 'matlab.las' reporting all LAS header entries: file signature: 'LASF' file source ID: 0 global_encoding: 21 project ID GUID data 1-4: 00000000-0000-0000-0000-000000000000 version major.minor: 1.4 system identifier: 'RAMMS' generating software: 'Fugro.Las' file creation day/year: 24/2019 header size: 375 offset to point data: 1841 number var. length records: 3 point data format: 9 point data record length: 67 number of point records: 0 number of points by return: 0 0 0 0 0 scale factor x y z: 0.0000001 0.0000001 0.01 offset x y z: -71.690406400000001 21.863001799999999 -41.719999999999999 min x y z: -71.6904064 21.8630018 -41.72 max x y z: -71.6904064 21.8630018 -41.72 start of waveform data packet record: 0 start of first extended variable length record: 0 number of extended_variable length records: 0 extended number of point records: 1 extended number of points by return: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 variable length header record 1 of 3: reserved 0 user ID 'LASF_Projection' record ID 2112 length after header 318 description 'OGC Coordinate System' WKT OGC COORDINATE SYSTEM: GEOGCS["NAD83(CSRS)",DATUM["NAD83_Canadian_Spatial_Reference_System",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],TOWGS84[0,0,0,0,0,0 ,0],AUTHORITY["EPSG","6140"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4617"] ] variable length header record 2 of 3: reserved 0 user ID 'LASF_Spec' record ID 100 length after header 26 description 'Waveform Packet Descriptor' index 1 bits/sample 16 compression 0 samples 600 temporal 1000 gain 1, offset 0 variable length header record 3 of 3: reserved 0 user ID 'LASF_Spec' record ID 4 length after header 960 description 'ExtraBytesDescription' Extra Byte Descriptions data type: 4 (short), name "(pseudo-) reflectance", description: "Radiometric calibration output", scale: 0.01, offset: 0 (not set) data type: 21 (unsigned char), name "sigma xyz", description: "XYZ coordinate uncertainty", scale: 0.01 0.01 0.01, offset: 0 (not set) data type: 1 (unsigned char), name "water column optical depth", description: "Water column optical depth", scale: 0.25, offset: 0 (not set) data type: 1 (unsigned char), name "figure of merit", description: "FOM for bottom measurement", scale: 1 (not set), offset: 0 (not set) data type: 1 (unsigned char), name "Bathymetry flags", description: "Flags", scale: 1 (not set), offset: 0 (not set) reporting minimum and maximum for all LAS point record entries ... X -1 -1 Y 0 0 Z -1 -1 intensity 2235 2235 return_number 1 1 number_of_returns 1 1 edge_of_flight_line 1 1 scan_direction_flag 0 0 classification 9 9 scan_angle_rank -22 -22 user_data 0 0 point_source_ID 105 105 gps_time -778492456.981222 -778492456.981222 Wavepacket Index 0 0 Offset 0 0 Size 0 0 Location 0 0 Xt 0 0 Yt 0 0 Zt 0 0 extended_return_number 1 1 extended_number_of_returns 1 1 extended_classification 9 9 extended_scan_angle -3713 -3713 extended_scanner_channel 1 1 attribute0 100 100 ('(pseudo-) reflectance') attribute1 1.2 1.2 ('sigma xyz') attribute2 10 10 ('water column optical depth') attribute3 20 20 ('figure of merit') attribute4 1 1 ('Bathymetry flags') WARNING: 1 points outside of header bounding box WARNING: there is coordinate resolution fluff (x10) in XYZ WARNING: there is serious coordinate resolution fluff (x100) in XYZ WARNING: there is very serious coordinate resolution fluff (x1000) in XYZ WARNING: there is extremely serious coordinate resolution fluff (x10000) in XYZ number of first returns: 1 number of intermediate returns: 0 number of last returns: 1 number of single returns: 1 overview over extended number of returns of given pulse: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 histogram of classification of points: 1 water (9)

manfred-brands commented 5 years ago

Not that new. We had it for almost 3 years, but effectively not used for public deliverables.

The main reason for implementing was the Topo Bathy Domain Profile. But despite that profile being 5 years old, finding files that use it was impossible. But the fields match what auxiliary data we have. I already have a ticket to make this R14 compliant (split sigma xyz into three fields). But as the domain profile has been official for 5 years, are there other (existing archived) files using these fields and how is R14 making these non-readable by new or updated implementation.

The same holds in a lesser extent for the min/max fields. No matter the intention, the R14 changes the type of the min/max fields from R13 without anywhere an indication in the file that it is a different version. CARIS stores min/max values as the field type. LASLib interprets them as such as well.

esilvia commented 5 years ago

The topobathy domain profile using it was my primary concern with deprecating the doubles/triples in a revision, but as far as I knew no one had used that component of it. Thanks @rapidlasso and @manfred-brands for reporting that is has, in fact, been used.

I can edit R14 to include the old tuples and triples with heavy editing to indicate them as deprecated. It won't be as clean but might avoid the issue if someone needed to understand what to do if that kind of data is encountered. I'd rather, not, though.

For the min/max field, the implementations I've seen used a double regardless of the Extrabyte data_type, which is consistent with how the LAS header was implemented. It got very confusing what the type of the min/max should be when a scale value was also included. Since I've seen it implemented both ways, I didn't see much harm in picking one of the methods and sticking with it more clearly, especially since it's a metadata field and doing so made it consistent with other fields of LAS.

manfred-brands commented 5 years ago

We need to supply uncertainty in LAS data. I had suggested to use the Bathy extensions, but we can easily switch to different 'standard' fields. Don't let the one file I have @rapidlasso stop you. The reason I gave it is that without interpreting the data type for field X correctly one cannot determine the offset for field X+1.

We will update our deliverable to comply with R14.

rapidlasso commented 5 years ago

I don't think that it is necessary to revive tuples / triples in any way. There is not much content out there (I think @manfred-brands can confirm) and for that little that there is I could add a simple conversion from tuples / triples to two / three individual attributes (possibly in las2las) if reading old content really becomes a bigger issue.

manfred-brands commented 5 years ago

@rapidlasso This particular deliverable is not yet in release and no other LAS files use tuples/triple. We were doing final testing to verify that other 3rd party software could read the files. We will remove the use of the triple sigma xyz in the final version. Indeed a tool that can convert these field can be created if there are actually real files out there.

esilvia commented 5 years ago

Closing the loop on this, I just wanted to observe that in our previous discussions on the type of ExtraByte min/max we concluded that a double made the most sense. See #4. I'll leave the PR open for another day or two in case some feel that I should revert that particular change to leave the ExtraByte min/max as anytype.

rapidlasso commented 3 years ago

Looks like there is someone using triple of scaled signed chars as "extra bytes" to express surface normals. And not just someone, but a very widely used software for generating photogrammetric point clouds. Should we contact Dmitry Semyonov from Agisoft and ask him to switch and store the coordinates individually?

E:\software\LAStools\bin>lasinfo -i C:\Users\Martin\Downloads\3DPoints.laz -cd
WARNING: data type 22 of attribute 0 ('normals') is deprecated
lasinfo (210128) report for 'C:\Users\Martin\Downloads\3DPoints.laz'
reporting all LAS header entries:
  file signature:             'LASF'
  file source ID:             1
  global_encoding:            0
  project ID GUID data 1-4:   00000000-0000-0000-0000-000000000000
  version major.minor:        1.2
  system identifier:          'Agisoft Metashape'
  generating software:        'Agisoft Metashape'
  file creation day/year:     81/2021
  header size:                227
  offset to point data:       1138
  number var. length records: 4
  point data format:          2
  point data record length:   29
  number of point records:    15293528
  number of points by return: 15293528 0 0 0 0
  scale factor x y z:         0.1 0.1 0.1
  offset x y z:               300000 1000000 -1000
  min x y z:                  327820.8 1088793.1 -80.1
  max x y z:                  345379.7 1099811.9 541.7
variable length header record 1 of 4:
  reserved             0
  user ID              'LASF_Projection'
  record ID            34735
  length after header  200
  description          ''
    GeoKeyDirectoryTag version 1.1.0 number of keys 24
      key 1024 tiff_tag_location 0 count 1 value_offset 1 - GTModelTypeGeoKey: ModelTypeProjected
      key 2048 tiff_tag_location 0 count 1 value_offset 5365 - GeographicTypeGeoKey: look-up for 5365 not implemented
      key 2049 tiff_tag_location 34737 count 5 value_offset 0 - GeogCitationGeoKey: CR05
      key 2050 tiff_tag_location 0 count 1 value_offset 1065 - GeogGeodeticDatumGeoKey: look-up for 1065 not implemented
      key 2051 tiff_tag_location 0 count 1 value_offset 8901 - GeogPrimeMeridianGeoKey: PM_Greenwich
      key 2054 tiff_tag_location 0 count 1 value_offset 9102 - GeogAngularUnitsGeoKey: Angular_Degree
      key 2055 tiff_tag_location 34736 count 1 value_offset 0 - GeogAngularUnitSizeGeoKey: 0.01745329252
      key 2056 tiff_tag_location 0 count 1 value_offset 7030 - GeogEllipsoidGeoKey: Ellipse_WGS_84
      key 2057 tiff_tag_location 34736 count 1 value_offset 1 - GeogSemiMajorAxisGeoKey: 6378137
      key 2059 tiff_tag_location 34736 count 1 value_offset 2 - GeogInvFlatteningGeoKey: 298.2572236
      key 2061 tiff_tag_location 34736 count 1 value_offset 3 - GeogPrimeMeridianLongGeoKey: 0
      key 2062 tiff_tag_location 34736 count 7 value_offset 4 - GeogTOWGS84GeoKey: TOWGS84[-0.16959,0.35312,0.51846,-0.03385,0.16325,-0.03446,0.03693]
      key 3072 tiff_tag_location 0 count 1 value_offset 5367 - ProjectedCSTypeGeoKey: CR05 / CRTM05
      key 3073 tiff_tag_location 34737 count 14 value_offset 5 - PCSCitationGeoKey: CR05 / CRTM05
      key 3074 tiff_tag_location 0 count 1 value_offset 5366 - ProjectionGeoKey: look-up for 5366 not implemented
      key 3075 tiff_tag_location 0 count 1 value_offset 1 - ProjCoordTransGeoKey: CT_TransverseMercator
      key 3076 tiff_tag_location 0 count 1 value_offset 9001 - ProjLinearUnitsGeoKey: Linear_Meter
      key 3077 tiff_tag_location 34736 count 1 value_offset 11 - ProjLinearUnitSizeGeoKey: 1
      key 3080 tiff_tag_location 34736 count 1 value_offset 12 - ProjNatOriginLongGeoKey: -84
      key 3081 tiff_tag_location 34736 count 1 value_offset 13 - ProjNatOriginLatGeoKey: 0
      key 3082 tiff_tag_location 34736 count 1 value_offset 14 - ProjFalseEastingGeoKey: 500000
      key 3083 tiff_tag_location 34736 count 1 value_offset 15 - ProjFalseNorthingGeoKey: 0
      key 3092 tiff_tag_location 34736 count 1 value_offset 16 - ProjScaleAtNatOriginGeoKey: 0.9999
      key 4099 tiff_tag_location 0 count 1 value_offset 9001 - VerticalUnitsGeoKey: Linear_Meter
variable length header record 2 of 4:
  reserved             0
  user ID              'LASF_Projection'
  record ID            34736
  length after header  136
  description          ''
    GeoDoubleParamsTag (number of doubles 17)
      0.0174533 6.37814e+006 298.257 0 -0.16959 0.35312 0.51846 -0.03385 0.16325 -0.03446 0.03693 1 -84 0 500000 0 0.9999
variable length header record 3 of 4:
  reserved             0
  user ID              'LASF_Projection'
  record ID            34737
  length after header  19
  description          ''
    GeoAsciiParamsTag (number of characters 19)
      CR05|CR05 / CRTM05|
variable length header record 4 of 4:
  reserved             0
  user ID              'LASF_Spec'
  record ID            4
  length after header  192
  description          ''
    Extra Byte Descriptions
      data type: 22 (char), name "normals", description: "unit normal vectors", scale: 0.00787402 0.00787402 0.00787402, offset: 0 (not set)
the header is followed by 148 user-defined bytes
LASzip compression (version 2.4r1 c2 50000): POINT10 2 RGB12 2 BYTE 2
reporting minimum and maximum for all LAS point record entries ...
  X              278208     453797
  Y              887931     998119
  Z                9199      15417
  intensity           0      65535
  return_number       1          1
  number_of_returns   1          1
  edge_of_flight_line 0          0
  scan_direction_flag 1          1
  classification      0          0
  scan_angle_rank     0          0
  user_data           0          0
  point_source_ID     1          1
  Color R 0 65535
        G 0 65535
        B 0 65535
  attribute0  -0.992126   0.992126  ('normals')
number of first returns:        15293528
number of intermediate returns: 0
number of last returns:         15293528
number of single returns:       15293528
covered area in square meters/kilometers: 53518784/53.52
point density: all returns 0.29 last only 0.29 (per square meter)
      spacing: all returns 1.87 last only 1.87 (in meters)
overview over number of returns of given pulse: 15293528 0 0 0 0 0 0
histogram of classification of points:
        15293528  never classified (0)
esilvia commented 3 years ago

@rapidlasso Thanks for the heads up. I think it would be great for someone with a line to Agisoft to contact them. Are you volunteering, or should I check my channels?

rapidlasso commented 3 years ago

It's taken care of. The new normal encoding will be

Data type            Name                                  Scale                                    Description
2 (char)                normal x                             1 / 127                                X surface normal
2 (char)                normal y                             1 / 127                                Y surface normal
2 (char)                normal z                             1 / 127                                Z surface normal

They will include this change in Metashape 1.7.3 update planned for release in the end of April / beginning of May.

For those Metashape users who have problems with normals encoding in the current version Dmitry Semyonov from Agisoft recommends to switch off the "Save point normals" option in the Export Points dialog.

abellgithub commented 3 years ago

The code already exists to read and write this and I don't think anyone's suggesting that these valid files become unreadable. Yes, the tuples and triples weren't a good idea, but they're already in the standard, so I'm not sure what the point is.

rapidlasso commented 3 years ago

Could you check this with your boss, @abellgithub? I think we agreed that tuples and tripes were a bad idea. LAStools no longer supports them.

hobu commented 3 years ago

I think we agreed that tuples and tripes were a bad idea.

Sure, but they're in the specification. We haven't made a new version of the specification. This particular deprecation is within the error budget of the line between needing a new version and the revision mechanism which I think we are overusing. A statement of deprecation in a revision is fine for this in my opinion, but to have a subsequent revision that states it is no longer valid should be out of bounds. It was in the specification, and we haven't iterated a new version. It's valid.

IMO, the committee should quit monkeying with revisions. We don't encode revision in the headers anywhere, and we're iterating toward needing to do that in some cases. Once we cross that threshold of needing to define a VLR that encodes which revision number software is supposedly writing too, it would have been easier to simply increment the version.

esilvia commented 3 years ago

There's technically nothing wrong with writing files using the tuples and triples. As you've said, the specification is out there to do so, and people can choose to support them. However, support for reading those ExtraBytes will be limited because it is deprecated functionality. Dmitry was receptive to the suggestion for splitting the triple and has chosen to do so.

This issue definitely pushed the line between revision and version. I don't wish to do anything else of this sort.