ASPRSorg / LAS

LAS Specification
https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html

Clarify specification of X, Y, and Z Scale Factors #143

Open SunBlack opened 7 months ago

SunBlack commented 7 months ago

What is the issue about?

Inquiry about the specification

Issue description

We often have discussions within our team about how the following part of the specification is to be interpreted:

https://github.com/ASPRSorg/LAS/blob/a2412413990da0768784781df852f2a7d4e711d6/source/02.04_header.sub#L314-L316

There are two different views on how to interpret this. As an example, take a point cloud with 8 points per unit, i.e. all coordinates lie on a grid with a (minimum) spacing of 0.125:

Interpretation 1) Viewed as decimal numbers, a resolution of 0.001 would be required, which means the scale factor should be set to this value.
Interpretation 2) Viewed as a quantized scale, as shown in the calculation in the next section, the value could also be set to 0.125.

The difference between the two is that, stored as integers, the coordinate values are always 125 apart under the first interpretation but only 1 apart under the second, which means that LAZ compression is probably more efficient in the second case because the integers have more leading zero bits.
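To make the difference concrete, here is a small illustration (a made-up 0.125-spaced grid, not taken from any actual file) of how the same coordinates end up as point record integers under the two candidate scale factors:

```python
# Points on a 0.125-unit grid, quantized with the two scale factors discussed
# above (offset assumed to be 0 for simplicity).
offset = 0.0
xs = [i * 0.125 for i in range(5)]            # 0.0, 0.125, 0.25, 0.375, 0.5

for scale in (0.001, 0.125):                  # interpretation 1) and 2)
    stored = [round((x - offset) / scale) for x in xs]
    print(scale, stored)

# scale 0.001 -> stored integers 0, 125, 250, 375, 500 (consecutive deltas of 125)
# scale 0.125 -> stored integers 0, 1, 2, 3, 4          (consecutive deltas of 1)
# Both recover the same coordinates via  x = stored * scale + offset  (up to
# floating-point rounding); only the integer deltas differ, which is what
# matters for LAZ's delta-based compression.
```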

Since the data we receive usually follows convention 1) rather than 2), the question is how exactly this part of the specification is meant. My assumption is that 2) is intended, since the reader implementations I have seen so far compute the coordinates the way it is usually done with a quantized scale; in particular, if only 1) were meant, the scale factor could have been stored as an integer (the number of decimal places) instead of as a float.

Would it therefore be possible to improve the specification so that it is clear which of the two variants is meant?

esilvia commented 6 months ago

In my opinion, both variants that you have listed are valid. The vast majority of use cases that I have seen have chosen to use either 0.01 or 0.001, so the example provided in the spec nudges the user in that direction.

In general we do recommend that users default to some multiple of 10^n, as following that convention does simplify merging datasets from different sources. Using nonconforming scale factors such as the one that you suggest does not make it an invalid LAS file, however, and it certainly makes sense in some cases. We do not REQUIRE that scale factors be a multiple of 10^n such as 0.01 or 0.001.

As a general guideline, the scale factor and offset serve two purposes:

  1. Enable storing large coordinates that would otherwise not fit into the 32-bit integer dynamic range.
  2. Provide a hint of the intended precision of the source dataset.

For example, scale factors with MANY decimal places (e.g., 0.0123456789) imply extremely high data precision and typically come from optimizing the scale factor and offset to each specific LAS file. While it might make sense for maximizing data precision from a computer scientist's perspective, it is a nightmare for a data manager and makes combining datasets extremely difficult. It's also unnecessary since the source technology almost never has such high precision (excluding micron-scale lidar scanners, of course).
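To make purpose 1 concrete, here is a minimal sketch (the coordinate, scale, and offset values are assumed, not taken from the spec) of how scale and offset map a real-world coordinate into the signed 32-bit point record field and back:

```python
# Forward and inverse transforms between actual coordinates and the signed
# 32-bit integers stored in the point records.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def to_record(coord, scale, offset):
    """Actual coordinate -> integer stored in the point record."""
    return round((coord - offset) / scale)

def from_record(value, scale, offset):
    """Integer stored in the point record -> actual coordinate."""
    return value * scale + offset

northing = 4_640_123.456     # a typical UTM northing in meters
scale = 0.001                # millimeter precision

# Without an offset the scaled value overflows the 32-bit range:
print(to_record(northing, scale, 0.0) > INT32_MAX)        # True (4,640,123,456)

# With a sensible offset the stored value fits comfortably and round-trips:
offset = 4_640_000.0
rec = to_record(northing, scale, offset)                   # 123456
print(from_record(rec, scale, offset))                     # ~4640123.456
```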

Moving onto your specific use case, I do wonder whether or not your assumptions are valid. A target density of 8 points per square unit doesn't necessarily mean that your precision is 0.125. I have never seen a dataset that is perfectly distributed, except perhaps one from an aggregation model like Geiger-mode lidar. If your source technology is lidar or photogrammetry, you may be losing precision by storing your data in increments of 1/8th of a unit (0.125) instead of 1/100th of a unit (0.01). In other words, by storing in increments of 1/8th of a unit, you are somewhat rasterizing your data into 0.125-unit (meters?) cells.
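A rough illustration of that precision-loss point (the coordinates below are made up):

```python
# Made-up coordinates with roughly centimeter-level structure, quantized with
# a 0.01 scale factor versus a 0.125 scale factor.
points = [12.34, 12.41, 12.47, 12.52]

def quantize(x, scale):
    # Store as an integer record value, then convert back to a coordinate.
    return round(x / scale) * scale

for scale in (0.01, 0.125):
    recovered = [quantize(x, scale) for x in points]
    worst = max(abs(a - b) for a, b in zip(points, recovered))
    print(scale, [f"{v:.3f}" for v in recovered], f"max error {worst:.3f}")

# scale 0.01  -> 12.340, 12.410, 12.470, 12.520  (essentially lossless)
# scale 0.125 -> 12.375, 12.375, 12.500, 12.500  (distinct points collapse
#                onto the same 0.125-unit cell, i.e. the data is rasterized)
```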

SunBlack commented 6 months ago

Thanks for your clarification :-).

In general we do recommend that users default to some multiple of 10^n, as following that convention does simplify merging datasets from different sources

Ah, so that only integer multiplication is necessary when merging? So if, for example, I get two LAS files where one has a scale factor of 1000 and the other a scale factor of 10, I have to multiply all the coordinates of the first by 100 so that both have the same scale factor of 10. With non-conforming scale factors, I might have to multiply the position values by a floating-point value, which would introduce inaccuracies when merging, as the new values are again stored as integers. Have I understood the idea behind this correctly?
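Here is a small sketch of how I understand that rescaling step (the scale factors and integer values are hypothetical):

```python
def rescale(values, old_scale, new_scale):
    """Re-express stored integer coordinates from old_scale in terms of new_scale."""
    ratio = old_scale / new_scale
    if ratio == int(ratio):
        # Conforming scales: pure integer multiplication, no loss.
        return [v * int(ratio) for v in values]
    # Non-conforming scales: floating-point multiplication plus re-rounding,
    # which can introduce small inaccuracies.
    return [round(v * ratio) for v in values]

file_a = {"scale": 1000.0, "ints": [1, 2, 3]}   # actual coordinates 1000, 2000, 3000
file_b = {"scale": 10.0,   "ints": [7, 8, 9]}   # actual coordinates 70, 80, 90

# Bring file_a onto file_b's scale (ratio 100) and merge: exact.
merged_ints = rescale(file_a["ints"], file_a["scale"], file_b["scale"]) + file_b["ints"]
print(merged_ints)                               # [100, 200, 300, 7, 8, 9]

# With a non-conforming pair such as 0.125 and 0.01 the ratio is 12.5, so the
# rescaled integers have to be rounded and precision can be lost.
```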

I have never seen a dataset that is perfectly distributed, except perhaps one from an aggregation model like Geiger-mode lidar.

I had a use case in mind where, for example, you get an aerial image as a TIFF file and want to convert it to LAZ. In this case you would have a perfect raster.
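Just to sketch what I mean (a tiny hypothetical grid, no actual GeoTIFF I/O): every cell center lands exactly on a multiple of the cell size, so the spacing really is perfectly regular:

```python
# A 2x2 stand-in for the raster band; the ground sampling distance (cell size)
# plays the role of the point spacing from the example above.
cell_size = 0.125
heights = [[1.2, 1.3],
           [1.4, 1.5]]

points = []
for row, line in enumerate(heights):
    for col, z in enumerate(line):
        x = col * cell_size          # grid origin assumed at (0, 0)
        y = row * cell_size
        points.append((x, y, z))

# With the X/Y scale factors set to cell_size, the stored integers are simply
# the column and row indices, which is about as compressible as it gets.
print([(round(x / cell_size), round(y / cell_size), z) for x, y, z in points])
# [(0, 0, 1.2), (1, 0, 1.3), (0, 1, 1.4), (1, 1, 1.5)]
```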

What do you think about changing the paragraph in the specification to something like this?

The scale factor fields contain a double floating-point value that is used to
scale the corresponding X, Y, and Z long values within the point records. The
corresponding X, Y, and Z scale factor must be multiplied by the X, Y, or Z
point record value to get the actual X, Y, or Z coordinate.

To choose a suitable scale factor, you should decide between the following
two variants:
1) (Recommended) Use a scale factor that is a power of ten. For example, if
   the X, Y, and Z coordinates are to have two decimal places, each scale
   factor contains the value 0.01. This simplifies merging data from
   different sources, because the coordinates can be adjusted without
   resorting to floating-point arithmetic.
2) Use another scale factor. This may be necessary if greater precision is
   required and the integer coordinate values would otherwise not be
   sufficient. Another reason may be that better compression can be achieved
   in an LAZ file.

esilvia commented 6 months ago

Thanks for your clarification :-).

In general we do recommend that users default to some multiple of 10^n, as following that convention does simplify merging datasets from different sources

Ah, so that only integer multiplication is necessary when merging? So if, for example, I get two LAS files where one has a scale factor of 1000 and the other a scale factor of 10, I have to multiply all the coordinates of the first by 100 so that both have the same scale factor of 10. With non-conforming scale factors, I might have to multiply the position values by a floating-point value, which would introduce inaccuracies when merging, as the new values are again stored as integers. Have I understood the idea behind this correctly?

You have understood me correctly. I would also add that a high-quality LAS code library would not make this the responsibility of the user, because the code library would normalize the data on its own. However, once you have done this long enough you will become keenly aware that not all LAS libraries are created equal. :)

I have never seen a dataset that is perfectly distributed, except perhaps one from an aggregation model like Geiger-mode lidar.

I had a use case in mind where, for example, you get an aerial image as a TIFF file and want to convert it to LAZ. In this case you would have a perfect raster.

Ahh, I have seen this done. I vaguely recall Martin @rapidlasso experimenting with this concept of storing raster data as LAZ and being shocked to discover that its compression can be more efficient than even the raster compression algorithms. Enjoy!

What do you think about changing the paragraph in the specification to something like this?

The scale factor fields contain a double floating-point value that is used to
scale the corresponding X, Y, and Z long values within the point records. The
corresponding X, Y, and Z scale factor must be multiplied by the X, Y, or Z
point record value to get the actual X, Y, or Z coordinate.

To choose a suitable scale factor, you should decide between the following
two variants:
1) (Recommended) Use a scale factor that is a power of ten. For example, if
   the X, Y, and Z coordinates are to have two decimal places, each scale
   factor contains the value 0.01. This simplifies merging data from
   different sources, because the coordinates can be adjusted without
   resorting to floating-point arithmetic.
2) Use another scale factor. This may be necessary if greater precision is
   required and the integer coordinate values would otherwise not be
   sufficient. Another reason may be that better compression can be achieved
   in an LAZ file.

I'm happy to consider moving these recommendations into a wiki article. We have attempted to minimize the amount of non-required editorial language (guidelines) within the spec itself to keep it lean and avoid confusion. Would that be helpful?

SunBlack commented 6 months ago

I'm happy to consider moving these recommendations into a wiki article. We have attempted to minimize the amount of non-required editorial language (guidelines) within the spec itself to keep it lean and avoid confusion. Would that be helpful?

Yes and no at the same time 😅 I suspect that many who read the specification don't know about the wiki. I also didn't know for a long time that the specification is now maintained on GitHub. Therefore, I think it would be okay to move it to the wiki if you remove the current example from the specification (as it can be confusing) and add a reference to the wiki, either at that point or in the introduction to the specification, noting that examples can be found there.