ASPRSorg / LAS

LAS Specification
https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html

Clarification of what is considered scaled and unscaled #89

Closed: visr closed this issue 1 year ago

visr commented 4 years ago

In the spec there is one occurrence of the term scaled or unscaled:

https://github.com/ASPRSorg/LAS/blob/761fcd71569d71eaa71723db1b66eac4842a930c/source/02.04_header.sub#L337-L340

It seems to me that by unscaled you mean real coordinates, and not the stored integers.

But if I look at the second and third message in this LAS room thread, it seems both authors use the term scaled to refer to the real coordinates, and not the stored integers. https://groups.google.com/d/msg/lasroom/KPPsO8twg9I/GCgJpoo2uP0J

I came across this through a discussion in the Julia package LasIO.jl: https://github.com/visr/LasIO.jl/pull/28#issuecomment-576037114, about the ambiguity of the term scaled. Does it mean you scale data from stored integers to real coordinates, or you scale real coordinates to stored integers?

The spec seemed clear to me, but then in laspy they use the reversed terminology, and, probably having read the term unscaled in the spec, even state in https://pythonhosted.org/laspy/tut_background.html:

The various LAS specifications say that the Max and Min X Y Z fields store unscaled values, however LAS data in the wild appears not to follow this convention. Therefore, by default, laspy stores the scaled double precision values and updates header files accordingly on file close. This can be overridden by supplying one of several optional arguments to file.close(). First, you can simply not update the header at all, by specifying ignore_header_changes=True. Second, you can ask that laspy store the unscaled values explicitly, by specifying minmax_mode="unscaled".

Is this really true? I've only ever seen real world coordinates for min/max X/Y/Z in LAS headers. This seems to instead come from confusion as to which is which.

rapidlasso commented 4 years ago

We first scale and then offset the (raw) XYZ integer coordinates stored in each point record of the LAS file with the scale and the offset values stored in the LAS header to get the (final) xyz coordinates. I usually call the (final) xyz coordinates "scaled and offset" but one could always see that the other way round, I guess.
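As a minimal sketch of the transformation described above (Python; the function names are hypothetical and not taken from the spec or any library), using the X_Record / X_Coordinate naming that appears later in this thread:

```python
import math


def record_to_coordinate(record: int, scale: float, offset: float) -> float:
    """Scale first, then offset: stored integer -> final coordinate."""
    return record * scale + offset


def coordinate_to_record(coordinate: float, scale: float, offset: float) -> int:
    """Inverse direction: final coordinate -> integer to store in the point record."""
    return round((coordinate - offset) / scale)


# Illustrative values only: a raw Z record of 123456 with scale 0.001 and offset 0.
z = record_to_coordinate(123456, scale=0.001, offset=0.0)
assert math.isclose(z, 123.456)
assert coordinate_to_record(z, scale=0.001, offset=0.0) == 123456
```

Whichever of the two you call "scaled", the arithmetic is the same; only the naming is ambiguous.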

visr commented 4 years ago

Indeed, this is clear to me. However, it doesn't address what is considered scaled and what is considered unscaled.

visr commented 4 years ago

Ok thanks. After reading your updated comment:

I usually call the (final) xyz coordinates "scaled and offset" but one could always see that the other way round, I guess.

So it seems to me that the spec uses the opposite terminology from what seems to be used in practice? In that case we should perhaps clarify this in the spec? Then the confusing note in the laspy documentation can be removed and all is well.

I'm not in favor of one or the other definition, but only trying to resolve its ambiguous nature.

esilvia commented 4 years ago

This change makes sense to me, although we'd have to go through the entire spec to ensure that all instances of these min/max values are consistent. Any objections?

esilvia commented 2 years ago

Two years later I finally have the head-space to revisit the R16 tickets like this one.

It occurs to me that all confusion could be mitigated by simply removing the word "unscaled" from the description of the Min/Max XYZ field.

The max and min data fields are the actual ~unscaled~ extents of the LAS point file data, specified in the coordinate system of the LAS data. If there are no point records in the file, these values must be set to zero.

The description already specifies that the values are specified in the LAS data's coordinate system, which to me implies that they are the true values and not the integer-based records. That is, a maximum Z of 123.456 would be stored as a double-float as 123.456, rather than as the integer 123456.
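To make this reading concrete, here is a rough sketch (Python; `header_extents` is a hypothetical helper, not part of any LAS library) that derives the header min/max for one axis from the stored integer records, under the interpretation that the header holds coordinates in the data's coordinate system:

```python
import math


def header_extents(records: list[int], scale: float, offset: float) -> tuple[float, float]:
    """Return (min, max) for one axis as coordinates, not as raw integer records."""
    if not records:
        # Per the quoted description: with no point records, the values must be zero.
        return (0.0, 0.0)
    coords = [r * scale + offset for r in records]
    return (min(coords), max(coords))


# Illustrative Z records with scale 0.001 and offset 0.
z_min, z_max = header_extents([100000, 110500, 123456], scale=0.001, offset=0.0)
assert math.isclose(z_max, 123.456)  # stored as the double 123.456, not as 123456
```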

Meanwhile, the ExtraBytes field went the exact opposite direction as discussed in #4. We tried to fix this in LAS 1.4 R14, but concluded that we couldn't do so non-destructively. I think it would be clarifying if we removed the word "actual" from the description to avoid it sounding too similar to the XYZ min/max description.

If used, the min and max fields reflect the ~actual~ minimum and maximum values of the attribute in the LAS file, in its raw form, without any scale or offset values applied.

Therefore a max ExtraByte value of 1.23 with data_type=uint8 and scale=0.01 should be stored as 123.
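For contrast with the XYZ case, a small sketch of the ExtraBytes convention as described here (Python; `extra_bytes_raw_extents` is a hypothetical name, and the sample values are made up for illustration):

```python
import math


def extra_bytes_raw_extents(raw_values: list[int]) -> tuple[int, int]:
    """Return (min, max) of an extra-bytes attribute in raw form, no scale or offset applied."""
    return (min(raw_values), max(raw_values))


# Illustrative uint8 attribute with scale=0.01.
raw = [12, 50, 123]
scale = 0.01
raw_min, raw_max = extra_bytes_raw_extents(raw)

assert raw_max == 123                         # what the min/max fields would hold
assert math.isclose(raw_max * scale, 1.23)    # the value a reader sees after scaling
```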

visr commented 2 years ago

Yes, I think it's a good suggestion to remove the word 'unscaled' there.

In both quoted paragraphs I read 'actual' as meaning that their values should be equal to the actual min/max of the data, though I see how it could be misread to mean float rather than integer.

esilvia commented 2 years ago

Thanks for the quick response @visr! In the final text the Min/Max discussion is immediately preceded by a discussion of the transformation of XYZ values like so...

[image]

I wonder if the completely unambiguous approach would be to reference back to those equations and state explicitly that the header should store the min/max X_Coordinate, Y_Coordinate, and Z_Coordinate and not the min/max X_Record, Y_Record, and Z_Record.

visr commented 2 years ago

Ah I see, that would be even better.

esilvia commented 2 years ago

Here's a screenshot of the proposed change: image

esilvia commented 2 years ago

Here's another option. I think this one is a little more readable, consistent, and clear. Does this work? image

visr commented 2 years ago

Thanks, yes this last version doesn't leave any room for ambiguity, and reads easily.

esilvia commented 2 years ago

Thanks for checking! Final version attached with ExtraByte edits too.

image

I'll make a PR into R16 shortly.

LAS.pdf

nigels-com commented 2 years ago

Appreciating the effort here to clarify this point. Thanks!

hlyf-xs commented 1 year ago

Thanks for the quick response @visr! In the final text the Min/Max discussion is immediately preceded by a discussion of the transformation of XYZ values like so...

[image]

I wonder if the completely unambiguous approach would be to reference back to those equations and state explicitly that the header should store the min/max X_Coordinate, Y_Coordinate, and Z_Coordinate and not the min/max X_Record, Y_Record, and Z_Record.

Here, I want to know how to calculate the offsets.

nigels-com commented 1 year ago

The offset is stored in the LAS header and automatically applied (along with the scale) when reading. The offsets are chosen by whoever writes the LAS/LAZ file and tend to be important for coordinates with a large magnitude, such as UTM coordinates.
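This thread does not prescribe a way to pick offsets. As one possible heuristic (an assumption for illustration, not something stated in the spec), a writer can round the minimum coordinate down to a round number so that the resulting records stay small enough to fit the signed 32-bit integers used for X, Y, and Z. A minimal Python sketch with hypothetical helper names:

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def choose_offset(coords: list[float], step: float = 1000.0) -> float:
    """Heuristic (assumed, not from the spec): floor the minimum to a multiple of `step`."""
    return math.floor(min(coords) / step) * step


def to_records(coords: list[float], scale: float, offset: float) -> list[int]:
    """Convert coordinates to integer records, checking that they fit in 32 bits."""
    records = [round((c - offset) / scale) for c in coords]
    if any(not INT32_MIN <= r <= INT32_MAX for r in records):
        raise OverflowError("record does not fit in int32; adjust scale or offset")
    return records


# Illustrative UTM-like eastings with a 1 cm scale.
eastings = [500123.45, 500987.65]
x_offset = choose_offset(eastings)
x_records = to_records(eastings, scale=0.01, offset=x_offset)
```

With these sample values the offset comes out as 500000.0 and the records as 12345 and 98765; a reader recovers the eastings as record * scale + offset.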