ASPRSorg / LAS

LAS Specification
https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html

Compute point records from coordinates #107

Closed: ErzhuoChe closed this issue 2 years ago

ErzhuoChe commented 3 years ago

This is going to be a silly question, but it bothered me for quite a while when I wrote code to read/write LAS files. The specification only defines how to compute the coordinates from the point records; it does not say how to compute the point records from the coordinates. Should I use a floor, round, or ceiling function to get the point records?

mloskot commented 3 years ago

Simply web search for how to read binary data from a file in <your programming language of choice>:

  1. Move the file pointer to the point data offset (obtained from the header)
  2. Read point-record-size bytes (obtained from the header) from the file pointer
  3. Cast/convert the bytes into a point record structure according to the point data record format
  4. Repeat for each record (a minimal C sketch follows)
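
In C, those steps might look like the sketch below. This assumes a little-endian machine; the header byte offsets 96 and 105 are the "offset to point data" and "point data record length" fields of the LAS public header block, X/Y/Z are the first three int32 fields of every point record format, and the function name is made up:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read the raw X/Y/Z records of point i. Sketch only: minimal error
   handling, and fseek() limits this to files under 2 GB (use fseeko
   or equivalent for larger files). */
int read_point_xyz(FILE *f, uint64_t i, int32_t xyz[3])
{
    uint32_t point_offset;  /* header byte 96: offset to point data */
    uint16_t record_length; /* header byte 105: point record size   */

    fseek(f, 96, SEEK_SET);
    fread(&point_offset, sizeof point_offset, 1, f);
    fseek(f, 105, SEEK_SET);
    fread(&record_length, sizeof record_length, 1, f);

    unsigned char record[1024];
    if (record_length > sizeof record)
        return -1;
    fseek(f, (long)(point_offset + i * record_length), SEEK_SET);
    if (fread(record, record_length, 1, f) != 1)
        return -1;

    /* X, Y, Z lead every point record format as three int32 values. */
    memcpy(xyz, record, 3 * sizeof(int32_t));
    return 0;
}
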
ErzhuoChe commented 3 years ago

I guess I was not clear enough. I want to convert coordinates (double) to records (long). To be specific, how do I cast the calculation result to a long int? Do I use floor, ceiling, round, or truncate? I don't see it defined in the LAS spec.

abellgithub commented 3 years ago

This is not relevant to the LAS specification.

That said, it's up to you. You can do whatever you want. In most programming languages, applying the double-precision scale/offset to the integer X, Y, Z positions gives you a double-precision result. If you round or truncate that result you will lose precision, but nothing stops you from doing either if it suits your application.

kjwaters commented 3 years ago

My impression is that he's asking in the other direction. If you took the real coordinate double value and applied the scale and offset to get an integer value to store, you still have to decide how you get that integer. Should 50002.5 be stored as 50002 or 50003 in the LAS file?

Kirk


ErzhuoChe commented 3 years ago

@kjwaters Thanks! This is exactly what I was trying to ask. I think it matters especially when you use multiple software packages to process the data, since the errors this causes can accumulate. Also, if I have negative values in the point record, simple truncation will be a problem.

abellgithub commented 3 years ago

Again, this is up to the person writing the file and not a LAS format issue. A reader can only apply the transform as encoded, so you have to decide what value you'd like a user to ultimately get. There is no right or wrong here. You seem to have a grasp of the issues, so you need to decide on a procedure that works for your data and users. In PDAL, we round, but there's nothing that requires this.
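
To make the round trip concrete (a sketch, not PDAL's actual code; function and variable names are made up): the specification defines coordinate = record * scale + offset, so a writer inverts that and must pick a rounding rule at the cast.

#include <math.h>
#include <stdint.h>

/* Record -> coordinate, as defined by the LAS specification. */
static double to_coordinate(int32_t record, double scale, double offset)
{
    return record * scale + offset;
}

/* Coordinate -> record. Exact only up to the chosen rounding rule;
   llround() rounds to nearest with ties away from zero. Truncation
   or round-half-to-even are equally valid choices. */
static int32_t to_record(double coordinate, double scale, double offset)
{
    return (int32_t)llround((coordinate - offset) / scale);
}

With scale 0.001 and offset 0, a coordinate of 50.0025 scales to 50002.5: llround() stores 50003, while a plain (int32_t) cast stores 50002, which is exactly the choice @kjwaters describes above.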

rapidlasso commented 3 years ago

Interesting question. In LASlib and for LAStools I use these macros:

/* Signed variants round to the nearest integer with ties away from zero;
   unsigned variants round non-negative input and clamp negative input to 0. */
#define I8_QUANTIZE(n) (((n) >= 0) ? (I8)((n)+0.5) : (I8)((n)-0.5))
#define U8_QUANTIZE(n) (((n) >= 0) ? (U8)((n)+0.5) : (U8)(0))

#define I16_QUANTIZE(n) (((n) >= 0) ? (I16)((n)+0.5) : (I16)((n)-0.5))
#define U16_QUANTIZE(n) (((n) >= 0) ? (U16)((n)+0.5) : (U16)(0))

#define I32_QUANTIZE(n) (((n) >= 0) ? (I32)((n)+0.5) : (I32)((n)-0.5))
#define U32_QUANTIZE(n) (((n) >= 0) ? (U32)((n)+0.5) : (U32)(0))
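
For reference, applying these on the writer side would look something like this (a sketch; the coordinate and header variables are hypothetical, and I32 is LASlib's typedef for a 32-bit signed integer):

/* Coordinate (double) -> point record (I32):
   remove the offset, divide by the scale, then quantize. */
I32 X = I32_QUANTIZE((x - x_offset) / x_scale);
I32 Y = I32_QUANTIZE((y - y_offset) / y_scale);
I32 Z = I32_QUANTIZE((z - z_offset) / z_scale);
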
mloskot commented 3 years ago

Interesting, @rapidlasso, could you share the actual rationale behind this? Why not just truncate the fractional part after scaling?

rapidlasso commented 3 years ago

I chose to quantize to the integer value with the smaller absolute difference, i.e., round to nearest. I hope that is what the above implements.

What's your rationale for truncating? Is this what you do?

/* Floor via cast: if truncation moved the value up (negative input
   with a fractional part), subtract 1 to round towards -infinity. */
#define I16_FLOOR(n) ((((I16)(n)) > (n)) ? (((I16)(n))-1) : ((I16)(n)))
#define I32_FLOOR(n) ((((I32)(n)) > (n)) ? (((I32)(n))-1) : ((I32)(n)))
#define I64_FLOOR(n) ((((I64)(n)) > (n)) ? (((I64)(n))-1) : ((I64)(n)))
esilvia commented 3 years ago

I love this discussion. For the record, I agree with @abellgithub that this is up to the user based on their own priorities.

Personally I truncate for highest performance and least complexity.

If I were to put in the effort to round, I would use a rounding technique I learned in my surveying curriculum: round half-values to the nearest even integer, i.e., 1.5 rounds to 2.0 but 4.5 rounds to 4.0. With large samples this results in the least accumulated rounding bias, especially for domains that include both positive and negative values.
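
For what it's worth, that tie-to-even behaviour is what C's rint()/nearbyint() produce under the default floating-point rounding mode (a sketch, assuming FE_TONEAREST is still in effect):

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Under the default FE_TONEAREST mode, ties go to the even integer. */
    printf("%.0f %.0f\n", nearbyint(1.5), nearbyint(4.5)); /* 2 4 */
    /* round() instead always rounds ties away from zero. */
    printf("%.0f\n", round(4.5));                          /* 5 */
    return 0;
}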

mloskot commented 3 years ago

What's your rationale for truncating?

None; (int)n simply keeps the integer part, truncating the floating-point result towards zero (after scaling).
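
For anyone following along, the cast and the FLOOR macros above differ only for negative input, which is the asymmetry @ErzhuoChe raised (a minimal demonstration):

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* A cast truncates towards zero, so negatives move up... */
    printf("%d %d\n", (int)2.7, (int)-2.7);         /* 2 -2 */
    /* ...while floor() always moves towards -infinity. */
    printf("%.0f %.0f\n", floor(2.7), floor(-2.7)); /* 2 -3 */
    return 0;
}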

pchilds commented 3 years ago

@ErzhuoChe I agree with you that truncating would cause issues with negative values and break consistency. If speed is crucial and precision at this scale unimportant, then truncating could be preferable. Floor, ceiling, and round are all good options, and it is up to you. Speaking for myself, though, I would go with @rapidlasso's approach: LAStools is widely used across the industry, so if an analyst picks up your file unaware of how it was generated and makes inferences about precision (uncertainty below/above), they will likely do so based on the rounding approach presented above.