gadomski / las-rs

Read and write ASPRS las files, Rust edition.
MIT License

[WIP] Adding read for ExtraBytes #56

Closed bmmeijers closed 7 months ago

bmmeijers commented 1 year ago

As discussed in las-rs #12, the Work-in-Progress branch.

Not yet ready for integration, but feedback very much welcome.

Open questions I have myself:

  • I am not sure how to handle the data types of returned values (unify on f64 - that's what I currently did, also CloudCompare seems to do this, or make use of generics to handle the different possible return types)
  • I am not sure how to handle best the fact that ExtraBytes fields can be used to store 1D, 2D or 3D vectors (currently only 1D values are implemented)

tmontaigu commented 1 year ago

If I may,

  • I am not sure how to handle the data types of returned values (unify on f64 - that's what I currently did, also CloudCompare seems to do this, or make use of generics to handle the different possible return types)

Since f64 cannot represent all u64/i64 values, we don't want to potentially alter the values unless the user makes this explicit

So, I would suggest some kind of enum

enum ExtraByteValue {
    /// Stores Char, Short, Long, LongLong
    Signed(i64),
    /// Stores UnsignedChar, UnsignedShort, UnsignedLong, UnsignedLongLong
    Unsigned(u64),
    /// Stores Float, Double
    Float(f64),
}

impl ExtraByteValue {
    // Convenience method if the user doesn't want to `match`;
    // we could also have `as_i64`, `as_u64`.
    pub fn as_f64(&self) -> f64 {
        match *self {
            Self::Signed(v) => v as f64,
            Self::Unsigned(v) => v as f64,
            Self::Float(v) => v,
        }
    }
}

Using generics is also a valid option, but we would still need the enum above, since generics require the user to know the expected ExtraBytes types beforehand.
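For illustration, a minimal sketch of what the generics route could look like (the `ExtraByteType` trait and `value_of` function are hypothetical names, not part of this PR):

```rust
// Hypothetical sketch: the caller states the expected Rust type up front,
// and a wrong guess (or a too-short payload) simply yields None.
trait ExtraByteType: Sized {
    fn read(raw: &[u8]) -> Option<Self>;
}

impl ExtraByteType for u16 {
    fn read(raw: &[u8]) -> Option<Self> {
        raw.get(..2)?.try_into().ok().map(u16::from_le_bytes)
    }
}

impl ExtraByteType for f64 {
    fn read(raw: &[u8]) -> Option<Self> {
        raw.get(..8)?.try_into().ok().map(f64::from_le_bytes)
    }
}

// Read a value of type T starting at byte_offset in a point's extra bytes.
fn value_of<T: ExtraByteType>(extra_bytes: &[u8], byte_offset: usize) -> Option<T> {
    T::read(extra_bytes.get(byte_offset..)?)
}
```

The downside stands out immediately: the user must already know that a given field is, say, a u16, whereas the enum carries that information at runtime.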

  • I am not sure how to handle best the fact that ExtraBytes fields can be used to store 1D, 2D or 3D vectors (currently only 1D values are implemented) - Could be modeled as an enum(OneValue(..), TwoValues(..), ThreeValues(..)) and then returned from all_values / value_for_named_field.

I experimented with ExtraBytes for LAS a long time ago, and I think an enum is also a good option to manage the fact that extra fields can be 1D, 2D or 3D.
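A sketch of that shape (repeating the ExtraByteValue enum from above so the snippet is self-contained; the variant names are illustrative):

```rust
// Single scalar value, as in the earlier sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExtraByteValue {
    Signed(i64),
    Unsigned(u64),
    Float(f64),
}

// Hypothetical wrapper for 1D, 2D or 3D ExtraBytes fields.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExtraByteValues {
    One(ExtraByteValue),
    Two([ExtraByteValue; 2]),
    Three([ExtraByteValue; 3]),
}

impl ExtraByteValues {
    /// Number of components in the field.
    fn dimension(&self) -> usize {
        match self {
            Self::One(_) => 1,
            Self::Two(_) => 2,
            Self::Three(_) => 3,
        }
    }
}
```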

bmmeijers commented 1 year ago

If I may,

Thanks for chipping in, useful comments!

  • I am not sure how to handle the data types of returned values (unify on f64 - that's what I currently did, also CloudCompare seems to do this, or make use of generics to handle the different possible return types)

Since f64 cannot represent all u64/i64 values, we don't want to potentially alter the values unless the user makes this explicit

So, I would suggest some kind of enum

I was under the impression that these are there to quantize the values you want to put in, so you can store the per-point values with a (much) smaller type, and also to indicate to the user how much precision the values carry (given the measurement process). It seems to have been discussed in the process of going to LAS 1.4: https://groups.google.com/g/lastools/c/XjiFiAfhue0

So, how does this relate to the scale / offset parameters present, then? Only with a scale factor of 1.0 can you get the exact range of the underlying type (and the data typed in the underlying type); if the scale factor is different, then it could either fit easily or overflow much faster?

Thinking a bit more about this, it also depends on which of the options bits are set: If the scale field should not be taken into account, you want to get a value out in the data type of the underlying representation, but if the scale field bit is set in the options, then you'd want an f64, right?
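That branching could be sketched roughly like this; the bit positions follow the options field of the LAS 1.4 Extra Bytes VLR (bit 3 = scale, bit 4 = offset), and the names are illustrative:

```rust
// Options bits from the LAS 1.4 Extra Bytes VLR descriptor.
const SCALE_BIT: u8 = 0b0000_1000;
const OFFSET_BIT: u8 = 0b0001_0000;

/// Either the raw stored integer, or the value with scale/offset applied.
#[derive(Debug, PartialEq)]
enum FieldValue {
    Raw(i64),
    Scaled(f64),
}

fn decode(raw: i64, options: u8, scale: f64, offset: f64) -> FieldValue {
    if options & (SCALE_BIT | OFFSET_BIT) != 0 {
        // Each transform is applied only if its bit is set.
        let scale = if options & SCALE_BIT != 0 { scale } else { 1.0 };
        let offset = if options & OFFSET_BIT != 0 { offset } else { 0.0 };
        FieldValue::Scaled(raw as f64 * scale + offset)
    } else {
        FieldValue::Raw(raw)
    }
}
```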

I experimented with ExtraBytes for LAS a long time ago, and I think an enum is also a good option to manage the fact that extra fields can be 1D, 2D or 3D.

I am not sure if we want to support this, as v1.4 has made these deprecated. The standards doc now contains this text:

"Previous revisions of the LAS 1.4 specification utilized data_types 11-30 to define standard two- and three-member arrays, but this feature was never widely implemented and was deprecated in R1429 to simplify implementation."

See also: https://github.com/ASPRSorg/LAS/issues/1. I think the way forward proposed there is to model two or three values as separate ExtraBytes fields (see the example about surface normals in that thread), which indeed simplifies implementation.
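Under that proposal, a surface normal becomes three independent single-value descriptors instead of one 3D field; a hypothetical sketch (the struct and field names are illustrative):

```rust
// Minimal stand-in for an Extra Bytes VLR descriptor entry.
#[derive(Debug, Clone, PartialEq)]
struct ExtraBytesDescriptor {
    name: String,
    data_type: u8, // e.g. 10 = double in the LAS 1.4 data_type table
}

// A 3-component normal modeled as three separate single-value fields.
fn normal_descriptors() -> Vec<ExtraBytesDescriptor> {
    ["normal x", "normal y", "normal z"]
        .iter()
        .map(|name| ExtraBytesDescriptor {
            name: name.to_string(),
            data_type: 10,
        })
        .collect()
}
```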

tmontaigu commented 1 year ago

So, how does this relate to the scale / offset parameters present, then? Only with a scale factor of 1.0 can you get the exact range of the underlying type (and the data typed in the underlying type); if the scale factor is different, then it could either fit easily or overflow much faster?

Thinking a bit more about this, it also depends on which of the options bits are set: If the scale field should not be taken into account, you want to get a value out in the data type of the underlying representation, but if the scale field bit is set in the options, then you'd want an f64, right?

To me, if either scale or offset is set, then the value is quantized: stored in a smaller signed or unsigned integer type and then transformed to f64/f32 in the program. Disk space is saved, and the user takes care of making sure that the scaled and unscaled values fit in their respective types.

If neither is set, then the values might use the full range of the underlying type.
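A minimal sketch of that quantization scheme, with hypothetical helper names:

```rust
// The producer picks scale/offset so that (value - offset) / scale fits
// the chosen integer type; readers invert the transform to recover an f64.
fn quantize(value: f64, scale: f64, offset: f64) -> i32 {
    ((value - offset) / scale).round() as i32
}

fn dequantize(stored: i32, scale: f64, offset: f64) -> f64 {
    stored as f64 * scale + offset
}
```

An i32 is used here purely for illustration; the actual stored type comes from the field's data_type entry.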

gadomski commented 1 year ago

Converting to draft while it's WIP.

gadomski commented 7 months ago

Closing as stale as it's been a year. Please re-open if there's more work on this PR.