gadomski / las-rs

Read and write ASPRS las files, Rust edition.
MIT License
73 stars 32 forks source link

Test for EVLR offset is not valid for laz #39

Closed ktmattson-alion closed 5 months ago

ktmattson-alion commented 3 years ago

This check:

if evlr.start_of_first_evlr < offset_to_end_of_points {
    return Err(Error::OffsetToEvlrsTooSmall(evlr.start_of_first_evlr).into());
} // .. //

will give a false positive on compressed files, since evlr.start_of_first_evlr refers to the file as-is, while

pub fn offset_to_end_of_points(&self) -> u64 {
        u64::from(self.offset_to_point_data)
            + u64::from(self.number_of_point_records) * u64::from(self.point_data_record_length)
    }

is the size of the uncompressed point data records.

For comparison (because I don't know nothing about nothing myself,) M Isenburg's LAStools just goes for it: stream->seek(header.start_of_first_extended_variable_length_record); ref So I think it's safe to say this check is extraneous.

gadomski commented 3 years ago

I agree. Do you have any ~real-world~ examples of this we can use for a regression test?

EDIT: Any examples are fine doesn't have to be real data :-).

ktmattson-alion commented 3 years ago

I was using this data from USGS: https://drive.google.com/file/d/105-DlPnyrS086Etx4epJFIQM0qPk9Ayx/view?usp=sharing super real world :)

gadomski commented 3 years ago

Great, thanks! Unfortunately this file's too big to include in the repo as a test case. I've tried filtering it down with PDAL, but PDAL seems to convert the EVLR(s) to VLR(s). If you are able to create a smaller (<1MB preferred) version of this file or another one, that'd help a lot. I'm going to keep trying to create a dummy laz file w/ an EVLR to use as a test case, but so far haven't had any success.

ktmattson-alion commented 3 years ago

I was afraid of that. You're creating a laz with this crate for testing?

gadomski commented 3 years ago

You're creating a laz with this crate for testing?

Trying to. Turns out it's non-trivial to construct an laz w/ EVLRs with the current tooling ecosystem.

ktmattson-alion commented 3 years ago

Yeah, 1.4 is a lot more complex than its predecessors. Our use case doesn't include creating new files, only reading them, but I'm already pretty annoyed to have to write a WKT-CRS parser because of the EVLRS change.

ktmattson-alion commented 3 years ago

centimated.zip LasTools lets you preserve every nth point. Here's the same file but 1/100 points.

gadomski commented 3 years ago

Thanks, that's perfect, except ... for some reason the centimated file doesn't trigger the expected error. Not sure why not yet, I'll dig in. But we should be able to get this to work (I checked and the centimated file did have EVLRs).

Screen Shot 2021-06-10 at 4 34 27 PM

tmontaigu commented 3 years ago

I took a quick look at it, it seems that the centimated file does not trigger the error because its a LAS 1.4 with the legacy point count (u32) set to 0 and the new u64 point count set to the good value.

The computation done by offset_to_end_of_points only uses the u32 value whatever the version of the file is. So in this case the computed offset is equal to the offseto to the first point which is less than the offset to the last point.

(The original file is also LAS 1.4, but with both the legacy and non legacy point count set)

ktmattson-alion commented 3 years ago

Nice catch. The spec says:

A LAS 1.4 file writer who wishes to maintain backward compatibility must maintain both the
legacy fields and the equivalent non-legacy fields in synchronization. Of course, this is not
possible if the number of points exceeds UINT32_MAX, in which case the legacy fields must be
set to zero. If a file writer is not maintaining backward compatibility, the legacy fields must always
be set to zero. 

"Gentle deprecation" is so irritating.

tmontaigu commented 3 years ago

but in this case, its the offset_to_end_of_points that is not 100% correct

ktmattson-alion commented 3 years ago

For sure. I'm just chafing at spec writers who leave in "if you feel like it" clauses.

ktmattson-alion commented 3 years ago

Ok, so, just to regroup here, is this a correct assessment of the bug: IF the file is compressed AND has been written by a backwards-compatible type writer (u32 point count is non-zero) THEN offset_to_end_of_points will be greater than evlr.start_of_first_evlr (Skeng)

tmontaigu commented 3 years ago

yes