ASPRSorg / LAS

LAS Specification
https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html

Clarification of Classification Lookup VLR #82

Open kjwaters opened 5 years ago

kjwaters commented 5 years ago

I think the classification lookup VLR could use some additional information. As it is, the specification has the table represented as a 256 x 16 byte payload with each 16 bytes containing a 1 byte class number and a 15 byte description.

It doesn't specifically say it, but I assume that the table should include all class numbers from 0 to 255. What do I put in the descriptions for classes that aren't defined? Null? Spaces?
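To make the layout in question concrete, here is a minimal Python sketch (not taken from the spec or from any existing tool) of a 256 x 16 byte payload in which every class number from 0 to 255 has a record and undefined classes carry an all-null description. The helper names `build_lookup` and `parse_lookup`, the sample class names, and the choice of ASCII for the description bytes are illustrative assumptions.

```python
import struct

# One record: 1-byte class number followed by a 15-byte description.
RECORD = struct.Struct("<B15s")
NUM_CLASSES = 256


def build_lookup(names):
    """Build a 256 x 16 byte payload from a {class_number: description} dict.

    Classes missing from `names` get a description of 15 null bytes
    (one possible reading of "unused"; the thread has not settled this).
    """
    payload = bytearray()
    for class_number in range(NUM_CLASSES):
        text = names.get(class_number, "").encode("ascii")[:15]
        # struct pads a short "15s" value with null bytes.
        payload += RECORD.pack(class_number, text)
    return bytes(payload)


def parse_lookup(payload):
    """Return {class_number: description} for records with a non-empty description."""
    if len(payload) != NUM_CLASSES * RECORD.size:
        raise ValueError("expected a 256 x 16 byte payload")
    table = {}
    for class_number, raw in RECORD.iter_unpack(payload):
        description = raw.split(b"\x00", 1)[0].decode("ascii")
        if description:
            table[class_number] = description
    return table


# Hypothetical sample data: one standard class and one user-defined class.
payload = build_lookup({2: "Ground", 64: "Custom class"})
print(len(payload))           # total payload size in bytes
print(parse_lookup(payload))  # only the populated rows
```

A reader written this way can skip empty rows regardless of whether a writer filled them with nulls or left them out of its own user interface.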

esilvia commented 5 years ago

I always assumed that it would only include used classes, but on second glance you might be right. I'd imagine that you'd leave all unused fields as null. It could certainly use some clarification.

As we discussed at JALBTCX, I've never seen this VLR actually used. Maybe @lgraham-geocue or @rapidlasso can shed some light?

rapidlasso commented 5 years ago

I have never ever seen it used in any of the many LAS / LAZ files that I have come across.

lgraham-geocue commented 5 years ago

When LAS was first defined, it was not an ASPRS standard. I think this was used in those early days to define the class names. It could still be useful for the user defined section. Theoretically, software could read these classifications from the VLR and populate a user interface. Of course, none of the local/global stuff has ever been really solved – e.g. I read two files into a project with conflicting metadata and then merge these data into tiles….

kjwaters commented 5 years ago

I haven't seen it used either. My thought was to start inserting it into the output files we make on Digital Coast, but I wanted to see if any packages even picked it up first. None of the ones I have seem to see it.

dpev commented 5 years ago

I was planning on adding support for this relatively soon, so it's good to know I needn't bother. I wonder if it's worth officially deprecating? I was planning some elaborate support for handling the cases mentioned where different files may define classes in a different order; it seems this would have been quite the waste of time.

At the least, perhaps some sort of indication to readers of the standard that they are unlikely to encounter support for this would be helpful.

Regards Dave

manfred-brands commented 5 years ago

We at Fugro have implemented support for this. We use the bathy extension classification for airborne LIDAR to tag points as WaterSurface (41) vs Seafloor (40). We also want to use it for storing the different classifications found by our land-based lidar systems. We have used it to describe the result of cleaning: instead of marking all points as noise, we mark them as deleted by user or deleted by filterX/filterY. But this was more for the benefit of our point cloud classifier. Internally we use the same numbers and the table would only be for the benefit of consumers.

Regarding the original question, as per LAS standard, anything not used will be zero.

Indeed, what should be done when importing LAS files from different sources with different Classification Lookups? One option would be to use the PointSourceId to determine the lookup dictionary. Alternatively, classifications could be merged and homogenised on import, assuming that we don't need more than 255 different classifications in total, and a user could map overlapping classifications.
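The merge-and-homogenise option could look roughly like the Python sketch below. It is only an illustration under stated assumptions: `merge_lookups` is a hypothetical helper, tables are plain `{class_number: name}` dicts, identical names are treated as the same class, and conflicting codes from the second file are moved to the first free code at or above 64. A real importer would presumably let the user decide the mapping.

```python
def merge_lookups(first, second, first_free=64):
    """Merge two {class_number: name} tables.

    Returns (merged, remap), where remap maps each class number used in
    `second` to the class number its points should carry after the merge.
    """
    merged = dict(first)
    by_name = {name: code for code, name in first.items()}
    remap = {}
    conflicts = []

    # Pass 1: reuse codes for matching names, keep codes that are still free.
    for code, name in sorted(second.items()):
        if name in by_name:
            remap[code] = by_name[name]
        elif code not in merged:
            merged[code] = name
            by_name[name] = code
            remap[code] = code
        else:
            conflicts.append((code, name))

    # Pass 2: move conflicting classes to the first unused code.
    for code, name in conflicts:
        target = next((c for c in range(first_free, 256) if c not in merged), None)
        if target is None:
            raise ValueError("no free class numbers left")
        merged[target] = name
        by_name[name] = target
        remap[code] = target

    return merged, remap


# Hypothetical tables from two sources with an overlapping code 64.
file_a = {2: "Ground", 64: "Pole"}
file_b = {64: "Palm", 65: "Pole", 66: "Fern"}
merged, remap = merge_lookups(file_a, file_b)
print(merged)
print(remap)
```

The `remap` dictionary is what would be applied to the classification byte of every point imported from the second file.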

We have a team working on a LAS viewer/editor/manual classifier that could use/add this.

kjwaters commented 5 years ago

I'm glad to hear someone is using it. I plan to move ahead and populate it. I think it is confusing for users to see only the standard table. Sometimes that table is from pre-LAS 1.4. If the table were populated, that might encourage software makers to use it.

I'm still interpreting the spec to require filling something out for all 256 possible classes since it says the size is 256 x 16 bytes.

esilvia commented 4 years ago

@kjwaters We discussed this in today's bimonthly LWG call. We agreed that the current Classification VLR does appear to expect that all 256 records be present regardless of whether or not the strings are actually populated (e.g., there will also be a record for class 15, even if the description is all null).

However, we also agreed that 15 characters isn't nearly enough to write something useful and proposed that maybe Kirk's application would make for a good case study to design a Classification VLR v2 with the following characteristics:

At 1+32+128=161 bytes per Classification, that only puts you at 41,216 bytes if all 256 classifications are used, which means it still fits into a base VLR (max size 65,536).
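The arithmetic can be checked with a few lines of Python. This is a sketch, not part of the proposal: the constant and function names are made up for illustration, and the limit is taken as 65,535 bytes on the assumption that a base VLR stores its payload length in an unsigned 16-bit field (the comment above quotes 65,536).

```python
# Record sizes in bytes, as quoted in the thread.
V1_RECORD = 1 + 15         # class number + 15-byte description
V2_RECORD = 1 + 32 + 128   # class number + name + description
NUM_CLASSES = 256

# Assumption: the largest payload an unsigned 16-bit length field can describe.
MAX_VLR_PAYLOAD = 0xFFFF


def table_size(record_size, count=NUM_CLASSES):
    """Total payload size for a fixed-size table of `count` records."""
    return record_size * count


def max_record_size(count=NUM_CLASSES, limit=MAX_VLR_PAYLOAD):
    """Largest per-record size that still fits `count` records under `limit`."""
    return limit // count


print(table_size(V1_RECORD))                     # 4096
print(table_size(V2_RECORD))                     # 41216
print(table_size(V2_RECORD) <= MAX_VLR_PAYLOAD)  # True
print(max_record_size())                         # 255
```

Under that assumption, a full 256-record table has room for records of up to 255 bytes each before an extended VLR would be needed.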

kjwaters commented 4 years ago

I'm sorry I missed the meeting. I do like the proposed version 2 VLR. Does something like that need to wait for a new revision (i.e. 1.5) of LAS or is adding additional standard VLRs considered minor?

esilvia commented 4 years ago

My understanding is that adding new VLRs is a minor change since it doesn't affect existing implementations.

abellgithub commented 4 years ago

This seems like a problem. We already have well-defined classification values from 0 - 18. Allowing re-assignment of those values should be prohibited.

kjwaters commented 4 years ago

That's not true when you look at older data. The original classification table did not go that high and there is a lot of data out there using conflicting class numbers below 18.

abellgithub commented 4 years ago

We have a specification. Table 17 lists the classifications. What may have happened with "older data" is irrelevant to the current specification.

lgraham-geocue commented 4 years ago

Seems a reasonable approach would be:

If no Classification VLR is present, then ASPRS classifications are assumed. If a Classification VLR is present, it overrides the ASPRS defaults.
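A reader could apply this rule along the lines of the Python sketch below. Everything here is illustrative: `class_name` is a hypothetical helper, `ASPRS_DEFAULTS` is only a partial excerpt of the standard class names, and the sketch assumes the override works per entry, so a class with an empty description in the VLR falls back to the default. The comment above could equally be read as the VLR replacing the whole table.

```python
# Partial excerpt of the standard class names, for illustration only.
ASPRS_DEFAULTS = {
    0: "Created, never classified",
    1: "Unclassified",
    2: "Ground",
    6: "Building",
    9: "Water",
}


def class_name(code, lookup=None):
    """Resolve a display name for a class number.

    `lookup` is a {class_number: description} dict parsed from a
    Classification Lookup VLR, or None when the file has no such VLR.
    """
    if lookup is not None:
        name = lookup.get(code, "")
        if name:
            return name  # the VLR entry overrides the default
    return ASPRS_DEFAULTS.get(code, f"Class {code}")


print(class_name(2))                     # no VLR: default name
print(class_name(2, {2: "Bare earth"}))  # VLR present: overridden
print(class_name(6, {2: "Bare earth"}))  # VLR present but silent on class 6
```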

Of course, as with all of LAS, the big problem is merging tiles: what to do when the metadata for tiles is not homogeneous.

abellgithub commented 4 years ago

The above suggestion breaks existing implementations.

kjwaters commented 4 years ago

It may be that things are already broken. The 1.4r13 spec has classes 0-18. The 1.4r15 spec has 0-22. If I made a 1.4 file using the r13 spec and used class 21 for something, how would you be able to tell what it was? The class table hasn't been static.

Kirk

abellgithub commented 4 years ago

This was permitted by R13, as values up to 63 were reserved and values greater than 18 shouldn't have been used by a conforming file prior to R14. Values 64-255 were set aside as "user-definable".

rapidlasso commented 4 years ago

A tidbit from outside of the A(SPRS): Almost all of the 16 different German state survey departments have switched from ASCII to the LAS / LAZ formats but use some of their own classification codes which are in the process of getting harmonized across the different state surveys. AdV_classification

Deguerre commented 4 years ago

Following this ticket as we have also implemented support for user-definable classifications. (As an aside, "Transmission tower" has a standard code, but "Distribution pole" doesn't.)

I also interpreted the spec as saying that all 256 codes must be present, which makes the code field redundant.

esilvia commented 3 years ago

I just had a thought on this. Would it make sense to add a point count field for each classification to the v2 of this VLR, or is that deviating too far from the VLR's intended purpose? #39 attempts to address this use case so it might be a little redundant.

kjwaters commented 3 years ago

While that would be nice information to have, I think it would cause more trouble. It would mean the VLR has to be rewritten after processing if there was any filtering.

Kirk

hobu commented 3 years ago

Would it make sense to add a point count field for each classification to the v2 of this VLR

Propose another one for "stats".

esilvia commented 3 years ago

Thanks all for the feedback. Here's the proposal for a Classification Lookup VLR v2 as I understand it:

  1. 1-byte unsigned char for classification number.
  2. 32 characters for Classification name (expectation that it will match the existing Classification table, although perhaps translated into a non-English language)
  3. 64 characters for null-terminated Classification description (free-form)

Does this look right?

My final question is whether it makes more sense to design it to be sparse as discussed, or whether it should always have 256 records like the original Classification Lookup VLR. The argument in favor of the latter is that making the VLR a fixed size makes it possible to update/add/remove entries in the Classification Lookup VLR in-place without rewriting the entire LAS file.
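The in-place argument can be illustrated with a short Python sketch. It assumes the three-field layout listed above (1 + 32 + 64 = 97 bytes per record) and a table that always holds 256 records, so the record for class N sits at a fixed offset. The names `RECORD_V2`, `encode_field`, `empty_table`, and `update_entry` are hypothetical, the in-memory stream stands in for a LAS file, and the 54 placeholder bytes stand in for whatever precedes the payload.

```python
import io
import struct

# Layout from the proposal: class number, 32-byte name, 64-byte description.
RECORD_V2 = struct.Struct("<B32s64s")  # 97 bytes per record
NUM_CLASSES = 256


def encode_field(text, size, terminated=False):
    """Encode text as UTF-8 and check it fits (leaving room for a null if required)."""
    raw = text.encode("utf-8")
    limit = size - 1 if terminated else size
    if len(raw) > limit:
        raise ValueError(f"{text!r} needs {len(raw)} bytes, field allows {limit}")
    return raw  # struct pads the rest of the field with null bytes


def empty_table():
    """A full table of 256 records with empty names and descriptions."""
    return b"".join(RECORD_V2.pack(n, b"", b"") for n in range(NUM_CLASSES))


def update_entry(stream, payload_offset, class_number, name, description):
    """Overwrite one record in place; the size of the stream does not change."""
    record = RECORD_V2.pack(
        class_number,
        encode_field(name, 32),
        encode_field(description, 64, terminated=True),
    )
    stream.seek(payload_offset + class_number * RECORD_V2.size)
    stream.write(record)


PREFIX = b"\x00" * 54  # placeholder for the bytes before the payload
stream = io.BytesIO(PREFIX + empty_table())
size_before = len(stream.getvalue())

update_entry(stream, len(PREFIX), 64, "Distribution pole", "Pole carrying local power lines")
print(len(stream.getvalue()) == size_before)
```

With a sparse table the same edit could change the payload length, which would shift everything stored after the VLR.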

kjwaters commented 3 years ago

I believe that it is. The extra space for filling the whole table shouldn't be too much, and software that displays it could skip empty rows. I haven't seen much software that displays the current VLR, and what I have seen has been a bit messy.

Kirk

lgraham-geocue commented 3 years ago

I would think 256 entry, fixed size. Much easier to manage.

Lewis Graham GeoCue Group Inc.

esilvia commented 3 years ago

@kjwaters @lgraham-geocue Thanks for confirming I was thinking along the right lines. In that case I think it makes sense to drop the first byte as redundant and only include the names and descriptions. I'll start drafting this up.

pchilds commented 3 years ago

Sorry, just noticed https://lidarmag.com/2021/01/24/las-exchange-las-to-have-new-classification-lookup-vlr/. I disagree with the statement "A brief survey of the community failed to produce a single person or example file using it" as @manfred-brands and @Deguerre have clearly stated that they use it and there has been interest shown by @kjwaters and @dpev in implementing it. I've implemented it, though I must admit it wasn't well received by my previous company, and we did have issues with writing data in reserved categories and getting caught out by that. That was our fault, though, and we should have read the spec better. It does look like others get caught out by this too, so maybe the spec could stress more why they are reserved and the consequences for the unwary developer.

Anyway, I find the classification lookup absolutely essential. Getting a LAS file without the semantics of 27, 123, 69 made explicit is about as annoying as getting a Hotine Oblique Mercator LAS file but no CRS VLR. A sidecar file at least tells something, but unless there is some clear uniting spec it requires overhead of negotiation and development per source/consumer.

I like the proposed v2. v1 is lacking in detail and the extended description will help. I am concerned, however, about legacy handling. As a dev, having to support a widely different v1 and v2 might be difficult. I would suggest that in #2 the proposed 32 chars revert to the 15 chars of v1. After all, if there is a longer description field to be expressive in, there should not be a problem keeping number 2 truncated to 15 chars for legacy compatibility. I find 256 classes not enough for managing different customer needs, so I use these 15 chars like a UUID, with each customer picking and choosing a 256-64 subset. Retaining the 15 chars for #2 would make development much smoother. I also must admit I don't conform to the spec in that I leave off from the 256 what is not used.

Re: "update/add/remove entries in the Classification Lookup VLR in-place without rewriting the entire LAS file": I find inserting a lookup a one-off operation, and if I ever had to add an entry, it would be as a result of editing the classification of one or more points, which, if I'm going via laszip, makes rewriting the whole file pretty much a given. I've got to say that I don't have a strong opinion in this regard, as it would not be much effort to change the code implementation without breaking anything. The main thing is to have something clear and consistent going forward.

esilvia commented 3 years ago

@pchilds Thanks for the input on this. When I wrote that article, I did forget about the @manfred-brands post and interpreted the post from @Deguerre differently. That's my mistake.

I appreciate the feedback about whether or not the v2 VLR should be sparse. I think I agree that it makes the most sense for the spec to indicate that all 256 are not required, but users can choose to include them if the "edit-in-place" functionality is desired. I can add language to indicate how that should be done.

I appreciate your feedback about the classification name being more terse and wanting to maintain consistency with v1. However, I don't really understand the use case - are you trying to make it so that one can switch classification VLR versions more smoothly? In addition, when I look at the non-English examples from @rapidlasso I do note that they are all far longer than 16 characters, so extending it to 32 characters seems essential to me. We can discuss on the next LWG call in two weeks.

pchilds commented 3 years ago

The use case I can suggest is as an analytics provider. I'd classify a data set that I want to onsell to two customers. The first might have a system where classifications 64-82 represent genera of Hirundinidae, whereas the second wants to use an overlapping range for palm fruits. Now I don't want to have two code bases to deal with the conflict so I'll offload all these numbers outside of the code base and use the 15 char string internally as a unique identifier. I build an architecture of my code around a 15 char system and then a customer comes along who says "Can I have it in v2?" Great. I don't have to dump things in the description VLR and can put it more where it belongs. But do I have to change the architecture of my code base? I could hash from the 32 char strings to 15, but what do I do if there is a clash? Personally I'm not in a place at this moment where it is disruptive. It is just if I end up wanting to support v1 and v2 for different customers I need another layer of abstraction, whereas if the 15 char field hung around then that is an element of commonality that I can capitalise on.

pchilds commented 3 years ago

Also, if the Classification Name is to be localised, then 32 chars might feel just as tight as having 15 in English. You get 3 bytes in UTF-8 for a lot of CJK characters, but at the same time those languages are represented more tersely. The trouble is more in a lot of Near Eastern languages that get stuck with 2 bytes per character but need a similar number of characters to express the same thing. We could push it longer, as there's still room to play with in keeping the VLR below 2^16, but I'd almost feel it would be better spent expanding the description to 192 or so.
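The byte-versus-character point is easy to check in Python. The sketch below is illustrative only: `utf8_len` and `fit_utf8` are hypothetical helpers, the sample words are arbitrary translations of "building", and the 32-byte limit is the name field size proposed earlier in the thread. It assumes the field would hold UTF-8, which the thread has not decided.

```python
NAME_FIELD_BYTES = 32  # proposed size of the classification name field


def utf8_len(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fit_utf8(text, limit=NAME_FIELD_BYTES):
    """Truncate to at most `limit` bytes without splitting a multi-byte character."""
    # errors="ignore" drops the incomplete character left by the byte slice.
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


# Arbitrary sample words: English, German, Japanese, Hebrew.
samples = ["Building", "Gebäude", "建物", "בניין"]
for word in samples:
    print(word, len(word), "characters,", utf8_len(word), "bytes")

# Cutting in the middle of a 3-byte character keeps only the complete one.
print(fit_utf8("建物", limit=4))
```

A writer that truncates by bytes rather than characters needs something like `fit_utf8`, otherwise the stored name can end in an invalid byte sequence.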

esilvia commented 2 years ago

I re-read through this whole thread because I feel like it has wandered around somewhat and it's been almost a year since we last had substantial discussion on it. Here's the proposed structure for Classification Lookup VLR v2 again:

There's enough room in a standard VLR to extend the class name to 64 bytes if we want to, but I don't think we want to pretend to support UTF-16... it just feels like a can of worms.


Here's a current link to the column I wrote for lidar magazine on this topic: https://lidarmag.com/2020/11/20/xyz-exchange/


While trying to wrap up R16 over the past month I'm strongly trending toward thinking that v2 of the Classification Lookup VLR belongs in LAS 1.5. Doing so provides a couple of opportunities:

  1. We could make it a required VLR in LAS 1.5.
  2. We can deprecate v1 to protect implementers from having to support both VLRs.
  3. We can break backwards compatibility with the original Classification Lookup VLR by lengthening the classification name field.

What do you all think? Should we push this forward in LAS 1.4 R16, or does it belong in 1.5? I expect 1.5 to happen late this year or in the first half of next year.

Deguerre commented 2 years ago

So since I've been namechecked a couple of times... I'm doing a different job now, although it's one where the need for custom classification lookup will be important within the next year or so.

I'm inclined to agree with @esilvia here. The v1 VLR is just plain broken and should be deprecated. Given that, R16 is probably not the right place to deprecate it. I am also inclined to agree that i18n is a can of worms, but if the WG wants to handle it here, I suggest this as a compromise:

I assume by "required VLR", that means "compliant implementations must support it", not "compliant files must include it".

esilvia commented 1 year ago

The discussion on whether or not to implement this change as a revision of 1.4 is now moot. In today's LWG meeting we agreed it makes sense to deprecate (remove?) v1 of the Classification Lookup VLR in 1.5 and replace it with v2 of the VLR.