ASPRSorg / LAS

LAS Specification
https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html
Add standard record format with just XYZ? #93

Open potree opened 4 years ago

potree commented 4 years ago

One of the issues of the LAS format is that all record formats contain data that is often not needed, which considerably increases file sizes.

Would it be possible to introduce a new record format, e.g. format 11, that only stores XYZ coordinates? Any other attribute could then be added via extra byte definitions. Instead of using format 2 with 26 bytes, a point cloud with position and color could then be stored in 16 bytes.
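For reference, the record sizes under discussion can be checked with a short sketch. The PDRF 0 and PDRF 2 layouts below follow the LAS 1.4 point record definitions; the XYZ-only layout is the hypothetical format requested here and is not part of any published spec.

```python
import struct

# LAS 1.4 PDRF 0: X, Y, Z (int32), intensity (uint16), return/flag bits (uint8),
# classification (uint8), scan angle rank (int8), user data (uint8),
# point source ID (uint16). "<" means little-endian with no padding.
PDRF0 = "<iiiHBBbBH"

# PDRF 2 appends red, green, blue as three uint16 values.
PDRF2 = PDRF0 + "HHH"

# Hypothetical XYZ-only record (assumption: same int32 coordinates as today).
XYZ_ONLY = "<iii"

sizes = {
    "PDRF 0": struct.calcsize(PDRF0),
    "PDRF 2": struct.calcsize(PDRF2),
    "XYZ only": struct.calcsize(XYZ_ONLY),
}
print(sizes)
```

Any attribute beyond the 12 coordinate bytes would then be described through extra byte definitions, so the final record size depends on how color is encoded.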

abellgithub commented 4 years ago

If you're going there, why not make everything extra bytes-defined?

m-schuetz commented 4 years ago

Because fixing XYZ ensures that at least the coordinates are properly understood by all software packages, with the notable exception of packages that store lat/long in XYZ, which already breaks support in many cases today. I'm not opposed to having one more format that makes everything extra byte defined, but I'm afraid that it would seriously hurt compatibility between different LAS loaders. Then again, some models might benefit from the option to store coordinates in double precision rather than 32-bit integers.

esilvia commented 4 years ago

Thanks for documenting this idea that has been proposed a few times but never formally. Because it requires the addition of a new PDRF, it must be included as an addition for LAS 1.5, rather than a revision of LAS 1.4.

As for whether it should include just XYZ or additional attributes, I'd be interested to hear input from others on whether it (or a PDRF12) should include other attributes such as PtSourceID, Timestamp, or even GUID that help ensure the unique identification of a point.

rapidlasso commented 4 years ago

I think the request is explicitly only XYZ for "stupid" point clouds. But if we do that we should probably also add a "XYZi" and a "XYZRGB".

adamsteer commented 4 years ago

+1 @abellgithub - publish a data schema with 'required dimensions for LAS', 'standard dimensions for LAS (pick any)' and 'how to make extra dimensions LAS readers understand' and drop PDRFs altogether. If I searched I'd probably find a lot of discussion about this under 'LAS2.0'.

...I get that people want incremental change, however incrementing until PDRF N (where N is the number of possible dimensions combinations) is less useful than spending effort (IMO) on short circuiting that process in a sane way.

ps @m-schuetz why coordinates as doubles - isn't that defeating the purpose of shrinking file sizes? scale and offset do a great job, even for georeferencing subcentimetre data...
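A minimal sketch of the scale-and-offset round trip mentioned above. The scale, offset, and sample coordinate are illustrative assumptions, and `encode`/`decode` are hypothetical helper names, not functions from any LAS library.

```python
def encode(coord, scale, offset):
    """Quantize a real-world coordinate to the integer stored in a point record."""
    return round((coord - offset) / scale)


def decode(stored, scale, offset):
    """Recover the real-world coordinate from the stored integer."""
    return stored * scale + offset


# Assumed header values: 1 mm resolution, offset near the data.
scale, offset = 0.001, 500000.0
x = 500123.4567  # sample easting in metres

stored = encode(x, scale, offset)
restored = decode(stored, scale, offset)
error = abs(restored - x)

print(stored, restored, error)
```

The quantization error is bounded by half the scale, so a 1 mm scale keeps every coordinate within 0.5 mm of its original value while using 4 bytes per axis.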

potree commented 4 years ago

> I think the request is explicitly only XYZ for "stupid" point clouds.

Mostly, yes. The majority of point clouds I'm using only have XYZ and RGB, so the remaining attributes waste space and increase processing/loading times. The case for XYZ only could be useful if I wanted to store RGB in a custom extra byte format with only 1 byte per channel.

> ps @m-schuetz why coordinates as doubles - isn't that defeating the purpose of shrinking file sizes? scale and offset do a great job, even for georeferencing subcentimetre data...

Mainly because int32 is also somewhat limited, e.g. 4294km in millimeter precision or just 429km in 1/10th millimeter precision. But changing the coordinate representation isn't something I want to lobby for, it's fine for most cases now.
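The range figures quoted here follow directly from the number of distinct int32 values multiplied by the scale. A quick check, where `extent_km` is a hypothetical helper name:

```python
INT32_SPAN = 2**32  # number of distinct values an int32 can hold


def extent_km(scale_m):
    """Total coordinate extent in kilometres for a given scale in metres."""
    return INT32_SPAN * scale_m / 1000.0


extents = {
    "1 mm": extent_km(0.001),
    "0.1 mm": extent_km(0.0001),
}
for label, km in extents.items():
    print(f"scale {label}: about {km:.1f} km total extent")
```

This is the total span along one axis; with the offset placed at the centre of the data, the reach in each direction is half of it.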

rapidlasso commented 4 years ago

> Mainly because int32 is also somewhat limited, e.g. 4294km in millimeter precision or just 429km in 1/10th millimeter precision. But changing the coordinate representation isn't something I want to lobby for, it's fine for most cases now.

If the range of int32 is insufficient, the correct thing to do is to introduce a point record that uses int64 for x, y, and z. NOT float64. The full argument is here.

USGS_CLICK_LiDARBB_LAS2 0_floating_point_boycott
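One commonly cited concern with floating-point coordinates is that their resolution depends on magnitude, whereas a scaled integer has the same step everywhere. The sketch below only illustrates that general property; it is an assumption that this matches the linked argument, whose text is not reproduced in this thread. It needs Python 3.9 or later for `math.ulp`.

```python
import math

# Spacing between adjacent float64 values at different coordinate magnitudes.
# A scaled integer, by contrast, has a constant step equal to its scale.
magnitudes = (1.0, 4_000_000.0, 4e12)
spacings = [math.ulp(value) for value in magnitudes]

for value, spacing in zip(magnitudes, spacings):
    print(f"near {value:g}: float64 step is {spacing:.3e}")
```

The step grows by a factor of two each time the magnitude crosses a power of two, so the precision of a stored point depends on where it lies relative to the origin.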

hobu commented 4 years ago

> But if we do that we should probably also add a "XYZi" and a "XYZRGB".

XYZ is 12 bytes. Adding "i" (a 2-byte intensity) costs about 17% more storage in situations where it is often not used. The strongest case for this PDRF type is image-matched point cloud scenarios that are now using PLY or OBJ as their interchange. The benefits LAS has over these formats are coordinate system description and fixed-precision coordinate storage.

We can decide that this usage isn't that important. I can also see where an organization might use it to deliver the bare minimum positioning in a lidar delivery that would typically be much richer (returns, intensity, times, etc). Larding it up with might-as-well dimensions decreases its value and utility, however.

esilvia commented 4 years ago

As discussed on today's conference call:

These additional records weren't sufficient motivation to establish the need for 1.5, but since 1.5 will likely happen anyway, it behooves us to add these records with 1.5. Doing so will encourage people to use LAS instead of inventing their own compact binary formats. I think there's sufficient merit to add the following PDRFs:

  1. XYZ - Minimal data type.
  2. XYZRGB - Obviously useful for photo-derived point clouds.
  3. XYZI - I'm not sure there's a good use case for this, but maybe there is? I'm leaning toward not including it.

I don't think there's enough benefit to changing the fundamental data type of XYZ to merit creating 2-3 PDRFs that use a fundamentally different type from the rest. In this case, I think the cat is out of the bag on this one and there's sufficient workarounds to address specific needs to continue using the int32.

manfred-brands commented 4 years ago

XYZI would be useful for underwater multibeam data. The intensity is used for classification of the seabed.

esilvia commented 1 year ago

From today's conference call:

  1. XYZ - Minimal data type with only position.
  2. XYZRGBI - Combine color with intensity/brightness to improve alignment with other file formats (e.g. PLY). Intensity as brightness facilitates greyscale visualization.

Note: XYZ still follows the convention used in all other PDRFs -- specifically, it is a 32-bit integer with the scale and offset applied and NOT a double-precision float. RGB and Intensity also follow the conventions established in other PDRFs, each stored as a 16-bit unsigned integer.

@hobu noted an added advantage to using LAS instead of those other binary/ASCII/etc formats is that it by definition includes CRS information in a standard format.

esilvia commented 7 months ago

@hobu suggested during GeoWeek 2024 that we should consider providing a 1-byte or 2-byte User Data field to PDRFs 11-12 for storing additional photogrammetric or similar information, such as Alpha values, PtSourceID, Channel, etc. Is there a compelling reason to do this or not?

nkules commented 7 months ago

Wouldn't that be best accommodated with extra bytes?

I know there are plenty of uses for those values, but it's adding some additional fields when I felt the goal of those PDRFs was to be "generic points".

hobu commented 7 months ago

> Wouldn't that be best accommodated with extra bytes?

In my mind it would be a convenient dual use for the situation where color data has some kind of alpha value or the rare situations when there's infrared or some other band. I don't think we want to have XYZRGBAI that includes both.

If the sentiment is generally meh to this, I'm easily dissuaded.

EZChe-EZDataMD commented 7 months ago

> @hobu suggested during GeoWeek 2024 that we should consider providing a 1-byte or 2-byte User Data field to PDRFs 11-12 for storing additional photogrammetric or similar information, such as Alpha values, PtSourceID, Channel, etc. Is there a compelling reason to do this or not?

Here are a few thoughts:

esilvia commented 6 months ago

We revisited the justification for this today, calling into question the fact that LAZ compression in particular makes the existence of zero-filled fields moot. Core advantages of doing this:

  1. Base, minimal point format that takes up less memory/storage space
  2. Standardized binary format for XYZ data – discourage people from inventing their own
  3. Standardized CRS description – missing from several binary formats like PLY
  4. Self-describing ExtraByte definitions
  5. Future-proofing LAS

Questions:

  1. Is there any point in storing unclassified points? --> yes, it's just meant to be an exchange format
  2. Does adding new formats put undue burden on the developers wanting to adopt LAS 1.5? --> Not really. They have to deprecate PDRF 0-5 anyway.
  3. Does anyone actually still want this?

Core motivation: Boring point cloud with GIS (CRS) metadata.

Final proposal:

  1. PDRF 11 = XYZ
  2. PDRF 12 = XYZIRGB <-- reordered from earlier to match PDRF 6-10 order

We need immediate feedback before our next meeting on May 16th or we will de-scope this proposal from LAS 1.5.
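A rough sketch of what the two proposed records could look like on disk, assuming int32 coordinates and 16-bit unsigned intensity and color as noted earlier in the thread, and assuming no other fields are present. The field names and the `PROPOSED` table are illustrative only; nothing here is a published LAS 1.5 layout.

```python
import struct

# Hypothetical layouts for the proposed formats (little-endian, no padding).
PROPOSED = {
    11: ("<iii", ("X", "Y", "Z")),
    12: ("<iiiHHHH", ("X", "Y", "Z", "intensity", "red", "green", "blue")),
}

record_lengths = {pdrf: struct.calcsize(fmt) for pdrf, (fmt, _) in PROPOSED.items()}
print(record_lengths)

# Round-trip one PDRF 12 style point to show the field order.
fmt, fields = PROPOSED[12]
values = (123457, -2500, 98000, 40000, 65535, 32768, 0)
raw = struct.pack(fmt, *values)
point = dict(zip(fields, struct.unpack(fmt, raw)))
print(point)
```

Placing intensity before the color channels mirrors the ordering of PDRF 6-10, which is the reason given for the reordering.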

mikec-bmg commented 6 months ago

I think the raw XYZ field PDRF would be useful, mainly for the reasons that you mentioned. It doesn't matter for compression, but hopefully it would be used in place of some generic text files with just XYZ data and provide the CRS description, among other things.

Mike Childs

Software Developer | Blue Marble Geographics

1-207-622-4622 | bluemarblegeo.com https://www.bluemarblegeo.com/ 22 Carriage Ln, Hallowell, Maine 04347 U.S.A.

https://www.facebook.com/bluemarble.geographics https://www.instagram.com/bluemarblegeo https://www.linkedin.com/company/blue-marble-geographics https://twitter.com/BlueMarbleGeo http://youtube.com/subscription_center?add_user=BlueMarbleWebinars

hobu commented 6 months ago

> I think the raw XYZ field PDRF would be useful

I do too, BUT has the need for it vanished at this point? How wide of a community needs it relative to the PDRFs that already exist? I see it as a feature that could potentially bring a non-LiDAR, non-LAS point cloud audience to LAS due to alternative common formats not providing for standard geospatial metadata. I'm 🤷 as to whether or not it would lead to more LAS software implementations in practice.

The downsides of adding this are:

It seems very nice-to-have instead of have-to-have. Someone needs to make the have-to-have case for it.

EZChe-EZDataMD commented 6 months ago

> It seems very nice-to-have instead of have-to-have. Someone needs to make the have-to-have case for it.

I agree that it is a nice-to-have instead of have-to-have. Frankly every new PDRF would be useful. I don't see a ton of benefits of having new PDRFs (I'm sure it'd save space for LAS, but for LAZ, I don't think it'd be much in practice). Maybe we can consider not requiring any field in PDRFs and provide default values. I can see the way it was written now makes people think you have to have all the information to properly populate a PDRF. Regarding which PDRFs to use, I think it'd be a good idea to have a table summarizing all the PDRFs and their fields for easy lookup.

mikec-bmg commented 6 months ago

I guess implementation-wise it seems like it would be trivial to implement, but it would require additional changes and those often do take much longer than you'd expect.

It is definitely a nice-to-have vs gotta have.

abellgithub commented 6 months ago

I tried to determine the motivation by reading from the beginning and I'm not understanding.

If the file is small, I don't think it much matters if there are fields containing zeros. If it's large it will likely be compressed and the zero fields compress to almost nothing. Some readers even provide selective decompression, so you don't even have to touch the fields that you don't care about. What's to be gained by having some special PDRFs?
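The claim that zero-filled fields nearly vanish under compression can be illustrated with a general-purpose compressor. This sketch uses `zlib` as a stand-in, which is an assumption: LAZ uses its own point-aware coder, not deflate. Random coordinates are a worst case for the XYZ bytes themselves, so real data would compress better overall.

```python
import random
import struct
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
num_points = 2000

xyz_only = bytearray()
padded = bytearray()
for _ in range(num_points):
    xyz = struct.pack("<iii", *(rng.randrange(-2**31, 2**31) for _ in range(3)))
    xyz_only += xyz
    # 14 zero bytes stand in for the unused PDRF 2 fields (26 - 12 bytes).
    padded += xyz + bytes(14)

raw_ratio = len(padded) / len(xyz_only)
compressed_ratio = (
    len(zlib.compress(bytes(padded), 9)) / len(zlib.compress(bytes(xyz_only), 9))
)

print(f"uncompressed size ratio: {raw_ratio:.2f}")
print(f"compressed size ratio:   {compressed_ratio:.2f}")
```

Uncompressed, the padded records are more than twice the size of the XYZ-only ones; after compression the gap shrinks to a small fraction, which is the point being made about compressed files.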

Is the real issue a desire to have a way to say that some fields should be ignored?

esilvia commented 5 months ago

> It seems very nice-to-have instead of have-to-have. Someone needs to make the have-to-have case for it.

> Maybe we can consider not requiring any field in PDRFs and provide default values. I can see the way it was written now makes people think you have to have all the information to properly populate a PDRF. Regarding which PDRFs to use, I think it'd be a good idea to have a table summarizing all the PDRFs and their fields for easy lookup.

This is a really interesting idea from @abellgithub and @EZChe-EZDataMD. A summary table describing the distinctions between formats is something the spec has always needed, and revisiting the Required column in each table could provide a solution without adding unwanted complexity. I happen to have drafted up a table for my own sanity that I reference regularly, so the level of effort is low and the value is high.
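As an illustration of what such a lookup might contain, the sketch below derives the record length of each existing LAS 1.4 PDRF from its optional blocks. It is not the table referred to above; the structure and names (`OPTIONAL_BLOCKS`, `PDRFS`, `record_length`) are assumptions made for this example, while the byte counts come from the LAS 1.4 point record definitions.

```python
# Size in bytes of each optional block that a PDRF can add to its base record.
OPTIONAL_BLOCKS = {"gps_time": 8, "rgb": 6, "nir": 2, "waveform": 29}

# PDRF -> (base record size, optional blocks added on top of the base).
PDRFS = {
    0: (20, ()),
    1: (20, ("gps_time",)),
    2: (20, ("rgb",)),
    3: (20, ("gps_time", "rgb")),
    4: (20, ("gps_time", "waveform")),
    5: (20, ("gps_time", "rgb", "waveform")),
    6: (30, ()),  # GPS time is already part of the 30-byte base in PDRF 6-10
    7: (30, ("rgb",)),
    8: (30, ("rgb", "nir")),
    9: (30, ("waveform",)),
    10: (30, ("rgb", "nir", "waveform")),
}


def record_length(pdrf):
    """Return the record length in bytes, excluding any extra bytes."""
    base, blocks = PDRFS[pdrf]
    return base + sum(OPTIONAL_BLOCKS[name] for name in blocks)


lengths = [record_length(pdrf) for pdrf in sorted(PDRFS)]
for pdrf in sorted(PDRFS):
    extras = ", ".join(PDRFS[pdrf][1]) or "none"
    print(f"PDRF {pdrf:2d}: {record_length(pdrf):2d} bytes, adds: {extras}")
```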

At the moment, I'm trending strongly toward one PDRF at most, with just XYZ and nothing else. There's a strong case for adding no new formats as well.