ASPRSorg / LAS

LAS Specification
https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html

Standardization of common extrabytes #37

Open esilvia opened 6 years ago

esilvia commented 6 years ago

We've discussed listing some standardized extrabytes either in the specification itself or a supplementary document. This would encourage their adoption by the community and formalize common extrabytes as a guideline to future implementations.

We need to figure out the following:

  1. Which extrabytes merit standardization?
  2. Which fields should be formalized? e.g., optional fields like min, max, and nodata might not make sense.
  3. Should data_type be formalized?
  4. Where will this list live? Will it formally be included in the specification itself (thereby requiring ASPRS approval every time one gets added), or perhaps as a wiki page on GitHub with a link from the specification? I propose the latter.
  5. What will be the approval process for new additions? (I propose people submit new Issues and then LWG votes yes/no).
  6. Should units be formalized? For example, will we have to have separate entries for "refracted depth" in meters and feet?

Below is a link to what I think is a decent start to the standardized extrabytes. Once we get some agreement on a few of these I can start building a wiki page or contribute to Martin's pull request. Which one we do depends on the answer to the 4th question.

Standard ExtraBytes v1.docx

lgraham-geocue commented 5 years ago

What happened to Martin’s concept on standardizing how we represent units? I am not happy with the current design. It sort of suggests meters but leaves it open. If the units are not the same as the SRS of the file, we need to explicitly state what they are. I think meters for everything does not make sense because in the (backward) USA, US Survey feet are most likely the units.

We have a chance right now to get this right. We need to pause and not do this until it is right. Getting it wrong will cause no end of trouble. There is no compelling rush to get this revision out the door. Let’s get this properly fixed. The proper fix is Martin’s suggestion on encoding the units for all Extra Bytes.

esilvia commented 5 years ago

I believe the objective of deprecating the tuples and triples was to simplify the implementation of ExtraBytes. I don't see how providing alternative units accomplishes that.

Also, I moved the units discussion to #60 since in my mind changing the LasAttribute structure to self-document units is a different discussion from attempting to standardize ExtraBytes.

I'd agree that the standardization itself (on the wiki) doesn't necessarily have to happen for R14, but I would at least like to provide the link to the registry in R14.

lgraham-geocue commented 5 years ago

I think Martin gave a very detailed description of how this would work.

esilvia commented 5 years ago

I've encountered some confusion arising from the name of the refracted depth ExtraByte attribute. Although it is the slant length, having the word "depth" in the name suggests to some people that it's only the Z component. Is it too late to change the name to refracted length? Everyone's been so quiet on this part of the issue that I can't tell if anyone besides me has implemented it.

abellgithub commented 5 years ago

I would suggest that data that is now encoded in the EXTRA_BYTES structure should instead be placed in a specification for "defined" fields. Using a field ID makes the rest of the information unnecessary, as it can be part of the specification. There may be a few fields that, unfortunately, must be tied to the SRS to get units, but other than that, I see no reason why a good data dictionary (supported with some sort of machine-readable format - JSON or other) shouldn't replace the information that currently makes up the encoded structure. Eliminating the variability in the file eliminates questions that might otherwise come up when data doesn't meet the specification that you've proposed. (What if name and ID don't match? What if the range doesn't match what's expected? What if the datatype doesn't match the doc?)
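
To make the idea concrete, here is a minimal sketch of what such a registry-driven approach might look like. It is entirely hypothetical; the struct names, field IDs, and entries are invented for illustration and are not part of the LAS specification:

    #include <cstdint>
    #include <map>
    #include <string>

    // Hypothetical registry entry: everything that today lives in the encoded
    // EXTRA_BYTES structure would instead be fixed by the specification.
    struct DictionaryEntry {
        std::string name;        // e.g. "vertical uncertainty"
        uint8_t     data_type;   // fixed by the registry, e.g. 3 = unsigned short
        double      scale;       // fixed units/scale, e.g. 0.001 (millimeters)
        double      offset;
        std::string description;
    };

    // The dictionary could ship as a machine-readable file (JSON or other)
    // and be loaded once by an implementation; the ID below is invented.
    const std::map<uint16_t, DictionaryEntry> kDataDictionary = {
        {42, {"vertical uncertainty", 3, 0.001, 0.0, "1-sigma vertical TPU in meters"}},
    };

    // In the file itself, a "defined" field would then only need to reference
    // its registry ID and state where its value sits in each point record.
    struct DefinedFieldRecord {
        uint16_t field_id;    // key into the data dictionary
        uint16_t start_byte;  // offset of the value within the point's extra bytes
    };

With name, type, units, and range pinned down by the registry, the mismatch questions above cannot arise inside a conforming file.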

I previously worked on agricultural equipment and it depended on ISOBUS (ISO 11783) for interoperability between manufacturers. The issues were very similar. Manufacturers would submit requests to have items added to the data dictionary as necessary. This worked well, eliminated questions and was easy to extend. You can see the ISOBUS DD here if you're curious: https://www.isobus.net/isobus/site/exports?view=export

manfred-brands commented 5 years ago

Is there a reason that only one of the fields from the Topo Bathy Lidar Domain Profile description is listed here? If there is to be a standard catalogue, they all should be there, even if those fields no longer fit R14 (sigma xyz needs splitting into three).

rapidlasso commented 5 years ago

Because the standardization of widely-used Extra Bytes is a new experiment. We only want to try with a few attributes and gain experience before making a larger list. That does not mean those listed are better than others, but that list may lead to more off-the-shelf software being able to utilize those attributes in the future.

esilvia commented 5 years ago

That, and it's the only ExtraByte from the topobathy LDP that I've actually seen implemented as a deliverable. The horizontal and vertical uncertainty are also a derivative improvement of the sigma xyz attribute from the LDP based on discussions I had with the stakeholders in previous conferences this past summer.

forkozi commented 5 years ago

Has there been any discussion about a different data type for sigma xy and sigma z? Sig-figs might not often go beyond cm level, but what does the working group think about being able to store these values at at least the mm level? Calculations done with cm-level sigma extrabyte values might have noticeable rounding errors.

esilvia commented 5 years ago

@forkozi Thanks for the input! My preference was to keep the uncertainty storage to 1 byte per point, which at cm level meant a theoretical minimum of 0 and maximum of 2.55m. Storing at mm precision means a theoretical maximum of 0.255m, which to me seems too small.

forkozi commented 5 years ago

Hey Evon. Thanks for the message. The other day I brought this up to some folks in NOAA RSD's TPU research group (@brian-r-calder and @parrishOSU), and it was a mixed bag of reactions. Chris thought that perhaps extra precision might be helpful, but Brian didn't think there would be anything to gain from it. Regarding the range, however, both Chris and Brian were in favor of a higher maximum. I don't recall all the reasoning, but I think part of it had to do with accommodating different technologies. Although terrestrial laser scanning might not need more than 2.55 m, maybe SfM would? Using that line of reasoning, on the precision matter, might terrestrial laser scanning appreciate mm precision? (Is LAS common in the terrestrial laser scanning community?) I'm a total newbie on this matter, so don't laugh at me too much... is 1 byte per point hard and fixed? I know you programmer types are pretty picky when it comes to your bits and bytes :)

esilvia commented 5 years ago

LAS in TLS is less common but it happens because e57 and similar formats aren't widely supported, nor is there a standard way to store the setup location (something I'd like to fix). If we did a SF of 0.02 then we could have a range of 0-5.10 meters, with of course a slight decrease in precision. IMHO any more than 5ish meters starts to lose usefulness, but maybe satellite LiDAR hits that range?

1 byte per point isn't a huge issue, although remember that storing Hz Precision also means storing Vt Precision, so it's actually 2 bytes vs 4 bytes per point. Again, not a huge issue because storage is relatively cheap, but is there really a need for it? Maybe there is. @gimahori might know.

rapidlasso commented 5 years ago

Indeed, storage is cheap and if the range is unused then the upper byte (or the upper bits of the upper byte) are mostly zero meaning they disappear when compressing the LAS file with LASzip (or any other redundancy removing compression scheme).

manfred-brands commented 4 years ago

LAS is no longer only used for laser data. We use LAS for multibeam echo sounder data. A typical system has 400 beams over a 150 degree arc. This gives a 0.375 degree beam separation, which is only part of the uncertainty and increases further away from nadir. Depending on range (water depth) that value can get big: 100m water depth results in a 9.5m beam width at the edge vs 0.65m at nadir. At 1000m they get 10x worse. At the same time we use underwater laser at short range where uncertainty is sub-millimetre. A single prescribed field will not have enough range in one case and not enough resolution in the other.

What is the purpose of the standardization? Knowing which field contains the uncertainty (number 113) so we don't have to check all kinds of different names? In that case the type and scale factor can be different depending on the data at hand; all LAS readers should deal with that gracefully as it is defined in the ExtraBytesDefinition. If the purpose is that we can combine LAS files from different data sources, we need a larger field. An unsigned short would allow mm resolution to 65m. Any data worse could be encoded as MaxValue.
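
For reference, a rough back-of-envelope check of those footprint numbers, assuming a flat seafloor, the 0.375 degree beam separation above, and a ±75 degree swath (the helper function is purely illustrative):

    #include <cmath>
    #include <cstdio>

    // Illustrative only: approximate across-track footprint of one beam on a
    // flat seafloor. depth in meters, offNadirDeg = steering angle from nadir.
    double beamFootprint(double depth, double offNadirDeg, double beamSepDeg) {
        const double kDegToRad = 3.14159265358979323846 / 180.0;
        double theta = offNadirDeg * kDegToRad;   // steering angle
        double dphi  = beamSepDeg  * kDegToRad;   // angular beam separation
        double slant = depth / std::cos(theta);   // slant range to the seafloor
        return slant * dphi / std::cos(theta);    // stretched by the incidence angle
    }

    int main() {
        std::printf("100 m depth, nadir: %.2f m\n", beamFootprint(100.0, 0.0, 0.375));   // ~0.65 m
        std::printf("100 m depth, edge:  %.2f m\n", beamFootprint(100.0, 75.0, 0.375));  // ~9.8 m, same ballpark as above
        return 0;
    }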

rapidlasso commented 4 years ago

I agree with @manfred-brands. The standardization document should recommend suitable data types and strongly discourage the use of floats or doubles, but allow the data producer to populate scale and offset values that are suitable for their data, just like the LAS standard does for x/y/z coordinates. But it is really important that the standardization document contains concrete use examples so we don't end up with attributes that are stored as 64 bit integers, or at picometer scales, or as double-precision floating-point. In the LASlib API I include a convenience function that can read any of the additional attributes from any scaled and offset representation and present it as a double-precision floating-point number for processing.
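
A minimal sketch of what such a convenience function could look like (this is not LASlib's actual API; the struct and function names are invented here, and only a few data types are handled):

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <stdexcept>

    // Sketch only: per-attribute metadata as recorded in the Extra Bytes VLR (simplified).
    struct AttributeInfo {
        uint8_t data_type;   // 1=uchar, 3=ushort, 5=ulong, 9=float, 10=double, ...
        double  scale;       // defaults to 1.0 if no scale is declared
        double  offset;      // defaults to 0.0 if no offset is declared
    };

    // Read one attribute from the raw extra bytes of a point record and
    // return it as a double, whatever its stored representation.
    double readAsDouble(const uint8_t* extra, size_t byteOffset, const AttributeInfo& a) {
        const uint8_t* p = extra + byteOffset;
        double raw;
        switch (a.data_type) {
            case 1:  { uint8_t  v; std::memcpy(&v, p, sizeof v); raw = v; break; }
            case 3:  { uint16_t v; std::memcpy(&v, p, sizeof v); raw = v; break; }
            case 5:  { uint32_t v; std::memcpy(&v, p, sizeof v); raw = v; break; }
            case 9:  { float    v; std::memcpy(&v, p, sizeof v); raw = v; break; }
            case 10: { double   v; std::memcpy(&v, p, sizeof v); raw = v; break; }
            default: throw std::runtime_error("data type not handled in this sketch");
        }
        return raw * a.scale + a.offset;   // scaled/offset value in "real" units
    }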

esilvia commented 4 years ago

@manfred-brands @rapidlasso

From today's conference call: The purpose of standardization is fourfold:

  1. to prevent multiple different names for the same attribute
  2. to protect standardized names from acquiring multiple different meanings
  3. to provide a centralized location to learn more about an ExtraByte that's discovered in one's dataset
  4. to provide a method to publicize and therefore increase the value (i.e., usability) of ExtraBytes that users have produced

In that light, your points make sense to me, and imo also make the answer about units obvious. The standard ExtraBytes can recommend a standard unit, offset, and scale, but allow for deviations when the underlying technology, site, or application require greater range and/or precision.

If we don't do this, then we'll end up with multiple versions of the same "standard" ExtraByte for different levels of precision, and I believe that would be counterproductive to the stated goals. Thanks for providing some clarity on this issue. I believe that we can move forward with this information.

rapidlasso commented 4 years ago

I recommend we start (quickly) with one or two "standardized additional attributes" and see what we learn in the process of adding them as addendums (?) to the specification and implementing them in a few software packages. My number one pick would be "echo width" in tenths of a nanosecond. My number two pick would be "height above ground" in centimeters or millimeters.
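
For illustration, descriptors for those two picks might look roughly like this. This is a simplified sketch, not the actual EXTRA_BYTES record layout, and the choice of unsigned short (data type 3) is my assumption:

    #include <cstdint>

    // Simplified stand-in for the Extra Bytes descriptor; the real record also
    // carries options, no_data, min/max, an offset, and fixed-size strings.
    struct ExtraBytesDescriptor {
        uint8_t     data_type;    // 3 = unsigned short (assumed here)
        const char* name;
        double      scale;        // declared in the VLR, applied on read
        const char* description;
    };

    // "echo width" stored in tenths of a nanosecond, "height above ground"
    // stored in centimeters, matching the units suggested above.
    const ExtraBytesDescriptor kEchoWidth         = {3, "echo width", 0.1, "width of the returning pulse [ns]"};
    const ExtraBytesDescriptor kHeightAboveGround = {3, "height above ground", 0.01, "height above ground [m]"};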

lgraham-geocue commented 4 years ago

I think echo width would be tough for anything other than a Riegl sensor but very useful.

The most used auxiliary data we employ (which we already encode as extra bytes) is the emission point to impact point unit vector (e.g., if you back-trace this unit vector from a point, it points to the spot on the trajectory where the point was emitted). This is basically the same thing as the “point walking vector” in the waveform spec. You need this (or some similar geometry metadata) to colorize a LIDAR point.

Lewis

rapidlasso commented 4 years ago

@lgraham-geocue how do you currently encode this "emission point to impact point unit vector" into extra bytes? I assume you use three different additional attributes, one for each vector component? What data type, scale, and offset are you using?

lgraham-geocue commented 4 years ago

Just the very simple structure below where “ux” means the unit vector in the x direction, etc. It is in the LAS spatial reference system and LAS units for range.

   float range; // extra bytes - distance to point
   float ux;    // extra bytes - unit vec x
   float uy;    // extra bytes - unit vec y
   float uz;    // extra bytes - unit vec z
rapidlasso commented 4 years ago

@lgraham-geocue that is exactly what I was afraid of. (-: You are hereby excused from designing the storage details for standardization of "additional attributes" via extra bytes ... (-;

But seriously. For all near-nadir shots the ux and uy components will be close to zero and lead to very inefficient (aka over-precise) storage. We had this discussion before. It originally started when a fully flexible 2.0 version of the LAS specification was first proposed. It (fortunately) has died. This was about storing xyz in floating-point but the same argument holds for the three components of a unit vector. If we need to store unit vectors it may be worthwhile using a concise coding such as [Deering 1995]. The full discussion against floating-point is still available here and a screen shot of the opening argument is attached:

[image: USGS_CLICK_LiDARBB_LAS2.0_floating_point_boycott] https://user-images.githubusercontent.com/1107656/73034333-dfb37600-3e44-11ea-824c-0bd25076b6e6.png

dpev commented 4 years ago

I'll just take this opportunity to also mention that putting range before the x,y,z should "trigger" any data storage designer. LAS has had some extremely unfortunate "design" in the past, leading to every implementation having to read piece-by-piece and shuffle fields around due to misaligned structures, and even more pain writing. Moving forward it would be good to engage in practices that allow storage in a machine read/writable format without needing to move things around - in this case, if it were to remain as 4 floats, having range come after the normal (in the w component) would make a lot more sense than range in x, x in y, etc. But separate to that, I agree with Martin that storing a unit vector in 3 floats is excessive.

Regards

Dave Pevreal


lgraham-geocue commented 4 years ago

Alas, Martin, I must take you to school – you have been living in the sheltered world of ALS for too long! Let’s give a fair shake to MLS, TLS and (if I might coin the expression) DLS, Drone Laser Scanning. I am not forgetting BLS (as in bathymetric), but I am not an expert in that area. It is a rather long discussion that touches on several areas of core LAS design, so I’ll try to take some time this weekend to put down the thoughts. One thing to give some thought to is the tension in LAS – should we always design to optimize the ease of implementation in LAZ (your concern here, no doubt) or optimize for exploitation of LAS? In the meantime, dust off your copy of Hamming (still just about the best reference for “on the metal” coding) and reacquaint yourself with optimal representation of random numbers distributed somewhat uniformly from 0 to 1 (well, -1 to 1 but that’s just a side detail).

Later,

Lewis

rapidlasso commented 4 years ago

Don't jump to conclusions too quickly about me having hidden LAZ intentions. Three fluffy floats will LAZ-compress with a higher compression rate than more compact unit vector representations. A recent paper on efficient storage of unit vectors (here with applications as shading normals) provides an accessible explanation of why three floats are überfluffy, alongside a number of better alternatives. I think the "oct32" mapping looks promising:

https://www.researchgate.net/publication/301612007_A_Survey_of_Efficient_Representations_for_Independent_Unit_Vectors

Lewis' emotional response suggests that surface normals are not a suitable starting candidate for the first standardized additional attribute. (-; Maybe the beam or beamlet ID needed for Velodyne, Ouster, SPL100 and upcoming scanners is a less contentious candidate?

abellgithub commented 4 years ago

This all seems to have gotten very confusing and confused. Can someone summarize the basic proposal and goal?

gsmercier commented 4 years ago

Picking a common extrabyte data set and setting the standard as to how that data is stored in extrabytes. Lewis and Martin are discussing the best format for storing the data for one common extrabyte data set, but there are other common extrabyte data sets that could benefit from standardization until they can agree on the best format (for processing speed, not storage, my two cents) for that one particular set. -George


lgraham-geocue commented 4 years ago

So a few notes on LAS:

First off, the executive summary of my points (pun intended)

Stating the obvious, but LAS is ubiquitous, supporting (there are probably other cases of which I am not aware):

  * ALS – traditional Airborne Laser Scanning
  * MLS – Mobile Laser Scanning
  * DLS – Drone Laser Scanning – I have found DLS to contain elements of both ALS and MLS but it is more akin to MLS
  * TLS – Tripod Laser Scanning (where it seems e57 never really gained much traction)
  * BLS – Bathymetric Laser Scanning (where the first LAS “profile” was designed)

The first consideration (and my objection to Martin’s comments) is that when we design for LAS, we tend to think primarily of our own domain space. Consider Martin’s comment - “For all near-nadir shots the ux and uy components will be close to zero…” The application space we are dealing with is Mobile Laser Scanning (MLS) and Drone Laser Scanning (DLS). Here the “incident” angle unit vectors range from -1.0 to 1.0 for all three coordinates since we are often looking “up”, “sideways” and so forth. This is obvious for MLS. For DLS this happens when you are flying below the pit edge, below guard wires and so on.

Incident angle is incredibly valuable and is used for a variety of functions. One of the use cases for us is ray tracing to coincidently acquired high resolution images. This requires a resolution of each component of around 24 bits. The more common use is for visualization of the point cloud (for example, which points on a road sign to display based on eye point). This use case could probably work fine with 6 to 8 bits of resolution.

For numbers normalized on the span -1.0 to 1.0 that have a relatively uniform distribution and required resolutions in the 16 to 32 bit range, “Float” is the ideal structure (not to mention it is now the ubiquitous data type in hardware renderers). So float/doubles are not necessarily bad words (note that if you normalize a project to -1.0 --> 1.0, double can have some advantages as the representation, hence its inclusion in the proposed LAS 2.0 standard). I think most folks are aware of the logarithmic loss of resolution in these data types when you move out beyond the base range, so we would not use them in denormalized work (in my data sample of extra bytes from a DLS application, Range is a float, but this is actually a hardware value from the sensor whose original data type is float).

However, my point really isn’t about the ideal data types for storage in LAS. It is really about how we often view components of LAS from the narrow perspective of the use cases with which we deal – e.g. Martin’s assumption of the ALS use case in our discussion. We all tend to do this so we just need to be aware.

Of course, when we deal with augmentation such as we do through Extra Bytes, we do not necessarily have to please everyone! But if a data tuple has high value in more than one domain, we should probably give a lot of thought to those other domains (as is the case for Incident unit vector).

A second consideration of LAS is the following: LAS was initially intended as a transport format (I know because I was a member of the initial design effort of LAS – prior to it being handed to ASPRS). At that time, Terrasolid had worked with Earth Data to design the Earth Data Binary (EBN) format and tended to use it internally. Army Corps was using a precursor of LAS from Enerquest. Optech had yet another format (though Optech was not part of the original LAS consortium – they came in a bit later). So LAS was an effort to have a neutral exchange format. None of us considered LAS an exploitation format (that is, a format we would use internally in a software application) – if we had, we never would have released anything since everyone would have radically different ideas of what goes into an exploitation format (trees, tiling, ….).

However, many, many software implementations elected to use LAS directly in exploitation (in retrospect, we could have prevented this by using structures not amenable to random access!). So now I think the LAS committee has an obligation to at least consider the ramifications of direct exploitation. This has huge implications, the most serious being that a point needs to stand on its own, as much as possible. This is not the case in LAS today because one has to have information from the LAS header to scale and translate positional coordinates. In a typical project where you have many contributing LAS files, this causes a really nasty bit of bookkeeping. Consider merging N files from ALS with M files from MLS and a few files from DLS where each contractor scaled the points differently (of course, most are not aware of this since software algorithms tend to compute these values rather than allow input from a user). To handle this efficiently, you have to prescan and figure out essentially the lowest common denominator. Obviously we all do this routinely but it is still a big issue. We have run into cases where denormalizing a project without resolution loss was not possible.

Now consider the complexity of introducing another parameter in LAS with scaling and offsets definable to different values than the overall point scale/offset. This then begins to blow up the normalization problem.

My general point here is that we need to, as much as possible, have LAS normalization values at the point record level (and since this blows up the point record size, it means we’ll use denormal data). Arbitrary normalization parameters work just fine per file but are a huge issue in heterogeneous projects. I can see this really going wild with the use of Extra Bytes. Obviously some stuff has to be at the file level, such as the Spatial Reference System, but don’t make me go to a header to normalize/denormalize some other parameter.

A third consideration is compression, and when we talk LAS, we mean LAZ. LAZ is great – we need to make certain that mods to LAS via new versions, extra bytes and so forth do not break LAZ. However, we need to really think about the philosophy of transport versus exploitation. I (and I may be in the minority here) consider LAZ to be a transport/storage format because (to my knowledge, anyway) LAZ does not support random access. So why is this important? Well, LAS should always be optimized for exploitation since so much software uses it as such. This means LAS should be based on the data types most natural (e.g. fastest) for exploitation. If data conversion is needed for efficient compression, then this should be a function of the serialize/deserialize to/from LAZ. Of course, the big hassle with this scheme is maintaining LAZ as 100% lossless (e.g. quantizing an angle in float to a 16 bit representation, etc.).

Well, this note is definitely rambling on so a repeat of the original plea:

rapidlasso commented 4 years ago

I would appreciate if @lgraham-geocue could stop suggesting my comments are driven by me only "knowing about ALS" or me only "caring about LAZ" or the like. This is getting old.

In the seminal paper "Geometry Compression" from SIGGRAPH 95 Deering kick-started research on better representations of surface normals or unit vectors noting that "Traditionally 96-bit normals (three 32-bit IEEE floating-point numbers) are used in calculations to determine 8-bit color intensities. 96 bits of information theoretically could be used to represent 2 to the power of 96 different normals spread evenly over the surface of a unit sphere. This is a normal every 2 to the power of -46 radians in any direction. Such angles are so exact that spreading out angles evenly in every direction from earth you could point out any rock on Mars with sub-centimeter accuracy."

The summary paper I cited earlier points out that "Consider a straight forward representation of points on the unit sphere. A structure comprising three 32-bit floating scalars (struct { float x, y, z; }) occupies 3 floats = 96 bits per unit vector. This representation spans the full 3D real space, R3, distributing precision approximately exponentially away from the origin until it jumps to infinity. Since almost all representable points in this representation are not on the unit sphere, almost all 96-bit patterns are useless for representing unit vectors. Thus, a huge number of patterns have been wasted for our purpose, and there is an opportunity to achieve the same set of representable vectors using fewer bits, or to increase effective precision at the same number of bits."

So I am just saying what pretty much any other geometry storage researcher in the world would say: it's time to move past storing three floats for unit vectors or surface normals.
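
For concreteness, here is a rough sketch of the octahedral ("oct"-style) mapping that the survey describes, quantized to two 16-bit values (32 bits total). This is an illustration written for this thread, not the paper's reference code:

    #include <algorithm>
    #include <cmath>
    #include <cstdint>

    // Illustrative octahedral encoding: two 16-bit components (32 bits total)
    // instead of three 32-bit floats.
    struct Oct32 { int16_t u, v; };

    static int16_t toSnorm16(float f) {
        return static_cast<int16_t>(std::lround(std::clamp(f, -1.0f, 1.0f) * 32767.0f));
    }

    Oct32 encodeOct32(float x, float y, float z) {
        float invL1 = 1.0f / (std::fabs(x) + std::fabs(y) + std::fabs(z));
        float u = x * invL1, v = y * invL1;     // project onto the octahedron
        if (z < 0.0f) {                         // fold the lower hemisphere over
            float fu = (1.0f - std::fabs(v)) * (u >= 0.0f ? 1.0f : -1.0f);
            float fv = (1.0f - std::fabs(u)) * (v >= 0.0f ? 1.0f : -1.0f);
            u = fu; v = fv;
        }
        return { toSnorm16(u), toSnorm16(v) };
    }

    void decodeOct32(Oct32 o, float& x, float& y, float& z) {
        float u = o.u / 32767.0f, v = o.v / 32767.0f;
        z = 1.0f - std::fabs(u) - std::fabs(v);
        if (z < 0.0f) {                         // unfold the lower hemisphere
            float fu = (1.0f - std::fabs(v)) * (u >= 0.0f ? 1.0f : -1.0f);
            float fv = (1.0f - std::fabs(u)) * (v >= 0.0f ? 1.0f : -1.0f);
            u = fu; v = fv;
        }
        x = u; y = v;
        float len = std::sqrt(x * x + y * y + z * z);   // renormalize
        x /= len; y /= len; z /= len;
    }

At 32 bits this stores a unit vector in the same space as a single float component, which is exactly the kind of trade-off the cited survey quantifies.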

For every "additional attribute" stored as extra bytes we specify these things in the VLR:

  1. starting byte
  2. data type
  3. no data value
  4. scale
  5. offset

@lgraham-geocue, are you suggesting that reading 1 to 3 is ok but using 4 and 5 is too complex?

lgraham-geocue commented 4 years ago

I am not all that concerned with the format used to store attributes so long as it supports the use case, is easily understandable by implementors and is not excessively bloated.

On the Comment:

For "additional attributes" stored as extra bytes we specify these things in the VLR:

  1. starting byte
  2. data type
  3. no data value
  4. scale
  5. offset

You are suggesting that reading 1 to 3 is ok but using 4 and 5 is too complex? Yes. I think that 1-3 are likely identical for a particular attribute, so no issue. However, suppose we have 20 files with 20 different scale/offset values for the same attribute (e.g. range from point to sensor). You cannot denormalize a point using only the point record – you have to read the VLR. But we do not have a really good, reliable method to go from a point to the containing file to select the appropriate VLR. This forces coders to carry the full Extra Bytes VLR machinery around just to be able to decode the Extra Bytes values. This matters for things such as keeping a cache of normalized points on a display list and only denormalizing on the way to the renderer (hence my comment about LAS being used as an exploitation format).

Perhaps this should be a change to consider – a standard way to point to a reduced set of records from the point data record. Sadly, the only thing that comes to mind is a GUID – way too expensive. Anyway, food for thought.

rapidlasso commented 4 years ago

I recently published a little blog post on how to map the information stored in these kinds of ASCII lines of LiDAR information to the LAS format:

1, 290.243, 28.663, -11.787, 0.060, -0.052, 0.997, 517.3170, -58.6934, 313.0817, 52
1, 290.208, 28.203, -11.825, 0.062, -0.056, 0.996, 517.3167, -58.6934, 313.0817, 49
1, 290.182, 27.739, -11.852, 0.063, -0.055, 0.997, 517.3164, -58.6935, 313.0817, 53
1, 290.165, 27.272, -11.866, 0.061, -0.058, 0.996, 517.3161, -58.6935, 313.0817, 53
1, 290.163, 26.800, -11.858, 0.061, -0.053, 0.997, 517.3157, -58.6935, 313.0817, 68
...

The first number is either a classification into ground, vegetation, or other surface, or represents an identifier for a planar shape that the return is part of. The next three numbers are the x, y, and z coordinate of the LiDAR point in some local coordinate system. The next three numbers are the x, y, and z coordinates of an estimated surface normal. The next three numbers are the x, y, and z coordinates of the sensor position in the same coordinate system. The last number is the intensity of the LiDAR return.

parrishOSU commented 4 years ago

The cBLUE topo-bathy lidar TPU tool (https://github.com/noaa-rsd/cBLUE.github.io) is currently storing vertical uncertainty values in extra bytes as floats, rather than uchar, for increased precision. This differs from the LWG's DRAFT Standard ExtraByte Definitions but seems to be working for those groups using the tool. Input on this? @forkozi @esilvia ?

rapidlasso commented 4 years ago

Do the TPU values require increments of femtometers close to zero, with the precision dropping exponentially to increments of decimeters close to one million? Then a float32 representation is suitable.

If the increments with which the error is to be expressed should be a constant centimeter or millimeter throughout the entire error value range then an unsigned integer scaled by 0.01 or 0.001 is the correct approach.
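
A minimal sketch of the scaled-integer alternative, assuming (purely for illustration) an unsigned short with a 0.001 scale factor declared in the Extra Bytes VLR:

    #include <algorithm>
    #include <cmath>
    #include <cstdint>

    // Illustrative only: vertical TPU stored as an unsigned short scaled by
    // 0.001, i.e. constant 1 mm increments over 0 .. 65.535 m.
    const double kTpuScale = 0.001;   // would be declared in the Extra Bytes VLR

    uint16_t encodeTpu(double tpu_meters) {
        double clamped = std::clamp(tpu_meters, 0.0, 65.535);
        return static_cast<uint16_t>(std::lround(clamped / kTpuScale));
    }

    double decodeTpu(uint16_t raw) {
        return raw * kTpuScale;       // back to meters
    }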

parrishOSU commented 4 years ago

The range of plausible vertical uncertainty values, considering the range of possible data sources, is probably meters to millimeters. Adding one order of magnitude in either direction gives tens of meters to tenths of millimeters. If we use a scale factor, where is the scale factor stored? Is it the same as for the X Y Z coordinates?

rapidlasso commented 4 years ago

The "scale factor" is core part of the "extra byte" definition. I recently published a little blog post on how to use txt2las (which is open source) to map the information stored in these kind of ASCII lines of LiDAR information to the LAS format and you see examples with different numbers of decimal digits being used there:

1, 290.243, 28.663, -11.787, 0.060, -0.052, 0.997, 517.3170, -58.6934, 313.0817, 52
1, 290.208, 28.203, -11.825, 0.062, -0.056, 0.996, 517.3167, -58.6934, 313.0817, 49
1, 290.182, 27.739, -11.852, 0.063, -0.055, 0.997, 517.3164, -58.6935, 313.0817, 53
1, 290.165, 27.272, -11.866, 0.061, -0.058, 0.996, 517.3161, -58.6935, 313.0817, 53
1, 290.163, 26.800, -11.858, 0.061, -0.053, 0.997, 517.3157, -58.6935, 313.0817, 68
...

rapidlasso commented 3 years ago

The "beam ID" seems a rather easy first candidate for standardization. Clearly there is a need and clearly users already store this information to "extra bytes" like here as "Velodyne Rings". In this blog post I describe how to copy the beam ID from the "point source ID" field or from the "user data" field into a new "extra bytes" attribute with two calls to las2las, namely

las2las ^
-i Samara\Drone\00_raw_aligned\*.laz ^
-add_attribute 1 "laser beam ID" "which beam ranged this return" ^
-odir Samara\Drone\00_raw_temp -olaz

las2las ^
-i Samara\Drone\00_raw_temp\*.laz ^
-copy_user_data_into_attribute 0 ^
-set_user_data 0 ^
-set_point_source 0 ^
-odir Samara\Drone\00_raw_ready -olaz
jo-chemla commented 10 months ago

Hi there, checking in to see if there is now a ~standardized way to store surface normals within a las/laz file extra bytes fields.

There have been discussions on this thread regarding the way to store normals (3 floats for the xyz coords vs a more efficient way to store a unit vector), but nothing regarding a commonly-agreed way to name these attributes, e.g. NORMAL_X, normal z, etc. A source from geocue indicates that Agisoft Metashape, Riegl and TerraScan all support exporting surface normals to las, but probably each with its own field name (could not check). Are there other places where this discussion could happen? The topic does not seem heavily discussed on laserscanningforum, for example.

PS: I read that the type 22 tuples and triples have been deprecated, so the surface normal coordinates should be stored as individual fields. PDAL defines the NormalX, NormalY, NormalZ fields (see dimensions), so this could be used as the standard definition for normal coords.

jo-chemla commented 10 months ago

After digging through a bit, here is what I found for normals naming convention for las files:

| Lib/Software | Normals naming convention | Reference |
| --- | --- | --- |
| PDAL (main convention) | NormalX | PDAL dimensions |
| PDAL (other supported) | any variation of nx, normal_x, normalx | Dimension.json or official docs |
| Agisoft Metashape | normal x (with a space) | attribute0 -0.992126 0.992126 ('normal x') using LAStools/lasinfo |
| Others like Riegl, TerraScan | No las sample found | |

PS: if a dedicated github issue should be opened specifically for the normal coords naming convention, please do tell me!

jedfrechette commented 10 months ago

The normals naming convention that we've adopted internally is NormalX. The reasoning for that choice was:

  1. It's what PDAL uses.
  2. After normalizing for case it is the same as what the E57 Normals extension uses: http://www.libe57.org/E57_EXT_surface_normals.txt
  3. It does not use any special characters, i.e. spaces.
  4. It is more explicit than abbreviations like nx
jo-chemla commented 10 months ago

Thanks for clarifying, I did not know NormalX was the convention adopted within the LAS spec. Great to hear!

jedfrechette commented 10 months ago

Sorry, just to clarify: when I wrote "we've adopted..." I wasn't commenting on behalf of ASPRS or the LAS committee. That's just the convention we've ended up adopting within my company. For us that convention isn't specific to LAS either, but we needed a general convention to follow for data formats/applications that don't have their own definition.

jo-chemla commented 10 months ago

Woops sorry about the confusion and thanks for the contribution! Awaiting feedback from the LAS committee then.

hobu commented 10 months ago

Hi there, checking in to see if there is now a ~standardized way to store surface normals within a las/laz file extra bytes fields.

As you're finding out in PDAL https://github.com/PDAL/PDAL/issues/4144 and maybe other places, I don't think this exists. We chose the NormalX composition because PDAL uses strings to define dimension types and it seemed both explicit enough and clean enough without spaces.

abellgithub commented 10 months ago

I think most of the names came from various data files that we encountered in the wild.


esilvia commented 10 months ago

A while ago I became aware of a single-field convention being used in TerraSolid for this purpose... if I remember correctly they had packed the normal_x/y/z values into a 64bit or 32bit integer ExtraByte. It'll take me a while to dig that out of my notes if it's of interest.

Beyond that, the waveform PDRFs do include a directional vector that I've seen leveraged as a surface normal, but I wouldn't recommend it.

esilvia commented 10 months ago

Just found my notes. In TerraSolid they store the Normal Vector ExtraByte as a 32bit integer -- 2 bits for the "model type" and then 15 bits for each polar angle. I don't have more detailed notes than that, but maybe someone has a contact at TerraSolid that can provide more detail?

Related: https://github.com/ASPRSorg/LAS/issues/1#issuecomment-804394307
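
My notes stop there, so the following is a purely speculative sketch of how a 2 + 15 + 15 bit packing of two polar angles might look; the angle conventions and bit order are guesses, not TerraSolid's documented layout:

    #include <cmath>
    #include <cstdint>

    // Speculative: 2 bits of "model type" plus two 15-bit polar angles packed
    // into a single 32-bit integer. Nothing here is confirmed by TerraSolid.
    const double kPi = 3.14159265358979323846;

    uint32_t packNormal(uint32_t modelType, double inclination, double azimuth) {
        // inclination in [0, pi], azimuth in [0, 2*pi)
        uint32_t t = static_cast<uint32_t>(std::lround(inclination / kPi * 32767.0));
        uint32_t a = static_cast<uint32_t>(std::lround(azimuth / (2.0 * kPi) * 32767.0));
        return (modelType & 0x3u) | (t << 2) | (a << 17);
    }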

esilvia commented 10 months ago

One final comment... I found an offline discussion thread between me, Martin, and the folks at AgiSoft about a method of encoding that may resemble their final implementation of normal vectors. I'm uploading it here now that we have a more public way to have these discussions. Quantum Spatial Mail - tuples in LAS extra bytes.pdf

Beginning a new discussion thread on normal vectors may be helpful for posterity.

agisoft-metashape commented 10 months ago

Hello Evon, thanks for sharing our discussion.

We decided to use lower case name with space for normal vector extrabytes because this convention was used in other places we could find: https://github.com/ASPRSorg/LAS/wiki/Standard-ExtraByte-Definitions https://github.com/ASPRSorg/LAS/issues/37#issuecomment-418883987

It seems a good idea to keep consistent convention for extrabyte names. LAS 1.4 R15 specification includes several examples of extrabyte names using lower case naming convention with spaces. It also has a link to official LAS wiki with Standard ExtraByte Definitions list that has even more examples, all using the same lower case naming convention. It makes me believe that it is an accepted naming convention for LAS extrabytes.

If lower case naming convention with spaces is indeed preferred for LAS extrabytes, then maybe adding support for this convention in PDAL is a better solution.

Please correct me if I am wrong and camel case convention should be preferred instead (does it need to be reflected in the LAS specification then?), or if there are any good reasons to ignore consistency in this case.

Dmitry

hobu commented 10 months ago

If lower case naming convention with spaces is indeed preferred for LAS extrabytes, then maybe adding support for this convention in PDAL is a better solution.

PDAL supports names with spaces, but since it supports so many other formats, it discourages them and normalizes them out.

Please correct me if I am wrong and camel case convention should be preferred instead (does it need to be reflected in the LAS specification then?), or if there are any good reasons to ignore consistency in this case.

The specification doesn't say and I think it is up to implementers to do what they want. That said, commonly agreed upon dimension names that mean specific things (like nx) would be a welcome community resource. We have https://github.com/PDAL/PDAL/blob/master/pdal/Dimension.json for PDAL but it is just a list more than an attempt at getting anyone to agree on name + meaning.

esilvia commented 10 months ago

Thanks Dmitry @agisoft-metashape !

I've been following the lowercase-with-spaces-allowed convention for all of the standardized ExtraBytes in the wiki as a way of normalizing them, and I intend to continue that trend unless I hear a compelling reason to abandon it.

Nevertheless, as @hobu notes the specification itself is silent on a preferred naming convention.

agisoft-metashape commented 10 months ago

Thank you @esilvia and @hobu for your comments.

We are going to add support for PDAL style names ('NormalX', etc) on import in the next Metashape version. For export we will probably keep using our current lower case names until there is a consensus on the better solution.

@hobu PDAL 2.3.0 indeed normalized the 'normal x' name to 'NormalX' on input, but that is no longer the case with PDAL 2.5.5. If such normalization is indeed intended, maybe it can be added to PDAL again? This will probably solve the reported compatibility issue.