Extend/adapt LAS to support new technologies

ASPRSorg / LAS

LAS Specification

https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html

Extend/adapt LAS to support new technologies #35

Closed esilvia closed 5 years ago

esilvia commented 6 years ago

Harris has expressed concern that LAS doesn't adequately support Geiger-Mode (GM) lidar. Is there a way to enhance/revise LAS to support GM lidar?

esilvia commented 5 years ago

During the pull request from @kdamkjer we revised the language to incorporate technologies other than Geiger-mode lidar (aka GmAPD) that have similar difficulties with the specification as written, such as photo-derived point clouds (aka Structure-from-Motion or SfM; aka "phodar"). These changes were supported by @rapidlasso, @anayegandhi, and @hobu.

A few nitpicks...

https://github.com/ASPRSorg/LAS/commit/b974f00d0a40edf3d3c652820986fa2cca0a5312#diff-1b13b677118f70cd1ff6f761084f2f94R115

When the aggregate bit is set (see Global Encoding), the File Source ID is set to zero (0) since the file is generated from an aggregation of many independent sources.

I think we're going to clarify the language about the FileSourceID to be less limited to flightlines for #59, so I think we can remove the Aggregate-related language for that field.

https://github.com/ASPRSorg/LAS/commit/b974f00d0a40edf3d3c652820986fa2cca0a5312#diff-1b13b677118f70cd1ff6f761084f2f94R179

If this bit is set, the point data is derived from an aggregate model such as a coincidence processing algorithm. When this bit is set, point record fields that apply only to direct detection systems may be nulled to yield compliant product.

This language feels too technical to me, especially for a newcomer software developer. What do you think about revising it to something more easily understood by a developer? Something like...

If set, the point data is derived from multiple measurements in an aggregate model rather than from a direct measurement system. Some required point attributes may be null or interpreted differently as noted in the specification.

I can make those changes if it accurately conveys the meaning you intended.
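For a developer, the aggregate flag boils down to a single bit test on the Global Encoding field. A minimal sketch, assuming the flag sits at bit 5: that position is hypothetical, since the bit number is not stated in this thread. `AGGREGATE_BIT`, `read_header_flags`, and `is_aggregate` are illustrative names. The offsets (signature at byte 0, File Source ID at byte 4, Global Encoding at byte 6, little-endian) are those of the LAS public header block.

```python
import struct

# Hypothetical bit position for the proposed aggregate flag (not confirmed in this thread).
AGGREGATE_BIT = 5

def read_header_flags(header: bytes):
    """Return (file_source_id, global_encoding) from the start of a LAS header."""
    signature, file_source_id, global_encoding = struct.unpack_from("<4sHH", header, 0)
    if signature != b"LASF":
        raise ValueError("not a LAS file")
    return file_source_id, global_encoding

def is_aggregate(global_encoding: int) -> bool:
    """Test the (assumed) aggregate bit of the Global Encoding field."""
    return bool(global_encoding & (1 << AGGREGATE_BIT))

# Synthetic 8-byte header prefix: File Source ID 0, aggregate bit set.
header = b"LASF" + struct.pack("<HH", 0, 1 << AGGREGATE_BIT)
file_source_id, encoding = read_header_flags(header)
print(file_source_id, is_aggregate(encoding))
```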

https://github.com/ASPRSorg/LAS/commit/b974f00d0a40edf3d3c652820986fa2cca0a5312#diff-39a9e8714fe4cad3b88d48b7c4a3fee6R117

...direction or the mirror facet changes. Note that this field has no meaning for 360° Field of View scanners (such as Mobile LIDAR scanners) or when the aggregate bit is set on the header. In these cases, the edge of flight line should be set to zero and ignored.

I noticed your changes incorporate some of the corrections for #41, which is fine with me. Just wanted to make sure it got tagged.

https://github.com/ASPRSorg/LAS/commit/b974f00d0a40edf3d3c652820986fa2cca0a5312#diff-39a9e8714fe4cad3b88d48b7c4a3fee6R315

When the aggregate bit is set on the header, points are tagged with the GPS time for the directly observed point. However, points may not be ordered by monotonically increasing GPS time.

Can you further clarify what you mean by this? If I was trying to utilize timestamps from an aggregated dataset I wouldn't know what the timestamps actually mean. For example, are you deriving the timestamps randomly from one of the component measurements? Or is it an average? Or something else? Can I still use the timestamps to uniquely identify my point in an aggregated dataset?

I don't think I've actually said this yet, but thanks again @kdamkjer for your contribution and patience as our guinea pig while we iron out the process.

kdamkjer commented 5 years ago

@esilvia, thank you for your comments. Please see my responses below.

File Source ID implications for set Aggregate Bit

My concern with removing the language for File Source ID is that #59, as it is currently proposed, only offers a merge operation as justification for a setting of zero (0). Why not make the setting of the File Source ID explicit when the Aggregate Bit is set? It makes for a very straightforward validation check. Is there a use case for non-zero File Source ID with aggregate models? I suppose I would be happy with language that better stated that a non-zero File Source ID is not a requirement for validly formatted data. The comment by @hobu on #59 captures my thoughts pretty well:

It's there, and it's used successfully by the scenarios that need it. IMO, there's no reason to toss it, but to make its proper population a hard requirement when there are many LAS data storage scenarios where it doesn't really apply seems overly constraining.

PDRF field implications for set Aggregate Bit

I don’t necessarily object to your proposed change in language. I do think it should be expected that the language in a binary file format specification will be necessarily technical, though. I also don’t know why we should be tailoring the specification language to a novice software developer.

I'd like to suggest changing "derived from multiple measurements" to "derived from multiple observations" in your proposed revision. I do like the addition of “interpreted differently as noted in the specification”.

The original language was intended to, in part, acknowledge that many of the PDRF fields exist due to LAS’s legacy as a transport format for direct detection, aerial, LiDAR systems. In my opinion, it is unreasonable to completely restructure existing PDRFs resulting in a compatibility nightmare simply because LAS has grown beyond this legacy. I also believe it is unreasonable to introduce a new PDRF for every new LiDAR sensing modality. It seems that the most reasonable approach, for the near-term at least, is to simply acknowledge that some fields may not apply in all cases. Those cases should be flagged by an encoding bit of some sort like the Aggregate bit proposed here. When appropriate conditions are flagged, nonsense fields should be set to a predetermined value to indicate “not applicable”. The most logical option I could think of was null (0). This prevents the population and propagation of junk data in the fields, improves the performance of compression algorithms like LAZ, and enables consistent validation of properly formatted files.
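The validation this enables could look roughly like the following sketch. It assumes points are available as dictionaries and that the list of "not applicable" fields is known. `NOT_APPLICABLE_FIELDS` is an illustrative guess (only Edge of Flight Line and File Source ID are named in the proposed changes), and `validate_aggregate` is a hypothetical helper, not part of any LAS library.

```python
# Illustrative guess at point fields that carry no meaning for aggregate data.
# Only edge_of_flight_line is named in the proposed changes.
NOT_APPLICABLE_FIELDS = ("edge_of_flight_line", "scan_direction_flag")

def validate_aggregate(file_source_id, points, aggregate_bit_set):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not aggregate_bit_set:
        return problems  # these checks apply only to aggregate files
    if file_source_id != 0:
        problems.append(f"File Source ID is {file_source_id}, expected 0")
    for index, point in enumerate(points):
        for field in NOT_APPLICABLE_FIELDS:
            if point.get(field, 0) != 0:
                problems.append(f"point {index}: {field} should be 0")
    return problems

# Made-up points: the second one carries a stray non-zero flag.
points = [
    {"edge_of_flight_line": 0, "scan_direction_flag": 0},
    {"edge_of_flight_line": 1, "scan_direction_flag": 0},
]
for problem in validate_aggregate(0, points, aggregate_bit_set=True):
    print(problem)
```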

Scan Angle clarification

I did indeed pull in clarification for terrestrial and mobile LiDAR as well. It should be tagged as such. I wasn’t convinced that I achieved the full clarification desired by #41. I was hoping to make a step in the right direction, though. I think there’s still a good bit of clarification needed for the scan angle and rank fields, so I’d hesitate to resolve #41 based on this change.

GPS Time implications for set Aggregate Bit

This is a tricky one for general purpose application. The way we process GmAPD LiDAR data is actually contrary to what many in industry believe. We do not fabricate points in our aggregate model. Every point in the final product point cloud starts its life as a directly measured point tagged with a GPS time. Similarly, we CAN tag information about the relative scan angle, etc. as I discussed in #41. The utility of this information for GmAPD products is probably diminished from the original use cases envisioned for application to linear LiDAR. We can still populate the field, though, and we’ve actually found that population of the GPS time is useful for many applications.

I would caution that GPS time cannot serve as a unique identifier for GmAPD data, and the same holds for SPL data. Since both systems use arrays to collect, any returns resulting from a single pulse may be tagged with the same GPS time.
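A quick way to see this in a dataset is to count how many points share each timestamp. The sketch below uses made-up timestamps, and `shared_gps_times` is an illustrative helper name.

```python
from collections import Counter

def shared_gps_times(gps_times):
    """Map each GPS time that occurs more than once to its point count."""
    counts = Counter(gps_times)
    return {t: n for t, n in counts.items() if n > 1}

# Made-up values: three returns detected by an array from one pulse, then a single return.
times = [100.25, 100.25, 100.25, 100.5]
print(shared_gps_times(times))
```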

I personally believe that GPS times should only be tied to directly observed points. Otherwise, as you point out, the field degenerates into meaningless information. For example, with SfM data there really is no single GPS time since there’s actually a GPS time for each pixel that contributes to the correspondence resulting in the estimated 3D point.

Finally, I expect there will be a question of whether we can similarly populate Point Source ID for GmAPD data using a strategy similar to GPS time. While we could do that, we would have to establish a strategy for doing so and carry additional metadata through our processing to support such an operation. Prior to #59, there was an assumption on our part that Point Source ID and File Source ID were related based on the current language in the specification.

A Point Source ID of zero implies that this point originated in this file. This implies that processing software should set the Point Source ID equal to the File Source ID of the file containing this point at some time during processing.

We only format our data as LAS once product formation is complete. This means that when generating our product, there is never a File Source ID to reference for a "merge". From our perspective, our points "originated in this file" as described by the above Point Source ID definition. I think that the proposed updated language still supports this interpretation.

A Point Source ID of zero implies that this point originated in this file, such as for a Synthetic point, or is derived from multiple sources.

In this case, the points both "originated in this file" and are "derived from multiple sources".

esilvia commented 5 years ago

I just merged the latest batch of updates into the issue-35 branch here: 5e3cde7cfd2fa4800683db08d4c1e0461c2b9aae

I then made a few modifications based on your feedback and put them on the issue-35-alt branch here: 5472b33ca766e0bf809a6866d47ee3a892d2a951

Some comments:

FileSourceID: https://github.com/ASPRSorg/LAS/commit/5472b33ca766e0bf809a6866d47ee3a892d2a951#diff-78274eec4f71fb212da76973ae06491fR107

Does the alt language capture what you're looking for? I know that for photo-derived point clouds it's pretty rare to combine photos from multiple sorties because of changes in sun angles, etc. Is that also the case for GML, or do you combine data from multiple sorties in the derivation of a single point? If so, then the language I wrote might not make sense.

I wasn't sure what term, exactly, to use that captures both GML and SfM, so I used the term "aggregation model airborne systems." I welcome a better term if you've thought of one. Or maybe I should just spell them out.

Aggregate bit description: https://github.com/ASPRSorg/LAS/commit/5472b33ca766e0bf809a6866d47ee3a892d2a951#diff-78274eec4f71fb212da76973ae06491fR178

If I'm being honest my issue with the previous language is that I don't know what an "aggregate model" or a "coincidence processing algorithm" are supposed to be, so I'm assuming that the same will be true of someone years from now who's trying to export LAS from their SfM software, for example. I'm hoping that the simplified language will clue them in that it applies to them.

Scan Angle (Rank) clarification: https://github.com/ASPRSorg/LAS/commit/5472b33ca766e0bf809a6866d47ee3a892d2a951#diff-18f022aa6ba3ac410b8d1645ae3295aeR239

Your language works for me. I just made some minor editorial changes.

Intensity: https://github.com/ASPRSorg/LAS/commit/5472b33ca766e0bf809a6866d47ee3a892d2a951#diff-18f022aa6ba3ac410b8d1645ae3295aeR72

Minor tweaks to the phrasing to make the differences more obvious.

GPS Time: https://github.com/ASPRSorg/LAS/commit/5472b33ca766e0bf809a6866d47ee3a892d2a951#diff-18f022aa6ba3ac410b8d1645ae3295aeR328

If only there was a way to store multiple timestamps and/or PtSourceIDs for a given point... This could be incredibly interesting and useful information for someone trying to model line of sight. I suppose it could be done with ExtraBytes, but that falls WAY outside the scope of this standard.

I personally believe that GPS times should only be tied to directly observed points. Otherwise, as you point out, the field degenerates into meaningless information. For example, with SfM data there really is no single GPS time since there’s actually a GPS time for each pixel that contributes to the correspondence resulting in the estimated 3D point.

I think I agree with you to some extent. Although one could shoehorn in some arcane algorithm for storing a composite timestamp, I think doing so violates common sense. What if for aggregate datasets you guys encode the most recent timestamp of the component direct observations? It's better than nothing because it gives a ballpark estimate of the true measurement time(s) (for change detection) and enables ray-tracing for at least one of the component observations when combined with the trajectories. I don't think the warning about non-monotonically increasing timestamps is necessary because LAS files go out of order all the time anyway.

Point Source ID: https://github.com/ASPRSorg/LAS/commit/5472b33ca766e0bf809a6866d47ee3a892d2a951#diff-18f022aa6ba3ac410b8d1645ae3295aeR257

Regarding PointSourceID, I think your interpretation of R13 is technically correct, although I think it's confusing because the spec also says PtSrcID=0 is invalid. What if we drop the "multiple sources" clause and instead use the same system as for GPS Time? (i.e., the line or sortie number of the most recent component observation). I believe that #59 has loosened the connection between FileSourceID and PtSourceID sufficiently to allow alternative interpretations.

Thanks again @kdamkjer. If you like my edits we can merge the alt branch but I wanted it to be really obvious what changed and easy to undo if you don't like it. Looking forward to your reply. I think we're getting close.

kdamkjer commented 5 years ago

I think we are definitely getting closer, @esilvia.

File Source ID

There are almost always cases where we have data from multiple sorties contributing to a single file within a project. However, I don't think it makes sense to tag the file with a non-zero File Source ID even when all of the points are derived from observations within a single sortie. It is just as likely that multiple sorties contribute to a file. Also, why have different criteria for populating this field for direct measurement systems vs aggregate systems (Flight Line ID vs. Sortie ID)?

Aggregate bit description

I think this description is fine. I agree that it is difficult to discuss Geiger-mode and Structure from Motion processing without diving into algorithm details. I think we've done about as well as we can.

Scan Angle (Rank)

Looks good to me.

Intensity

Looks good to me.

GPS Time:

I firmly disagree with the proposal to tag the product point with the most recent GPS Time from the supporting points. As I've tried to explain, every point that is added to our GmAPD products is a directly observed point. As such, each has a GPS Time associated. The support from non-product observations is used to make a determination of signal vs. noise, estimate strength of return for signal points, and refine the raw range measurement.

If someone wants line of sight vectors, there are much more efficient ways to compute that information. Just like linear-mode systems, we must have projection information to place points in the world coordinate system in the first place. For GmAPD, there is one line of sight for each point that could be captured as a triple similar to a surface normal. SfM gets a little more interesting. As you note, there is actually a line of sight for each pixel that contributes to the estimation of the 3D point.

The note about non-monotonically increasing GPS times was included to make explicit that there is no requirement that points be sorted by GPS time within a file.

Point Source ID

I also disagree with this proposal. I partly disagree for the same reasons that I disagree with the proposal for GPS Time. I also partly disagree because of the reasons I mentioned previously.

We only format our data as LAS once product formation is complete. This means that when generating our product, there is never a File Source ID to reference for a "merge". From our perspective, our points "originated in this file" as described by the above Point Source ID definition. I think that the proposed updated language still supports this interpretation.

The description for this field is also extremely confusing. Is this field required to be populated with a non-zero value to pass validation or not? The description states that in the case of merges, the Point Source ID could be the File Source ID of the original file. What is it otherwise? When the Aggregate bit is set, should the point be tagged with a flight line number, or a sortie number? If a flight line number, are numbers unique per product? per sortie? per project? If a sortie number, are numbers unique per product? per platform? per project?

Given the legacy of this field to track point heritage, why not permit zero as a valid value when Aggregate bit is set? The description already appears to be self-contradicting by stating the valid range is 1--65,535 then immediately stating that zero has a special, valid, interpretation. If zero is valid, why not expand its meaning? If it's not valid, why is it given a special interpretation?

anayegandhi commented 5 years ago

Kristian – I agree with your assessment of GPS Time Stamp. If we know the time stamp of the ‘directly observed point’, then that should be the timestamp used for GPS Time.

By the same logic, I disagree with your proposal for Point Source ID. If the point is truly a ‘directly observed point’ then we know the sortie/swath it originated from. If that is the case, we should be able to tag that point to its original sortie/swath. Whether it is a swath number, flight line number, or sortie number is getting deep into the details of the sensor characteristics and flight parameters for that project. The intent of that field is to be able to determine the origination of that point. As long as we can be consistent in the definition of swath/sortie/flight number for the sensor or even project (whether GM, linear, or other), and that it is documented in the metadata, we are good.

-Amar

kdamkjer commented 5 years ago

@anayegandhi and @esilvia,

Thank you both for your comments. I'm concerned about how we close on all of GmAPD, SPL, and SfM at this point. Here are the points of contention that need to be resolved as I see them:

GPS Time

GmAPD (and SPL, I believe) products contain directly observed points in the final point clouds. Thus, the points should be tagged with the GPS Time for the pulse just like linear and FWF LiDAR. While not an exclusive attribute of GmAPD and SPL, I believe these sensor types are the only ones that are currently leveraging arrays for receiving return signal. As such, they are very likely to produce data sets with duplicate GPS Times. This may be contrary to user expectations, but should not be considered invalid. Do we need to mention this in the specification? Further, the nature of GmAPD and SPL processing may result in data that is not stored in monotonically increasing order w.r.t. collection time. Again, the specification makes no assertion about point ordering, but this may be contrary to user expectations. Would a general statement that no point ordering should be assumed be sufficient to address this concern?

SfM products contain points estimated from correlated features in multiple images. This means that there can not be a single GPS time associated with each point. In this case, the data sets must either avoid populating GPS Time, or we need to establish and publish a consistent mechanism for populating the field. @esilvia has proposed using the most recent GPS Time associated with a point estimate in this case. This would lead to an interpretation of "latest feature observation" for the purposes of change detection. If needed, additional GPS Times associated with point estimates could be populated in Extra Bytes, but that decision seems to be beyond the scope of the next revision.

Perhaps there are only two cases to consider. First, when points are direct observations, they are tagged with the time associated with the observation. Second, when points are indirect observations, they are tagged with the time for the most recent observation that contributed to the estimate.
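Expressed as code, the two cases might look like this. A minimal sketch: `gps_time_for_point` and its arguments are hypothetical names, and the rule for indirect points is the "most recent contributing observation" proposal discussed above, not settled specification language.

```python
def gps_time_for_point(observed_time=None, contributing_times=()):
    """Pick the GPS Time to store for one product point.

    Direct observation: use the time of that observation.
    Indirect estimate: use the most recent contributing observation.
    """
    if observed_time is not None:
        return observed_time
    if not contributing_times:
        raise ValueError("an indirect point needs at least one contributing time")
    return max(contributing_times)

# Made-up times: one direct observation, one estimate built from three observations.
print(gps_time_for_point(observed_time=100.25))
print(gps_time_for_point(contributing_times=[98.0, 101.5, 99.75]))
```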

Unfortunately, GPS Time is mandatory for PDRFs 6-10, so we must establish an approach to support SfM and similar technologies. The only other thought I have at this time is to use a flag value for SfM data to indicate "Not Applicable". Perhaps NaN makes the most sense if we choose to use this approach. However, explicitly recommending NaNs introduces all kinds of complexity for correctly handling and interpreting the field.
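Some of that complexity is easy to demonstrate. GPS Time is stored as a double, and a NaN double never compares equal to anything, including itself, so readers cannot test for the flag with `==` and must filter it out before comparing times. A small sketch with hypothetical helper names (`has_gps_time`, `latest_time`):

```python
import math
import struct

def has_gps_time(value):
    """True unless the stored GPS Time is the NaN 'not applicable' flag."""
    return not math.isnan(value)

def latest_time(values):
    """Most recent valid GPS Time, or None if every value is flagged."""
    valid = [v for v in values if has_gps_time(v)]
    return max(valid) if valid else None

# Round-trip a NaN through an 8-byte little-endian double.
(flag,) = struct.unpack("<d", struct.pack("<d", float("nan")))

print(flag == flag)  # equality cannot be used to detect the flag
print(has_gps_time(flag))
print(latest_time([100.25, flag, 101.5]))
```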

Point Source ID

My primary concern with Point Source ID at this time is that its definition is too ambiguous. #59 and #69 attempt to relax the definition of File Source ID and Point Source ID to allow for "sources" other than files. However, the latest proposed revision makes it very hard to understand how to populate the File Source ID and Point Source ID to create a valid file. Consider the latest proposed language:

File Source ID: This field should be set to a value from 0 to 65,535. A value of zero is interpreted to mean that an ID has not been assigned, which is the norm for a LAS file resulting from a merge operation (e.g., a tile) or for a LAS file with the Aggregate bit set, since the file is generated from an aggregation of multiple independent sources.

Note that this scheme allows a project to contain up to 65,535 unique sources. Example sources can include a data repository ID or an original collection such as a flight line for direct measurement airborne systems, a sortie number for aggregation model airborne systems, a route number for mobile systems, or a setup identifier for static systems.

Based on the description of File Source ID, zero is not only valid for data when the Aggregate bit is set, it is the norm. However, the description appears to allow for non-zero settings when the Aggregate bit is set, even though it is not expected. When this occurs, the description is unclear on interpretation other than to assume that all of the data comes from a single "source", but this file may be a derivative and not the source (consider a decimation or other transformation operation instead of a merge). Are data providers required to mark files with data from a single "source" with a non-zero File Source ID, even if it's pure coincidence? If so, what constitutes a "source" and how can the metadata be validated? Why capture "source" at a different fidelity for direct measurement systems vs. aggregate models? How can data providers know that they have populated this field in a valid way?

The File Source ID description sets up significant confusion for populating the Point Source ID.

Point Source ID: This value indicates the source from which this point originated, such as a swath number for conventional airborne systems, a sortie number for aggregation model airborne systems, a route number for mobile systems, or a setup identifier for static systems. For a LAS file created during a merge operation, the Point Source ID could equal the File Source ID of the origin LAS file.

Valid values for this field are 1 to 65,535 inclusive. Zero is reserved as a convenience to system implementers. A Point Source ID of zero implies that this point originated in this file (e.g., for a Synthetic point).

When the Aggregate Global Encoding bit is set, the Point Source ID should equal the sortie or flight line number of the most recent component observation.

According to the description, the Point Source ID for delivered product can not be zero. However, zero carries the meaning that a point originated in this file. This is always the case for aggregate models. There is no parent LAS file that points originate from and the File Source ID for all of these files is expected to be zero. This means that even if the Point Source ID is set to the File Source ID as specified, the Point Source ID will still be zero.
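The circularity can be traced in a few lines. This sketch encodes the two rules referenced in this comment (valid values are 1 to 65,535, and a zero Point Source ID is replaced by the File Source ID of the containing file); the function names are illustrative.

```python
def is_valid_point_source_id(value):
    """Valid range per the quoted description: 1 to 65,535 inclusive."""
    return 1 <= value <= 65535

def resolve_point_source_id(point_source_id, file_source_id):
    """Replace a zero Point Source ID with the File Source ID of the containing file."""
    return point_source_id if point_source_id != 0 else file_source_id

# Aggregate product: points originate in this file and File Source ID is zero.
resolved = resolve_point_source_id(0, file_source_id=0)
print(resolved, is_valid_point_source_id(resolved))
```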

I agree that whether the field is a sortie/flight line/swath/etc. number is getting deep into the sensor characteristics for a project. However, if the meaning of the field changes when the Aggregate bit is set, we must be clear on how to populate the field. What is the expectation/requirement for unique identification of "sources"? @anayegandhi, is your point that this decision is beyond the scope of the LAS specification? If so, should the specification mention that in some way?

Does this field carry the same interpretation for directly observed points versus indirect estimates as GPS Time?

Is the Point Source ID populated with a sortie number? If so, all data from a single flight will have the same ID. Is this sufficient fidelity for populating the field? What if multiple platforms are used to collect a project? Must sortie numbers be unique per platform or per project?

Is the Point Source ID populated with a flight line number? If so, all data from multiple scanners or looks will have the same ID. Is this sufficient fidelity for populating the field? What if multiple scanners are used to collect a line (like Riegl VQ-1560i)? What if a single scan is segmented to keep data temporally consistent for processing (like forward/aft sections of an elliptical scan or time-based segmentation of a stationary scan)? Must flight line numbers be unique per project?

How do we handle SfM data? Is each image treated as a separate source? Are they expected to be numbered uniquely per project? Does it make sense to label with the latest contributing source?

Given the above concerns, how do we clearly specify the population of this field without calling out each specific technology and collection method?

Finally, are data providers going to be required to re-process the backlog of existing data that has already been generated to add this previously unspecified level of heritage tracking? Is the value added worth the effort?

Whatever is decided, I think it is important to remember that LAS may not be the format of choice when producing data. LAS is extremely useful as an exchange format, but there should not be an expectation or implication that it is used through all steps of product formation.

Of course, a trivial solution to all of these issues is to simply allow Point Source ID to be populated with one (1) instead of zero (0) when the Aggregate bit is set, at least for the near-term. Alternatively, we could leverage a different flag value. Perhaps 65,535 since zero already has a separate interpretation?

Thoughts?

jdnimetz commented 5 years ago

@esilvia

If only there was a way to store multiple timestamps and/or PtSourceIDs for a given point...

Can we not do this? Regarding timestamp, one idea I haven't seen discussed is tagging the indirect / aggregate model point with both the first and last detections. For G-m lidar, I suppose this would be the lowest and highest timestamps for all non product points as described by Kristian @kdamkjer

from non-product observations are used to make a determination of signal vs. noise, estimate strength of return for signal points, and refine the raw range measurement.

For SfM, this would be based on all images used to derive the point, as images can be timestamped.

Perhaps a flag indicates an aggregate model: if null, the point is a direct measurement and has a single correlated timestamp; if the flag is set, there are two timestamps representing the first and last detection or image (for SfM).

My apologies if this has already been discussed and determined to be non-viable. I also realize adding a second time stamp attribute is no easy thing. However, I like this approach as it enables some temporal resolution for the data.

esilvia commented 5 years ago

Thanks @kdamkjer @jdnimetz @anayegandhi for a great discussion. I'd like to focus on GPS Time first because I think the answer for this will feed back into the PtSourceID discussion. You have convinced me there's confusion re: FileSourceID and PtSourceID in the spec as written.

Let's first establish that I'm totally okay with points having the same timestamp because they were observed at the same time from an array. Even linear sensors have this problem because a single pulse can have multiple returns, and more recent sensors have multiple channels recording at the same time that result in identical timestamps.

LAS 1.4 gave us a 2-bit Channel field. Thus each point in a conventional system has a unique ID by means of a PtSourceID, Channel, GPS Time, and Return Number, up to 15 returns and 4 channels. Some also use the UserByte for additional channels, but this is obviously not canon.
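
As a minimal sketch of the uniqueness argument above, the four attributes can be combined into one key. The names `PointKey` and `make_key` are hypothetical and not part of any LAS library; the limits come from the numbers quoted in this comment (15 returns, 4 channels) plus the 16-bit size of Point Source ID.

```python
from typing import NamedTuple

MAX_RETURNS = 15   # return numbers 1..15, as stated in the comment above
MAX_CHANNELS = 4   # 2-bit Scanner Channel field, values 0..3
MAX_POINT_SOURCE_ID = 65535  # Point Source ID is a 16-bit unsigned value


class PointKey(NamedTuple):
    """Hypothetical composite identifier for one return in a conventional system."""
    point_source_id: int
    channel: int
    gps_time: float
    return_number: int


def make_key(point_source_id, channel, gps_time, return_number):
    """Build a key, rejecting values that do not fit in the LAS 1.4 fields."""
    if not 0 <= point_source_id <= MAX_POINT_SOURCE_ID:
        raise ValueError("point source ID does not fit in 16 bits")
    if not 0 <= channel < MAX_CHANNELS:
        raise ValueError("channel does not fit in the 2-bit Channel field")
    if not 1 <= return_number <= MAX_RETURNS:
        raise ValueError("return number outside 1..15")
    return PointKey(point_source_id, channel, gps_time, return_number)
```

A sensor with more than four simultaneous receivers fails the channel check, which is the breakdown described in the following paragraphs.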

Further, if your platform has multiple sensors you can assign them unique PtSourceID ranges for each sensor. For example, we often co-acquire with two sensors on the same ship and assign one sensor PtSourceIDs in the 10,000 range and the other sensor to the 20,000 range. This helps us differentiate between sensors; even though technically they're from the same flightline they are still different sources.
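
A rough illustration of that convention, assuming "the 10,000 range" means IDs 10000 through 19999 and "the 20,000 range" means 20000 through 29999. The sensor names and the `sensor_for` helper are made up for this sketch.

```python
# Hypothetical mapping from sensor name to its block of Point Source IDs.
SENSOR_RANGES = {
    "sensor_a": range(10_000, 20_000),
    "sensor_b": range(20_000, 30_000),
}


def sensor_for(point_source_id):
    """Return the sensor whose ID block contains this Point Source ID, or None."""
    for name, ids in SENSOR_RANGES.items():
        if point_source_id in ids:
            return name
    return None
```

For example, `sensor_for(10042)` and `sensor_for(20042)` resolve to different sensors even if both points came from flightline 42.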

Unfortunately, this system breaks down for the Velodyne Puck32, GML, and SPL100 since they have far more than 4 receivers/channels/array-cells recording simultaneously. LAS simply doesn't have the space to maintain that uniqueness for a point without getting creative. I think that's okay. It might be worth a warning in the spec to that effect, providing a guideline for what to do when the number of channels/returns exceeds the available space. Again, this issue isn't unique to Aggregate LAS files. This need came up in the discussion on #18.

Bottom line is I'm okay with points having the same timestamp, and doing so doesn't violate the spec.

My objection is to assigning all points a null value. We can resort to that if we can't come to a consensus, but I'm not yet convinced it's necessary.

GmAPD (and SPL, I believe) products contain directly observed points in the final point clouds. Thus, the points should be tagged with the GPS Time for the pulse just like linear and FWF LiDAR.

Yes, SPL100 points in the LAS file are direct observations as far as I know. I've seen valid LAS files from a SPL100 that populate all fields in the manner expected. Only catch, as previously noted, is that the Channel field is inadequate.

If this is also the case for GML then I'm confused why we're having this conversation... direct observations are collected at a certain time from a certain place and at a certain angle. To avoid further confusion on this thread, I'm going to go over GML basics with @kdamkjer offline and report back early next week.

@jdnimetz There's only room for one GPS Time in each point record. I don't see a way around that without using ExtraBytes, and I think it's overreaching for the spec to require a particular ExtraByte for a specific technology. Regarding a point-specific Aggregate flag, unfortunately there's no room in the existing point records for this. That's going to be a non-starter for a LAS 1.4 revision.

rapidlasso commented 5 years ago

From a user (data recipient) point of view, the availability of a point source ID in the LiDAR point cloud is imperative. It empowers the user to perform an alignment check on the quality of any LiDAR system that collects points by direct observation (non-aggregate) in two or more overlapping swaths as the sensor flies, drives, or swims by the same area multiple times, either on the same sortie or during a later sortie.

Sometimes these LiDAR points have lost their point source ID. In this case we can still use the GPS time stamps to recover which LiDAR points form consecutive swaths. This is true for linear LiDAR and for SPL LiDAR. Even if the many returns of one, two, three, 16, 32, 64 laser beams or of 100 beamlets have the exact same GPS time stamp (which is what we are expecting for dual beam scanners, 16, 32, or 64 beam Velodyne systems, or beamlet / array-based systems) we are still able to recover which LiDAR points form consecutive swaths for the purpose of a between-swath alignment check.
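
One simple way to do that recovery, sketched here under the assumption that consecutive swaths are separated by a time gap larger than any gap inside a swath. The function name and the 10-second default threshold are illustrative choices, not anything prescribed by the spec or by a particular tool.

```python
def split_into_swaths(gps_times, max_gap_seconds=10.0):
    """Group point indices into swaths by looking for jumps in sorted GPS time.

    Points with identical time stamps (multiple returns, beams, or beamlets)
    have a gap of zero and therefore always land in the same swath.
    """
    order = sorted(range(len(gps_times)), key=lambda i: gps_times[i])
    swaths = []
    for i in order:
        if swaths and gps_times[i] - gps_times[swaths[-1][-1]] <= max_gap_seconds:
            swaths[-1].append(i)   # continues the current swath
        else:
            swaths.append([i])     # time jump: start a new swath
    return swaths
```

Each returned list holds the indices of the points in one swath, which can then be compared against overlapping swaths for the alignment check.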

jdnimetz commented 5 years ago

Regarding timestamps: @esilvia Thanks for confirming that the current LAS architecture is limiting in terms of storing a second time stamp or an aggregate model flag. Perhaps we can consider these for a future revision as I believe they would be valuable. In the case of G-m lidar, I understand the points are directly observed with correlated timestamps. However, these points cannot exist without the multitude of supporting observations occurring from different times and capture stations. Therefore, I believe a range is more appropriate as it defines the temporal resolution for the data. That said, having a single timestamp is better than having no temporal information for the aggregate model point data.

esilvia commented 5 years ago

So after a couple of weeks of back and forth, here's a summary of what was covered between myself and @kdamkjer @jdnimetz plus all the input on this thread.

Summary

We're proposing to add an Aggregate Global Encoding bit for technologies like GML and SfM to designate that the points are derived from multiple sources rather than directly observed. The net result is that the following attributes either lose meaning or require special interpretation when the Aggregate Global Encoding bit flag is set:

Intensity (^)
Return Number (also added language for single-return sensors)
Number of Returns (also added language for single-return sensors)
Scan Direction Flag (also added language for non-rotational sensors)
Edge of Flight Line Flag
Scan Angle & Scan Angle Rank ($)
Point Source ID ($) (also further improved the connection to FileSourceID as discussed in #59 and removed confusing language about zero being both valid and invalid)
GPS Time ($)
Scanner Channel

Intensity

Note (^): We had a lot of discussions about what, exactly, Intensity actually is. We concluded that the intent was for Intensity to be a measure of the signal strength of the ranging instrument – and therefore by extension the reflectivity of the target. Although Intensity isn't directly measured for GML, it can be synthesized based on the number of observations going into the consensus model as a proxy for reflectance. Similarly, the Intensity could be computed as the luminosity based on the RGB values in a SfM dataset. Language is being included to reflect this. Please let me know if it's inadequate.
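
For the SfM case, a minimal sketch of computing Intensity from RGB might look like the following. The Rec. 709 luma weights are an assumption chosen for illustration; nothing in this thread fixes which luminosity formula to use. Both RGB and Intensity are treated as 16-bit unsigned values.

```python
def intensity_from_rgb(red, green, blue):
    """Approximate a 16-bit Intensity from 16-bit RGB using Rec. 709 weights.

    The weights are an illustrative assumption, not something the
    specification mandates.
    """
    luminance = 0.2126 * red + 0.7152 * green + 0.0722 * blue
    # Clamp to the 16-bit unsigned range used by the Intensity field.
    return min(65535, max(0, round(luminance)))
```

Whatever formula is chosen, applying the same one to every point in a file keeps the synthesized values comparable with each other.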

PtSourceID, GPS Time, Scan Angle

Note ($): As noted by @jdnimetz these attributes could/will have multiple values for a given Aggregated point. We recommend either setting these to null or systematically assigning them based on the sources from which each point is derived. For example, one could assign the GPS Time, PointSourceID, and Scan Angle based on the most recent photo used to generate a SfM point. One could also use the oldest photo. Those aren't the only options, either; Harris has their own novel approach that involves selecting a representative "query point" from which these attributes will be derived as @kdamkjer presents here. The key message here is that if you assign a value, you should use a real value (rather than, say, computing an average) and be consistent for a given LAS file. If retaining all of the component values is desired, ExtraBytes will be required to do this in LAS 1.4 and I welcome proposals along those lines.
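
To make the "real value, applied consistently" recommendation concrete, here is a small sketch. The `Source` record and `representative_source` helper are hypothetical; the two policies correspond to the newest-photo and oldest-photo examples above, and the Harris query-point approach is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One contributing observation (e.g. a photo) behind an aggregated point."""
    point_source_id: int
    gps_time: float
    scan_angle: float


def representative_source(sources, policy="newest"):
    """Pick one actual contributing source; never blend or average them.

    Returns None when there are no sources, in which case the attributes
    would be set to null.
    """
    if not sources:
        return None
    if policy == "newest":
        return max(sources, key=lambda s: s.gps_time)
    if policy == "oldest":
        return min(sources, key=lambda s: s.gps_time)
    raise ValueError(f"unknown policy: {policy!r}")
```

Because all three attributes are copied from the same chosen source, they stay mutually consistent; using one policy for the whole file satisfies the consistency recommendation.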

Aggregate vs. Synthetic

We noted a lot of overlap between the Synthetic point flag and the Aggregate Global Encoding bit. Here's a summary of our findings on how they could/should be used:

Synthetic – Implies that the 3D point coordinates are not the result of a direct ranging measurement.
- Can be set or unset for each individual point.
- This actually lines up pretty well with the existing description of the Synthetic flag... it just hasn't historically been used very much.
- Frankly, I still don't expect it to be used much because I imagine that most people will find it easier in their software to keep Synthetic null, but it's there if people need the distinction.
Aggregate – Implies that some point attribution is derived from multiple sources and/or observations.
- It's set or unset for the entire LAS file.
- That is, attributes may have slightly different descriptions or may carry no meaning for this data.

Implications (Thanks @kdamkjer for your summary!):

Aggregate=0 & Synthetic=0 – Point coordinate is a direct measurement. Attribution is taken from the direct measurement system.
- Example: Most linear-mode lidar systems.
Aggregate=0 & Synthetic=1 – Point coordinate is an estimate. Attribution is nulled for a pure synthetic point or taken from a direct observation (e.g., a moved point).
- Example: Terrain-projected points from a linear-mode system.
Aggregate=1 & Synthetic=0 – Point coordinate is a direct measurement. Attribution is taken from the direct measurement system where possible. Otherwise, it is derived from an aggregation of multiple sources.
- Example: Harris GmAPD LiDAR.
Aggregate=1 & Synthetic=1 – Point coordinate is an estimate. Attribution is derived from an aggregation of multiple sources or set to null.
- Example: Point clouds derived from SfM.
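
The four combinations above can be summarized as a lookup table. This is only a restatement of the list in code form: the function takes two booleans and deliberately assumes nothing about where the Aggregate bit or Synthetic flag sit in the file, and the names are illustrative.

```python
# (aggregate, synthetic) -> (how to read the coordinate, how to read attribution)
INTERPRETATION = {
    (False, False): ("direct measurement",
                     "taken from the direct measurement system"),
    (False, True): ("estimate",
                    "nulled, or taken from a direct observation"),
    (True, False): ("direct measurement",
                    "direct where possible, otherwise aggregated from multiple sources"),
    (True, True): ("estimate",
                   "aggregated from multiple sources, or null"),
}


def interpret(aggregate_bit_set, synthetic_flag_set):
    """Return (coordinate meaning, attribution meaning) for one point."""
    return INTERPRETATION[(bool(aggregate_bit_set), bool(synthetic_flag_set))]
```

Note that the first argument is a per-file property while the second varies per point, so a reader would typically evaluate the Aggregate bit once and the Synthetic flag for each point.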

Final words

I'll be creating a Pull Request to close this Issue shortly, which will have the final pdf link. Please do provide your comments here as you are so inclined. I believe we have something very sensible now that fulfills the needs of all.

Here's a temporary pdf link for now: https://s3.amazonaws.com/asprs-las/LAS-specification-e786fd6a274bb1b939aa53cda293c176dcc05867.pdf

anayegandhi commented 5 years ago

Great work, Evon, Kristian, and Josh! Your efforts are making the LAS spec much easier to understand and also allowing photon-sensitive and SfM data to be better incorporated into LAS. I hope to hear feedback from manufacturers and users of SfM and other photon-sensitive sensors so we can ensure that these changes will work for everyone.

Look forward to continued discussions on this topic. We will also be discussing these changes at the Lidar Division meeting during GeoWeek in Denver.