ASPRSorg / LAS

LAS Specification
https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html

Add an optional MIME types VLR that describes other VLRs in the file #142

Open hobu opened 1 year ago

hobu commented 1 year ago

What is the issue about?

Inquiry about the specification

Issue description

Problem

I wish the LAS ecosystem had better VLR interoperability. Unless they are baked into the specification, VLRs are not really consumable without "just knowing" a particular user_id / record_id combination. Usually that means only your own VLRs, though a particular application might know about one or two other applications' VLRs and treat them accordingly.

One increasingly common need is storing metadata WITH the point cloud data. Sometimes that metadata is a full FGDC metadata document, sometimes it's just a Word .docx or a .pdf, or maybe it is a simple Markdown text file that describes the process of how the file was made. Regardless, the specification has no way to communicate the type of content inside a VLR. It would be really nice to be able to do this for metadata.

The NGA BPF Specification has a concept of a "Bundle File" that is a little like a VLR. The idea is to stuff whatever you want into a blob and give it a filename. The content type is only implicitly defined by that filename's extension, however. There's no MIME type to explicitly tell you what that file is supposed to be. I think we could do better with LAS by providing an optional VLR that gives a simple map of user_id / record_id / mimetype / (optional) filename.

Proposal

[
    {
        "user_id":"PDAL",
        "record_id":12,
        "mimetype":"application/json",
        "description":"PDAL metadata output as a JSON document"
    },
    {
        "user_id":"USGS",
        "record_id":86,
        "mimetype":"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "filename":"metadata.docx",
        "description":"Random stuff pasted into a word document that MIGHT describe how the data came into being"
    }
]
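
For what it's worth, here is a rough sketch of how a reader might consume such a payload: parse the JSON and key each entry by its user_id / record_id pair so other VLRs can be looked up. The function names (`build_mime_lookup`, `describe_vlr`) are made up for illustration and are not part of any LAS library; the payload is assumed to already be extracted from the VLR as text.

```python
import json

# Example payload modeled on the proposal; assumed to be already read
# out of the (hypothetical) MIME types VLR as UTF-8 text.
payload = """
[
  {"user_id": "PDAL", "record_id": 12, "mimetype": "application/json",
   "description": "PDAL metadata output as a JSON document"},
  {"user_id": "USGS", "record_id": 86,
   "mimetype": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
   "filename": "metadata.docx"}
]
"""


def build_mime_lookup(payload_text):
    """Map (user_id, record_id) -> entry for each VLR the payload describes."""
    return {(e["user_id"], e["record_id"]): e for e in json.loads(payload_text)}


def describe_vlr(lookup, user_id, record_id):
    """Return (mimetype, filename or None), or None if the VLR is not listed."""
    entry = lookup.get((user_id, record_id))
    if entry is None:
        return None
    return entry["mimetype"], entry.get("filename")


lookup = build_mime_lookup(payload)
print(describe_vlr(lookup, "PDAL", 12))  # a listed VLR
print(describe_vlr(lookup, "ACME", 1))   # an unlisted VLR yields None
```

Software that doesn't know about the MIME types VLR simply never builds the lookup, which is the point of keeping it optional.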

Notes:

  1. We should use JSON Schema to describe a schema document for these things (and any other JSON VLRs we might make).
  2. This isn't a replacement for the header.
  3. You are probably writing this at the end of the file as an EVLR, since you don't know your content types until after you have written them all.
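
To make notes 1 and 3 a bit more concrete, here is a minimal sketch of the writer side: collect one entry per VLR as it is written, then serialize the whole list at the end as the payload of the trailing EVLR. Everything here is an assumption for illustration: the schema (including which fields are required), the `MimeTypeIndex` class, and the hand-rolled `check_entry`, which stands in for a real JSON Schema validator. The 16-character and 16-bit limits mirror the existing VLR header fields.

```python
import json

# Hypothetical JSON Schema for the payload. Field names come from the
# proposal; the "required" list and the limits are assumptions.
MIME_VLR_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "user_id": {"type": "string", "maxLength": 16},
            "record_id": {"type": "integer", "minimum": 0, "maximum": 65535},
            "mimetype": {"type": "string"},
            "filename": {"type": "string"},
            "description": {"type": "string"},
        },
        "required": ["user_id", "record_id", "mimetype"],
    },
}


def check_entry(entry):
    """Tiny stand-in for schema validation; checks only a few constraints."""
    item = MIME_VLR_SCHEMA["items"]
    missing = [k for k in item["required"] if k not in entry]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if len(entry["user_id"]) > item["properties"]["user_id"]["maxLength"]:
        raise ValueError("user_id exceeds 16 characters")
    if not 0 <= entry["record_id"] <= 65535:
        raise ValueError("record_id does not fit in an unsigned 16-bit field")


class MimeTypeIndex:
    """Accumulates entries while VLRs are written; dumped once at the end."""

    def __init__(self):
        self.entries = []

    def add(self, user_id, record_id, mimetype, **optional):
        entry = {"user_id": user_id, "record_id": record_id, "mimetype": mimetype}
        # Optional keys such as filename/description are kept only if given.
        entry.update({k: v for k, v in optional.items() if v is not None})
        check_entry(entry)
        self.entries.append(entry)

    def to_payload(self):
        """Bytes to store as the body of the trailing EVLR."""
        return json.dumps(self.entries, indent=4).encode("utf-8")


index = MimeTypeIndex()
index.add("PDAL", 12, "application/json",
          description="PDAL metadata output as a JSON document")
index.add("USGS", 86,
          "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
          filename="metadata.docx")
payload = index.to_payload()
print(len(payload), "bytes to write as the EVLR body")
```

Which user_id / record_id the MIME types VLR itself would carry is left open here, since the proposal doesn't assign one.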

FAQ

Why make a new VLR instead of augmenting the current VLR headers?

Because it should be optional and we don't want to cause people to change any existing software.

Why use JSON?

It's what people use for this kind of thing nowadays. Depending on the schema, it can also be extendable so people can add their own stuff to it if they want. That said, I'm biased toward JSON as a contributing author to the GeoJSON specification, so take my suggestion accordingly 😛

hobu commented 7 months ago

Additional comments:

esilvia commented 2 months ago

Discussed in the LWG meeting today. Primary motivator is for those with large data holdings such as @kjwaters (NOAA) and @jdnimetz. I personally haven't seen many folks try to embed files like docx or xml or pdf etc into the (E)VLRs and so I don't see a lot of value. Maybe others have this problem? I'd love to get a few more opinions on the record.

There's a concern that every LAS file in an archive would carry the same multi-MB PDF in its header, causing potential storage bloat with limited advantage.