buildingSMART / OpenCDE-API


WebDAV as Documents API contract? #11

Open devonsparks opened 4 years ago

devonsparks commented 4 years ago

Forgive me if this is a silly question; I'm trying to get up to speed on the latest work here. Has the team considered using an existing neutral file management API like WebDAV for the Documents API? WebDAV gives most of what you'd expect to find in a file API (files, directories, movement commands, locking, etc.). Extensions like DeltaV (RFC 3253) support revision control too. Is there too much of an impedance mismatch between WebDAV and existing industry file systems? Is it too much overhead to implement the full protocol?

GeorgDangl commented 4 years ago

Hi @devonsparks,

thank you for the input! So far, WebDAV has not been considered. I’m not familiar with the protocol, so I don’t yet have an opinion on this. The general approach, though, is to rely on easy-to-implement REST APIs, so I’m not sure whether a full file transfer protocol would be an appropriate integration for the openCDE group.

devonsparks commented 4 years ago

Thanks @GeorgDangl. Is it right to think that the Documents API intends to be a minimal interface over backend document databases that supports:

  1. Asking the system for a "bucket" in which the client may upload a (possibly modified) document based on one or more metadata fields (e.g., Type, Discipline, Phase)
  2. Uploading a document to that bucket (assuming the client has appropriate permissions to do so)
  3. Later retrieving one or more documents from one or more buckets by searching over the metadata fields
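
To make the three operations above concrete, here is a minimal in-memory sketch. Everything in it is an assumption for illustration only: `InMemoryDocumentStore`, `request_bucket`, `upload`, and `search` are hypothetical names, not part of the Documents API. Note that the document ID is only assigned at upload time, which is the situation the first question below asks about.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """A hypothetical upload target described by free-form metadata fields."""
    bucket_id: str
    metadata: dict
    documents: dict = field(default_factory=dict)  # document_id -> file bytes


class InMemoryDocumentStore:
    """Illustrative stand-in for a backend document database (not a real API)."""

    def __init__(self):
        self._buckets = {}

    def request_bucket(self, metadata):
        # Step 1: ask for a bucket matching the given metadata fields.
        bucket = Bucket(bucket_id=uuid.uuid4().hex, metadata=dict(metadata))
        self._buckets[bucket.bucket_id] = bucket
        return bucket.bucket_id

    def upload(self, bucket_id, content):
        # Step 2: upload a document; the ID is generated only now.
        document_id = uuid.uuid4().hex
        self._buckets[bucket_id].documents[document_id] = content
        return document_id

    def search(self, **criteria):
        # Step 3: return (document_id, content) pairs from every bucket
        # whose metadata matches all of the given criteria exactly.
        hits = []
        for bucket in self._buckets.values():
            if all(bucket.metadata.get(k) == v for k, v in criteria.items()):
                hits.extend(bucket.documents.items())
        return hits
```
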

If so, a few questions around mechanics to check my understanding:

  1. What should implementers do if their backend system only generates document IDs after file upload events? What should the register-file-upload property of UploadSessionCreatedResponse return? Box's API might be a simple example to test the idea here.
  2. Has the "browse" API endpoint described in (3) and shown on the later slides of the Summit Doc been fleshed out yet?
  3. Who decides what the "filing" criteria (Type, Discipline) in the select-documents endpoint is?

I appreciate the intent to simplify the user experience, so I'm just eager to dig in and work out the details :)

Thanks!

ykulbak commented 4 years ago

@devonsparks thank you for your suggestion.

The reason we haven't considered WebDAV for the Documents API is that WebDAV, if I understand correctly, is designed for file and file system management but not for document management. Your 3rd question, "Who decides what the "filing" criteria (Type, Discipline) in the select-documents endpoint is?", touches the heart of the problem: document management and control in the construction industry requires comprehensive document metadata management, which is not supported by WebDAV. The document metadata problem is made even harder by the substantially different document control paradigms supported by different vendors.

The Documents API is currently designed to allow exchanging files without standardising any aspect of document metadata. Furthermore, standardising document metadata has been expressly made a "non-goal" for the initial versions of the Documents API.

devonsparks commented 4 years ago

Hey @ykulbak - thanks for the notes. Makes sense. WebDAV does support resource metadata through its PROPFIND and PROPPATCH methods. It's not clear to me one way or the other whether they're sufficient for the majority of AECO document management workflows. Just figured I'd ask :)
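
For reference, a minimal sketch of what reading custom metadata over PROPFIND could look like, using only the Python standard library. The `urn:example:aeco` namespace and the `discipline` property are assumptions made up for illustration; WebDAV itself only defines the `DAV:` namespace and the multistatus response format. The sketch builds a request body and parses a response, and leaves the HTTP call itself out.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"                # namespace defined by the WebDAV spec
AECO_NS = "urn:example:aeco"   # hypothetical namespace for custom properties


def build_propfind_body(properties):
    """Build a PROPFIND body asking for the given (namespace, name) properties."""
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    for namespace, name in properties:
        ET.SubElement(prop, f"{{{namespace}}}{name}")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def parse_multistatus(xml_bytes):
    """Return {href: {(namespace, name): text}} for properties reported with 200."""
    result = {}
    root = ET.fromstring(xml_bytes)
    for response in root.findall(f"{{{DAV_NS}}}response"):
        href = response.findtext(f"{{{DAV_NS}}}href")
        props = result.setdefault(href, {})
        for propstat in response.findall(f"{{{DAV_NS}}}propstat"):
            status = propstat.findtext(f"{{{DAV_NS}}}status", default="")
            if " 200 " not in status:
                continue  # skip properties the server could not return
            prop = propstat.find(f"{{{DAV_NS}}}prop")
            if prop is None:
                continue
            for child in prop:
                if child.tag.startswith("{"):
                    namespace, name = child.tag[1:].split("}", 1)
                    props[(namespace, name)] = child.text
    return result
```

Writing a value would use PROPPATCH with a similar XML body; whether flat properties like this are expressive enough for real document control workflows is exactly the open question.
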

Would you ever consider placing slightly stronger constraints on the Documents API? Currently the /select-documents endpoint returns an HTML document representing a UI to assist the user in document filing. As @bigdoods points out in #4, this approach isn't amenable to form processing by machine, because clients can't tell what the data contract of the resulting HTML form might be.

What if a GET /select-documents instead returned, say, a JSON-Schema description in which each attribute matched a field in the associated form? Machines would be able to read this schema directly, including hypermedia links to related resources. Those looking for a better (human) user experience could bolt on one of the many available JSON-Schema-based form UI libraries (like uniforms) to generate the document filing form dynamically at runtime. The same schema definition could then be used by humans or machines for document filing.

Submitted forms could have Content-Type multipart/form-data, where the first part holds the form instance data, validated against the form's JSON-Schema, and the second part holds the binary data of the file.

Thoughts on the general approach? I'm just looking for some way to keep the filing requirements to a minimum while still ensuring that document selection can be driven programmatically. Thanks!
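
A rough sketch of the two-part submission described above, using only the Python standard library. The schema contents, the part names `metadata` and `file`, and both function names are assumptions for illustration, not anything the Documents API defines. `check_metadata` only handles the `required` and `enum` keywords; it is a stand-in for a real JSON-Schema validator, not a replacement for one.

```python
import json
import uuid

# Hypothetical filing schema a server might return from GET /select-documents.
FILING_SCHEMA = {
    "type": "object",
    "required": ["type", "discipline"],
    "properties": {
        "type": {"enum": ["drawing", "model", "report"]},
        "discipline": {"enum": ["architecture", "structure", "mep"]},
        "phase": {"type": "string"},
    },
}


def check_metadata(metadata, schema):
    """Check only 'required' and 'enum'; return a list of error messages."""
    errors = []
    for name in schema.get("required", []):
        if name not in metadata:
            errors.append(f"missing required field: {name}")
    for name, value in metadata.items():
        rule = schema.get("properties", {}).get(name, {})
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"invalid value for {name}: {value!r}")
    return errors


def build_filing_request(metadata, filename, file_bytes, boundary=None):
    """Return (content_type, body): JSON metadata part first, file part second."""
    boundary = boundary or uuid.uuid4().hex
    delimiter = f"--{boundary}\r\n".encode("ascii")
    metadata_part = (
        b'Content-Disposition: form-data; name="metadata"\r\n'
        b"Content-Type: application/json\r\n\r\n"
        + json.dumps(metadata).encode("utf-8") + b"\r\n"
    )
    file_header = f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
    file_part = (
        file_header.encode("utf-8")
        + b"Content-Type: application/octet-stream\r\n\r\n"
        + file_bytes + b"\r\n"
    )
    closing = f"--{boundary}--\r\n".encode("ascii")
    body = delimiter + metadata_part + delimiter + file_part + closing
    return f"multipart/form-data; boundary={boundary}", body
```

A client would run `check_metadata` against the schema it fetched, then POST the returned body with the returned Content-Type header. A form UI library could render the same schema for human users.
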

ykulbak commented 4 years ago

Hi @devonsparks, your observation about the current Documents API specification is absolutely correct. As currently specified, the Documents API only caters for interactive use cases where a human can navigate the web pages presented by GET /select-documents or POST /upload-documents.

We believe that interactive use cases are important enough to stick with the dedicated, simple exchange. For this reason, @bigdoods is now leading a subgroup that is working towards a draft specification for the machine-to-machine (non-interactive) use cases. We will be looking for opportunities to align both specifications once the subgroup completes its work.

Please contact @bigdoods to join the subgroup. Your suggestions are interesting, and I'm sure that your contribution would be greatly appreciated.