deconst / deconst-docs

Documentation for the Deconst project itself.
https://deconst.horse/

Structured content IDs #237

Open smashwilson opened 8 years ago

smashwilson commented 8 years ago

I believe Content IDs are the single most confusing bit of working with Deconst today.

They look like HTTP URLs, but don't behave like them. For example, swapping https for http or doubling a slash may still lead to the same page in your browser because of redirects, but Deconst treats the results as distinct values and fails to map content correctly. The human eye has been trained to see two such IDs as the same, while the system treats them as different.
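To make the mismatch concrete, here is a minimal sketch of a store keyed on opaque strings. The content IDs and the `envelopes` dict are hypothetical stand-ins, not values or structures taken from Deconst itself.

```python
# Hypothetical store: content IDs are plain dictionary keys, compared byte for byte.
envelopes = {"https://github.com/example-org/example-docs/install/": "<envelope>"}

lookups = [
    "https://github.com/example-org/example-docs/install/",   # exact match
    "http://github.com/example-org/example-docs/install/",    # scheme swapped
    "https://github.com/example-org/example-docs//install/",  # doubled slash
]

# A browser might land on the same page for all three; a string-keyed lookup does not.
found = [content_id in envelopes for content_id in lookups]
print(found)  # [True, False, False]
```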

They're also overloaded with several layers of meaning. The content store treats them as entirely opaque, unrelated strings, but the presenter relies on content IDs having some URL-like structure, both to find TOC data and to use their suffixes as paths in presented URLs! We also have a few places in the codebase that normalize trailing slashes and a few that do not, and I can never remember which is which.

What I wish I'd done instead is dictate a strict structure for content IDs, with clear parts that have well-defined responsibilities. Something like:

deconst://<base>/<internal-path>

Where <base> would be the contentIDBase assigned by a _deconst.json file and <internal-path> would be the envelope's path within that content repository. A few concrete examples:

Using a custom deconst: scheme and a clearly invalid "domain name" should help them look less confusingly like browser-valid URLs, but keeping the URL grammar would allow us to continue to manipulate them with url.parse(). The content store can then validate the URL structure and normalize things like trailing or repeated slashes, and the presenter could map them entirely in terms of <base> values:

{
  "developer.rackspace.com": {
    "/": "rackerlabs.docs-developer-blog",
    "/docs/": "rackerlabs.docs-quickstart"
  }
}
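As a rough sketch of how the two halves could fit together, the Python below validates and normalizes a `deconst://` ID and then resolves a presented URL path through the mapping above. It uses Python's `urllib.parse` as a stand-in for `url.parse()`. The function names, the choice to drop trailing slashes as the canonical form, and the longest-prefix matching rule are all assumptions made for illustration, not existing Deconst behavior.

```python
import re
from urllib.parse import urlsplit

SCHEME = "deconst"

# Mirrors the mapping above: domain -> URL path prefix -> <base>.
CONTENT_MAP = {
    "developer.rackspace.com": {
        "/": "rackerlabs.docs-developer-blog",
        "/docs/": "rackerlabs.docs-quickstart",
    }
}


def normalize_content_id(raw: str) -> str:
    """Validate a deconst://<base>/<internal-path> ID and return one canonical form."""
    parts = urlsplit(raw)
    if parts.scheme != SCHEME:
        raise ValueError(f"expected a {SCHEME}:// content ID, got {raw!r}")
    if not parts.netloc:
        raise ValueError(f"content ID has no <base>: {raw!r}")
    if parts.query or parts.fragment:
        raise ValueError(f"content ID must not carry a query or fragment: {raw!r}")
    # Assumed canonical form: single slashes, no trailing slash.
    path = re.sub(r"/+", "/", parts.path).rstrip("/")
    return f"{SCHEME}://{parts.netloc}{path}"


def content_id_for(domain: str, url_path: str) -> str:
    """Map a presented URL path to a content ID using the longest matching prefix."""
    prefixes = CONTENT_MAP[domain]
    matches = [p for p in prefixes if url_path.startswith(p)]
    if not matches:
        raise LookupError(f"no content mapped at {domain}{url_path}")
    prefix = max(matches, key=len)
    internal_path = url_path[len(prefix):]
    return normalize_content_id(f"{SCHEME}://{prefixes[prefix]}/{internal_path}")


print(normalize_content_id("deconst://rackerlabs.docs-quickstart//install///"))
# deconst://rackerlabs.docs-quickstart/install
print(content_id_for("developer.rackspace.com", "/docs/install/"))
# deconst://rackerlabs.docs-quickstart/install
print(content_id_for("developer.rackspace.com", "/blog/hello"))
# deconst://rackerlabs.docs-developer-blog/blog/hello
```

With normalization living in one function, the "which places handle trailing slashes" question would have a single answer, whichever canonical form ends up being chosen.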

This is a bit of a long-shot issue. Migrating every content repository and existing envelope to use the new content ID structure would be a real pain, which is mostly why I've avoided tackling this so far. I'd like to start thinking about it, though, because I believe it would make the control repository a lot less confusing to manipulate.

ktbartholomew commented 8 years ago

A pattern like deconst://... would also be very easy to search for with a regex, making template-side tasks like "link to the document with this content ID" a lot easier.
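For what it's worth, a minimal sketch of that kind of search might look like the following. The regular expression, the set of characters allowed in `<base>`, and the sample template string are assumptions for illustration only.

```python
import re

# Assumed shape: deconst://<base>, then an optional path that ends at whitespace,
# a quote, an angle bracket, or a closing parenthesis.
CONTENT_ID_RE = re.compile(r"deconst://[A-Za-z0-9._-]+(?:/[^\s\"'<>)]*)?")

# Hypothetical template fragment mixing a content ID link with an ordinary URL.
template = (
    '<a href="deconst://rackerlabs.docs-quickstart/install">Install</a> '
    'or see <a href="https://example.com/">elsewhere</a>'
)

content_ids = CONTENT_ID_RE.findall(template)
print(content_ids)  # ['deconst://rackerlabs.docs-quickstart/install']
```

Because ordinary `http`/`https` links never match, a template step could rewrite only the `deconst://` references and leave everything else untouched.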