atomicdata-dev / atomic-server

An open source headless CMS / real-time database. Powerful table editor, full-text search, and SDKs for JS / React / Svelte.
https://atomicserver.eu
MIT License

URL paths constraints #725

Open joepio opened 2 years ago

joepio commented 2 years ago

At this point, implementing Atomic Data means you can use any URL / routing strategy you like, as long as the URLs resolve to the corresponding resources when they are requested with a JSON-AD Accept header.

But there are use cases where it makes sense to have semantic, more constrained URL paths / routes. For example, you may want /chatRooms/123 to always be a ChatRoom. Or you may want the messages in it to be nested, like so: /chatRooms/123/messages/123123.

Also, when importing resources, I may want to nest the imported resources under some Importer path, such as /importer/2022-01-20-auiwndandw/my-local-id.

To be clear, this type of URL usage is already entirely possible with Atomic Data and AtomicServer, but users are also completely free to create a resource at /chatRooms/123/messages/my-random-resource that has nothing to do with that chat room. Users can also create resources like /a, which should maybe be reserved in some use cases.

We could constrain this in AtomicServer alone, but ideally I'd have a consistent and clear model for doing this and add it to the extended spec.

So I'm considering some sort of constraint on how paths can be used, perhaps related to Hierarchies.

Possibly related: I'd like to use content hashes or commit signatures as identifiers for resources that don't need (or don't have) a nice human-readable URL (atomicdata-dev/atomic-data-docs#113).

Possible solutions

Slash in URL = parent should be there

If we have a resource with path /chatRooms/123/messages/abc, we could require that it is a child of /chatRooms/123/messages, which in turn is a child of /chatRooms/123, which is a child of /chatRooms.
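A minimal Rust sketch of how the required parents could be derived from the path alone. The function name required_parent_paths is hypothetical (this is not AtomicServer's actual code), and it assumes the input is only the URL path, without scheme, host or query string.

```rust
/// Hypothetical helper: returns every path that must exist as an ancestor of
/// the resource at `path`, ordered from the root downwards.
/// Assumes `path` is only the URL path; empty segments (e.g. "//") are ignored.
fn required_parent_paths(path: &str) -> Vec<String> {
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    // Every proper prefix of the segments is a required parent. The full list
    // is the resource itself, so it is excluded. For a top-level path like
    // "/a" the range 1..1 is empty, so no parents are required.
    (1..segments.len())
        .map(|n| format!("/{}", segments[..n].join("/")))
        .collect()
}

fn main() {
    for parent in required_parent_paths("/chatRooms/123/messages/abc") {
        println!("{parent}");
    }
}
```

For /chatRooms/123/messages/abc this yields /chatRooms, /chatRooms/123 and /chatRooms/123/messages.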

However, this makes it impossible to move a resource to a different parent without changing its URL.

We could relax this by only checking the requirement when a resource is created. Changes to an existing resource would then be allowed even if its URL no longer matches its parents, but a new resource must pass the check for all of its parents.
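The creation-only check could look roughly like this. It's a sketch that assumes a hypothetical in-memory store (a HashMap from path to a Resource holding only its parent); validate_write and Resource are illustrative names. It also simplifies "child of all prefixes" to: every prefix exists, and the declared parent is the nearest prefix.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal resource model: only the parent path matters here.
struct Resource {
    parent: Option<String>,
}

/// Every proper prefix of `path` (e.g. "/a" and "/a/b" for "/a/b/c"), root first.
fn required_parent_paths(path: &str) -> Vec<String> {
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    (1..segments.len())
        .map(|n| format!("/{}", segments[..n].join("/")))
        .collect()
}

/// Validates a write of `resource` to `path`. Only *new* resources are checked:
/// every path prefix must already exist, and the declared parent must be the
/// nearest prefix. Edits to existing resources (e.g. moving them under another
/// parent) are accepted even if the URL no longer mirrors the hierarchy.
fn validate_write(
    store: &HashMap<String, Resource>,
    path: &str,
    resource: &Resource,
) -> Result<(), String> {
    if store.contains_key(path) {
        return Ok(()); // existing resource: the constraint is not enforced on edits
    }
    let parents = required_parent_paths(path);
    if let Some(missing) = parents.iter().find(|p| !store.contains_key(p.as_str())) {
        return Err(format!("parent {missing} does not exist"));
    }
    let expected = parents.last().map(String::as_str);
    if resource.parent.as_deref() != expected {
        return Err(format!("parent of {path} must be {expected:?}"));
    }
    Ok(())
}

fn main() {
    let mut store = HashMap::new();
    store.insert("/chatRooms".to_string(), Resource { parent: None });
    store.insert(
        "/chatRooms/123".to_string(),
        Resource { parent: Some("/chatRooms".to_string()) },
    );
    let messages = Resource { parent: Some("/chatRooms/123".to_string()) };
    println!("{:?}", validate_write(&store, "/chatRooms/123/messages", &messages));
    println!("{:?}", validate_write(&store, "/chatRooms/999/messages", &messages));
}
```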

An additional benefit of this is that all the parents could, in theory, be retrieved concurrently from the store when checking read / write rights, since their URLs follow from the resource's own path.
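A sketch of that idea using scoped threads from the Rust standard library. It assumes a hypothetical store mapping each path to the agents granted read access directly on it, and assumes (as with Atomic Data hierarchies) that a right granted on a parent also applies to its children. In a real store each lookup would be a disk or network read; spawning threads for in-memory HashMap lookups, as here, only illustrates that no lookup has to wait for another.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical store: resource path -> agents granted read access on it.
type Store = HashMap<String, Vec<String>>;

/// Every proper prefix of `path`, root first.
fn required_parent_paths(path: &str) -> Vec<String> {
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    (1..segments.len())
        .map(|n| format!("/{}", segments[..n].join("/")))
        .collect()
}

/// Returns true if `agent` may read `path`, assuming a right granted on a
/// resource also applies to all of its descendants. The paths to look up are
/// derived from the URL alone, so every lookup runs on its own scoped thread
/// instead of following parent links one hop at a time.
fn can_read(store: &Store, path: &str, agent: &str) -> bool {
    let mut candidates = required_parent_paths(path);
    candidates.push(path.to_string());
    thread::scope(|s| {
        let handles: Vec<_> = candidates
            .iter()
            .map(|p| {
                s.spawn(move || {
                    store
                        .get(p)
                        .map_or(false, |readers| readers.iter().any(|r| r == agent))
                })
            })
            .collect();
        // `any` may stop early; threads not joined here are joined
        // automatically when the scope ends.
        handles.into_iter().any(|h| h.join().unwrap())
    })
}

fn main() {
    let mut store: Store = HashMap::new();
    store.insert("/chatRooms".to_string(), vec!["alice".to_string()]);
    println!("{}", can_read(&store, "/chatRooms/123/messages/abc", "alice"));
}
```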

However, this will sometimes create awkward, long URLs for deeply nested resources, especially if we default to generating long URLs. One way to circumvent this is to create URLs from user-provided shortnames, incrementing integers, or short random strings.
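A sketch combining two of those options, under stated assumptions: a segment is derived from a user-provided shortname, and an incrementing integer suffix is appended when a sibling already uses it. Both helper names are hypothetical, and a HashSet stands in for "segments already used under this parent".

```rust
use std::collections::HashSet;

/// Hypothetical helper: turns a user-provided shortname into a short, URL-safe
/// path segment. ASCII letters and digits are kept (lowercased); every other
/// run of characters becomes a single '-', and leading/trailing dashes are
/// dropped. Input with no ASCII letters or digits yields an empty segment,
/// which a real implementation would have to reject.
fn shortname_to_segment(input: &str) -> String {
    let mut out = String::new();
    let mut pending_dash = false;
    for c in input.chars() {
        if c.is_ascii_alphanumeric() {
            // Only emit a separator between two kept runs, never at the start.
            if pending_dash && !out.is_empty() {
                out.push('-');
            }
            pending_dash = false;
            out.push(c.to_ascii_lowercase());
        } else {
            pending_dash = true;
        }
    }
    out
}

/// Appends an incrementing integer ("-2", "-3", ...) until the segment is not
/// already taken by a sibling.
fn unique_segment(base: &str, taken: &HashSet<String>) -> String {
    if !taken.contains(base) {
        return base.to_string();
    }
    (2..)
        .map(|n| format!("{base}-{n}"))
        .find(|candidate| !taken.contains(candidate))
        .expect("an unused suffix always exists")
}

fn main() {
    let mut taken = HashSet::new();
    taken.insert("general-chat".to_string());
    let segment = unique_segment(&shortname_to_segment("General Chat!"), &taken);
    println!("/chatRooms/{segment}");
}
```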

Note that slashes in URLs currently have no semantic meaning, and they appear in the base64 serialization of commits / agents, so that needs to change. (Why would anybody pick / and + as the two extra characters for the base64 alphabet? Why?!!) Also see this RFC and https://github.com/atomicdata-dev/atomic-data-docs/issues/117.
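For reference, RFC 4648 defines a URL- and filename-safe base64 alphabet (section 5) that replaces + and / with - and _. A minimal sketch of converting an already-encoded standard base64 string to that variant; whether AtomicServer should switch to it is the open question here, and the signature value and /commits/ path in main are only illustrative.

```rust
/// Converts standard base64 (RFC 4648 section 4, whose alphabet ends in '+'
/// and '/') to the URL- and filename-safe variant from RFC 4648 section 5,
/// which uses '-' and '_' instead. Trailing '=' padding is stripped too, as is
/// common when base64url values appear in URLs.
fn to_base64url(standard: &str) -> String {
    standard
        .trim_end_matches('=')
        .chars()
        .map(|c| match c {
            '+' => '-',
            '/' => '_',
            other => other,
        })
        .collect()
}

fn main() {
    // Hypothetical signature in standard base64, containing '/' and '+'.
    let signature = "q/Zp+T3x9A==";
    println!("/commits/{}", to_base64url(signature));
}
```

The result no longer contains a /, so it can't be mistaken for a path separator.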

We also have some resources that exist in sub-paths but have no parent, for example all agents and commits. One way to deal with this is to introduce a property that explicitly allows storing non-children on a resource, such as allowsNonChildren: true. Or maybe the back-end would simply have hard-coded checks for some endpoints. We'd still want to prevent users from creating a non-commit at /commits/not-a-commit, for example.
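A sketch of the hard-coded variant: a table of endpoints whose sub-paths skip the parent check but only accept a single class. The endpoint table, the class names and the functions are hypothetical placeholders, not AtomicServer's actual configuration.

```rust
/// Hypothetical endpoints whose sub-paths hold resources that are not children
/// of the endpoint (commits, agents), paired with the only class allowed there.
/// Class names are placeholders, not real Atomic Data class URLs.
const NON_CHILD_ENDPOINTS: &[(&str, &str)] = &[("/commits", "Commit"), ("/agents", "Agent")];

#[derive(Debug, PartialEq)]
enum PathRule {
    /// Normal case: the slash-parent constraint applies.
    RequireParents,
    /// Non-child endpoint: parents are not checked, but the class is.
    RequireClass(&'static str),
}

fn rule_for(path: &str) -> PathRule {
    for &(prefix, class) in NON_CHILD_ENDPOINTS {
        // Match "/commits/abc", but not "/commits/" itself or "/commitsfoo".
        let is_sub_path = path
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.len() > 1 && rest.starts_with('/'));
        if is_sub_path {
            return PathRule::RequireClass(class);
        }
    }
    PathRule::RequireParents
}

/// Whether a resource of `class` may be created at `path` as far as the
/// endpoint rule goes. For `RequireParents` paths this returns true; the
/// parent path check would run separately.
fn class_allowed_at(path: &str, class: &str) -> bool {
    match rule_for(path) {
        PathRule::RequireClass(required) => required == class,
        PathRule::RequireParents => true,
    }
}

fn main() {
    println!("{}", class_allowed_at("/commits/not-a-commit", "Message"));
    println!("{}", class_allowed_at("/commits/abc", "Commit"));
}
```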

Require that the parents must be stored on the same machine / domain

If we at least require that the parent(s) of a resource are stored on the same domain / server as the resource, we prevent various issues:

joepio commented 10 months ago

I implemented the slash-parent requirement in AtomicServer some time ago. It works, but it does lead to very long URLs, which are a bit impractical.

I think I want to add uuid / nanoid support, e.g. /nanoid/V1StGXR8_Z5jdHi6B-myT. If a URL starts with one of these whitelisted paths, we simply check that the ID is valid and unique, and skip the parent path constraints.
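A sketch of that whitelist check. It assumes the default nanoid format (21 characters from A-Za-z0-9_-) and the canonical 8-4-4-4-12 hex text form of a UUID. The /uuid/ prefix, the IdPathCheck enum and check_id_path are hypothetical, and a HashSet stands in for the store's uniqueness check.

```rust
use std::collections::HashSet;

/// Default nanoid format: 21 characters from A-Z, a-z, 0-9, '_' and '-'.
fn is_valid_nanoid(id: &str) -> bool {
    id.len() == 21
        && id
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}

/// Canonical textual UUID: hex groups of 8-4-4-4-12 digits joined by '-'.
fn is_valid_uuid(id: &str) -> bool {
    let groups: Vec<&str> = id.split('-').collect();
    groups.len() == 5
        && groups
            .iter()
            .zip([8, 4, 4, 4, 12])
            .all(|(g, len)| g.len() == len && g.bytes().all(|b| b.is_ascii_hexdigit()))
}

#[derive(Debug, PartialEq)]
enum IdPathCheck {
    /// Not a whitelisted prefix: apply the usual parent path constraints.
    CheckParents,
    /// Whitelisted prefix with a well-formed, unused ID: skip parent checks.
    Accept,
    /// Whitelisted prefix, but the ID is malformed or already taken.
    Reject,
}

/// `existing` stands in for the set of paths already used in the store.
fn check_id_path(path: &str, existing: &HashSet<String>) -> IdPathCheck {
    let well_formed = if let Some(id) = path.strip_prefix("/nanoid/") {
        is_valid_nanoid(id)
    } else if let Some(id) = path.strip_prefix("/uuid/") {
        is_valid_uuid(id)
    } else {
        return IdPathCheck::CheckParents;
    };
    if well_formed && !existing.contains(path) {
        IdPathCheck::Accept
    } else {
        IdPathCheck::Reject
    }
}

fn main() {
    let existing = HashSet::new();
    for path in ["/nanoid/V1StGXR8_Z5jdHi6B-myT", "/nanoid/too-short", "/chatRooms/123"] {
        println!("{path}: {:?}", check_id_path(path, &existing));
    }
}
```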