MultiReference and ReferenceFilter concept

mingard commented 7 years ago

Overview

I have been looking into the most effective way of introducing some flexibility into the way content creators build articles. Previously we used a layout field concept, which required a rather verbose collection schema and a hook in API. It was also very difficult to edit outside of Publish because the format was complex.

Example setup

In this setup, we are using collections to define modular parts of a page.

Other Collections

Articles (our primary)
Galleries
Competition Forms
Blog posts ... + a lot more

MultiReference concept

If we want to allow the editor to link galleries and blog posts to an article, we need to add a separate Reference field for each. This is fine if we're only using a few, but things get complicated as the list grows. On top of that, the Publish interface gets rather messy, with a lot of rarely used Reference fields on display.

{
  "modules": {
    "type": "MultiReference",
    "label": "Modules",
    "settings": {
      "collections": ["galleries", "blog_posts", "competition_forms"]
    }
  }
}

ReferenceFilter

Disclaimer: This one steps into datasource territory, and I know it might seem a bit odd. I'll do my best to justify it!

The current Reference field requires the user to supply the ObjectIds of the referenced documents. What it doesn't allow is the flexibility to reference documents through other filters, such as a tag.

In this example, we're going to create a collection called blogmodules and we're going to use it in pages. Our first page is called News and we want to include a blog module.

The blog module collection has two fields:

Title (e.g. 'Recent news')
Posts (ReferenceFilter)

{
  "posts": {
    "type": "ReferenceFilter",
    "label": "Posts",
    "settings": {
      "collection": "blog_posts",
      "filter": {
        "tags.handle": "news"
      }
    }
  }
}
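To make the ReferenceFilter idea concrete, here is a minimal in-memory sketch of how such a field might be resolved at read time. It assumes documents are plain dicts, that `filter` only uses equality on dotted paths, and that a dotted path walks into arrays (so `tags.handle` matches if any tag has that handle, roughly how MongoDB dot notation behaves). `resolve_reference_filter`, `path_values` and the sample data are hypothetical, not part of API.

```python
from typing import Any, Dict, List


def path_values(doc: Any, path: str) -> List[Any]:
    """Collect every value reachable at a dotted path, descending into lists."""
    current = [doc]
    for key in path.split("."):
        next_values = []
        for value in current:
            # Flatten arrays so "tags.handle" looks inside each tag object
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_values.append(item[key])
        current = next_values
    # Flatten a trailing list so a scalar can match an array field
    flat = []
    for value in current:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def resolve_reference_filter(settings: Dict[str, Any],
                             collections: Dict[str, List[dict]]) -> List[dict]:
    """Return the documents in settings["collection"] matching every filter key."""
    documents = collections.get(settings["collection"], [])
    criteria = settings.get("filter", {})
    return [
        doc for doc in documents
        if all(expected in path_values(doc, path)
               for path, expected in criteria.items())
    ]


# Hypothetical sample data for the "blog_posts" collection
store = {
    "blog_posts": [
        {"_id": "1", "title": "Launch", "tags": [{"handle": "news"}]},
        {"_id": "2", "title": "Recipe", "tags": [{"handle": "food"}]},
    ]
}
settings = {"collection": "blog_posts", "filter": {"tags.handle": "news"}}
print([doc["title"] for doc in resolve_reference_filter(settings, store)])  # prints ['Launch']
```

Because the filter is stored with the field, the matching set changes as posts are tagged, without a datasource being involved.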

Why do this at API level and not in Web?

In most situations there's really no need. Datasources do a great job of formatting queries.

This feature simply gives a document editor the flexibility to dynamically reference content related to a post, without requiring a datasource to be created.

jimlambie commented 7 years ago

Can we see a more detailed explanation of how MultiReference works? So far you've only made the collection property accept an array.

mingard commented 7 years ago

Sure:

A single collection would just use the normal Reference field, so no examples there.

Specified collections example

{
  "modules": {
    "type": "MultiReference",
    "label": "Modules",
    "settings": {
      "collections": ["galleries", "blog_posts", "competition_forms"]
    }
  }
}

All collections example

{
  "modules": {
    "type": "MultiReference",
    "label": "Modules"
  }
}

Excluded collections example

{
  "modules": {
    "type": "MultiReference",
    "label": "Modules",
    "settings": {
      "excludeCollections": ["pages", "authors"]
    }
  }
}
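The three variants above boil down to one question: which collections may this field point at? A hypothetical helper, `allowed_collections`, could answer it as sketched below. It assumes that `collections` takes precedence if both it and `excludeCollections` are present, which the examples don't specify.

```python
from typing import Any, Dict, List


def allowed_collections(field_schema: Dict[str, Any],
                        all_collections: List[str]) -> List[str]:
    """Work out which collections a MultiReference field may point at."""
    settings = field_schema.get("settings", {})
    if "collections" in settings:
        # Specified collections: the list is used as given
        return list(settings["collections"])
    # All collections, minus any listed in excludeCollections (may be none)
    excluded = set(settings.get("excludeCollections", []))
    return [name for name in all_collections if name not in excluded]


known = ["articles", "galleries", "blog_posts", "competition_forms", "pages", "authors"]
excluding = {"type": "MultiReference", "label": "Modules",
             "settings": {"excludeCollections": ["pages", "authors"]}}
print(allowed_collections(excluding, known))
# prints ['articles', 'galleries', 'blog_posts', 'competition_forms']
```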

mingard commented 7 years ago

Rather than just inserting ObjectIds, it could accept an object.

Payload example option 1

{
  "title": "Foo",
  "modules": [
    {
      "collection": "galleries",
      "_id": "59df358f47884d2e1fda774e"
    },
    {
      "collection": "blog_posts",
      "_id": "58df358f47824d4e1fca774a"
    }
  ]
}

Payload example option 2

{
  "title": "Foo",
  "modules": [
    "galleries.59df358f47884d2e1fda774e",
    "blog_posts.58df358f47824d4e1fca774a"
  ]
}
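Both payload options carry the same information; the difference is how much work it takes to get it back out. In this sketch, option 1 is read directly from its keys, while option 2 has to be split on the last dot. The 24-character hex check reflects the usual string form of a MongoDB ObjectId; `parse_module` and `parse_modules` are hypothetical names.

```python
import re
from typing import List, Tuple, Union

OBJECT_ID = re.compile(r"^[0-9a-fA-F]{24}$")

Reference = Tuple[str, str]  # (collection, _id)


def parse_module(value: Union[dict, str]) -> Reference:
    """Turn one entry of "modules" into a (collection, _id) pair."""
    if isinstance(value, dict):
        # Option 1: collection and _id are already separate keys
        collection, object_id = value["collection"], value["_id"]
    else:
        # Option 2: "collection.id" must be split; rpartition keeps any
        # dots in the collection name on the left-hand side
        collection, _, object_id = value.rpartition(".")
    if not collection or not OBJECT_ID.match(object_id):
        raise ValueError(f"Invalid module reference: {value!r}")
    return collection, object_id


def parse_modules(values: List[Union[dict, str]]) -> List[Reference]:
    return [parse_module(value) for value in values]


option1 = [{"collection": "galleries", "_id": "59df358f47884d2e1fda774e"},
           {"collection": "blog_posts", "_id": "58df358f47824d4e1fca774a"}]
option2 = ["galleries.59df358f47884d2e1fda774e",
           "blog_posts.58df358f47824d4e1fca774a"]
print(parse_modules(option1) == parse_modules(option2))  # prints True
```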

jimlambie commented 7 years ago

Option 1 above is where I was heading with this, too. It's more readable and requires less parsing of data.

jimlambie commented 7 years ago

"elements": [
  {
    "_id": "59e21f67ae114ddab6b4d7ee",
    "uid": "1-page",
    "title": "About Us",
    "template": "text",
    "url": "/about-us",
    "apiVersion": "1.0",
    "createdAt": 1507991399299,
    "createdBy": "testClient",
    "history": [],
    "v": 1
  },
  {
    "_id": "59e227464d908ce4cdf0c8b1",
    "uid": "23333",
    "template": "video",
    "title": "Video 1",
    "apiVersion": "1.0",
    "createdAt": 1507993414076,
    "createdBy": "testClient",
    "history": [],
    "v": 1
  }
],
"composed": {
  "elements": [
    {
      "collection": "pages",
      "_id": "59e21f67ae114ddab6b4d7ee"
    },
    {
      "collection": "videos",
      "_id": "59e227464d908ce4cdf0c8b1"
    }
  ]
}
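In the response above, the resolved documents in `elements` don't carry the name of the collection they came from, while `composed.elements` keeps the raw `{collection, _id}` pairs. A minimal sketch of a resolver that keeps the two together might look like this; the `{"collection", "data"}` output shape and the in-memory store are assumptions for illustration.

```python
from typing import Dict, List


def compose(refs: List[Dict[str, str]],
            collections: Dict[str, Dict[str, dict]]) -> List[dict]:
    """Resolve {collection, _id} pairs, keeping the collection name with each result."""
    resolved = []
    for ref in refs:
        document = collections.get(ref["collection"], {}).get(ref["_id"])
        if document is None:
            continue  # assumption: dangling references are skipped
        resolved.append({"collection": ref["collection"], "data": document})
    return resolved


# Hypothetical in-memory store, keyed by collection and then by _id
store = {
    "pages": {"59e21f67ae114ddab6b4d7ee": {"_id": "59e21f67ae114ddab6b4d7ee", "title": "About Us"}},
    "videos": {"59e227464d908ce4cdf0c8b1": {"_id": "59e227464d908ce4cdf0c8b1", "title": "Video 1"}},
}
composed = [
    {"collection": "pages", "_id": "59e21f67ae114ddab6b4d7ee"},
    {"collection": "videos", "_id": "59e227464d908ce4cdf0c8b1"},
    {"collection": "videos", "_id": "000000000000000000000000"},
]
for item in compose(composed, store):
    print(item["collection"], item["data"]["title"])
```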

mingard commented 7 years ago

That’s my preference too. Better to avoid straying from ObjectIds.

mingard commented 7 years ago

Which field determines the collection to populate?

mingard commented 7 years ago

Also, will whichever field is required for determining the collection be required with reference updates?

jimlambie commented 7 years ago

Looks like I forgot to copy the collection identifier into the returned data. I can fix that.

mingard commented 7 years ago

> Also, will whichever field is required for determining the collection be required with reference updates?

@jimlambie what are you thinking for this?

eduardoboucas commented 6 years ago

The discussion in #395 led me here. Here are my thoughts on how Reference fields could handle all the requirements we now have, including the question raised by @mingard on how single vs. multiple values are represented.

I propose a single field type (Reference) to hold all reference values. Single reference values are returned as objects, multiples are returned as arrays. API will sanitise each object accordingly.

To insert data into a reference field, there are two different methods available.

Method 1

The collection referenced by the field must be declared in the schema, in the settings.collection property. This is backward-compatible with the current implementation of API 2.0.

collection.books.json

{
  "author": {
    "type": "Reference",
    "settings": {
      "collection": "authors"
    }
  }
}

Inserting a book with a single existing author

POST /1.0/test/books

{
  "title": "Building cool APIs",
  "author": "59e227464d908ce4cdf0c8b1"
}

Inserting a book with two existing authors

POST /1.0/test/books

{
  "title": "Building cool APIs",
  "author": [
    "59e227464d908ce4cdf0c8b1",
    "59e227464d908ce4cdf0c8b2"
  ]
}

Inserting a book with a single author that doesn't yet exist
```
POST /1.0/test/books

{
  "title": "Building cool APIs",
  "author": {
    "name": "James Lambie"
  }
}
```
API will create a document in the authors collection with {"name": "James Lambie"}.

Inserting a book with two authors that don't yet exist
```
POST /1.0/test/books

{
  "title": "Building cool APIs",
  "author": [
    {
      "name": "James Lambie"
    },
    {
      "name": "Arthur Mingard"
    }
  ]
}
```
API will create two documents in the authors collection with {"name": "James Lambie"} and {"name": "Arthur Mingard"}.
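The Method 1 rules can be summarised as: a string is an existing `_id`, an object is a new document for `settings.collection`, and an array applies the same rule to each item, with single values staying single and arrays staying arrays. The sketch below assumes a hypothetical `insert` callback standing in for the database write; it is not API's actual code path.

```python
from typing import Any, Callable, Dict, List, Union

Value = Union[str, dict]  # an existing _id, or a new document to create


def normalise_method1(value: Union[Value, List[Value]],
                      field_schema: Dict[str, Any],
                      insert: Callable[[str, dict], str]) -> Union[str, List[str]]:
    """Resolve a Method 1 reference value into stored _id string(s)."""
    collection = field_schema["settings"]["collection"]

    def one(item: Value) -> str:
        if isinstance(item, str):
            return item  # already the _id of an existing document
        if isinstance(item, dict):
            return insert(collection, item)  # create it, keep the new _id
        raise TypeError(f"Unsupported reference value: {item!r}")

    # Single values stay single, arrays stay arrays
    if isinstance(value, list):
        return [one(item) for item in value]
    return one(value)


# Hypothetical stand-in for the database write: records the call, fakes an _id
created = []


def fake_insert(collection: str, document: dict) -> str:
    created.append((collection, document))
    return f"new-{len(created)}"


schema = {"type": "Reference", "settings": {"collection": "authors"}}
ids = normalise_method1([{"name": "James Lambie"}, {"name": "Arthur Mingard"}],
                        schema, fake_insert)
print(ids)  # prints ['new-1', 'new-2']
```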

Method 2

This method does not rely on the field schema declaring the name of the referenced collection. Instead, it allows a single field to reference documents from multiple collections.

collection.books.json

{
  "author": {
    "type": "Reference"
  }
}

Inserting a book with a single existing author

POST /1.0/test/books

{
  "title": "Building cool APIs",
  "author": {
    "collection": "authors",
    "data": "59e227464d908ce4cdf0c8b1"
  }
}

Inserting a book with two existing authors

POST /1.0/test/books

{
  "title": "Building cool APIs",
  "author": [
    {
      "collection": "authors",
      "data": "59e227464d908ce4cdf0c8b1"
    },
    {
      "collection": "authors",
      "data": "59e227464d908ce4cdf0c8b2"
    }
  ]
}

Inserting a book with two existing authors from different collections

POST /1.0/test/books

{
  "title": "Building cool APIs",
  "author": [
    {
      "collection": "authors",
      "data": "59e227464d908ce4cdf0c8b1"
    },
    {
      "collection": "nonEnglishAuthors",
      "data": "59e227464d908ce4cdf0c8b3"
    },
    {
      "collection": "authors",
      "data": "59e227464d908ce4cdf0c8b2"
    }
  ]
}

Inserting a book with a single author that doesn't yet exist

POST /1.0/test/books

{
  "title": "Building cool APIs",
  "author": {
    "collection": "authors",
    "data": {
      "name": "James Lambie"
    }
  }
}

API will create a document in the authors collection with {"name": "James Lambie"}.

Inserting a book with two authors that don't yet exist

POST /1.0/test/books

{
  "title": "Building cool APIs",
  "author": [
    {
      "collection": "authors",
      "data": {
        "name": "James Lambie"
      }
    },
    {
      "collection": "authors",
      "data": {
        "name": "Arthur Mingard"
      }
    }
  ]
}

API will create two documents in the authors collection with {"name": "James Lambie"} and {"name": "Arthur Mingard"}.

Inserting a book with two authors, from different collections, that don't yet exist

POST /1.0/test/books

{
  "title": "Building cool APIs",
  "author": [
    {
      "collection": "authors",
      "data": {
        "name": "James Lambie"
      }
    },
    {
      "collection": "nonEnglishAuthors",
      "data": {
        "name": "Eduardo Bouças"
      }
    }
  ]
}

API will create a document in the authors collection with {"name": "James Lambie"} and one in nonEnglishAuthors with {"name": "Eduardo Bouças"}.
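Method 2 follows the same pattern, except that the collection travels with each value and `data` is either an `_id` or a pre-composed document. In this sketch each value is reduced to a `{"collection", "_id"}` pair, mirroring the `composed.elements` shape from earlier in the thread; that stored shape and the `insert` callback are assumptions.

```python
from typing import Callable, Dict, List, Union

Wrapped = Dict[str, Union[str, dict]]  # {"collection": ..., "data": ...}


def normalise_method2(value: Union[Wrapped, List[Wrapped]],
                      insert: Callable[[str, dict], str]) -> Union[dict, List[dict]]:
    """Resolve Method 2 values into {"collection", "_id"} pairs."""

    def one(item: Wrapped) -> dict:
        collection, data = item["collection"], item["data"]
        if isinstance(data, dict):
            # A pre-composed document: create it in the named collection
            data = insert(collection, data)
        return {"collection": collection, "_id": data}

    if isinstance(value, list):
        return [one(item) for item in value]
    return one(value)


# Hypothetical stand-in for the database write
counter = {"n": 0}


def fake_insert(collection: str, document: dict) -> str:
    counter["n"] += 1
    return f"{collection}-{counter['n']}"


result = normalise_method2([
    {"collection": "authors", "data": "59e227464d908ce4cdf0c8b1"},
    {"collection": "nonEnglishAuthors", "data": {"name": "Eduardo Bouças"}},
], fake_insert)
print(result)
```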

Notes

The decision to separate the name of the collection from the pre-composed document in Method 2, into the collection and data properties respectively, is based on:

Removing the need for meta/prefixed fields

If we were to inject the name of the collection into the body of the pre-composed document (e.g. {"_collection": "authors", "name": "John Doe"}), we'd need to make sure the property holding the collection name doesn't clash with data from the document. We could introduce a prefix, but API 3.0 introduces configurable prefix characters, which even allow prefixes to be removed completely. That makes this option a lot more complex and prone to issues.

Easier for consumer applications

For consumer applications that are inserting data into Publish, injecting a meta property means cloning an object (or mutating it by assigning a new property, which is probably a bad idea). It's easier to just wrap the pre-composed document in a parent object with a data property.
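The two notes can be illustrated in a few lines. `inject` and `wrap` are hypothetical helpers: the first has to clone the caller's document and can silently overwrite a real field with the same name, while the second leaves the document untouched and cannot clash.

```python
import copy


def inject(document: dict, collection: str) -> dict:
    """Meta-property approach: clone the document, then add a prefixed key."""
    clone = copy.deepcopy(document)  # avoid mutating the caller's object
    clone["_collection"] = collection  # overwrites any existing value
    return clone


def wrap(document: dict, collection: str) -> dict:
    """Method 2 approach: the document is nested as-is, so no key can clash."""
    return {"collection": collection, "data": document}


# A document that happens to have its own "_collection" field
doc = {"name": "John Doe", "_collection": "first editions"}
print(inject(doc, "authors")["_collection"])  # prints authors
print(wrap(doc, "authors")["data"]["_collection"])  # prints first editions
```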

mingard commented 6 years ago

@eduardoboucas one scenario that drove the original request was the need to define multiple collections. Method 1 is restricted to a single collection and Method 2 is unrestricted. Perhaps the ability to define an array of collections would be a third method. It’s more about restrictions in editing. Perhaps this could be a Publish setting, but it feels like a form of field validation to me, with an error thrown when an insert fails: Field ‘authors’ must be one of xxxxxxx.

Note that it could also be important to define fields on a per-collection basis. Whilst this is something we can handle in a datasource when using Web, it might need to exist for other use cases.

eduardoboucas commented 6 years ago

I see the restriction on the referenced collections as a validation rule. Not limited to Publish, but part of the new field-specific validation rules that we’ve been discussing for a while (which I’m hoping to progress in the next few days).

As for limiting the fields returned from the referenced documents, for consistency I’d rather do that with the existing fields parameter, where you’d define the fields at the various levels if you don’t want to get them all. We might need to introduce a special notation, but I think it’s still worth handling it there rather than introducing a third method.

Would that be any good?

mingard commented 6 years ago

@eduardoboucas how do you propose the validation rule be formatted? For example, how would I achieve this with validation?

{
  "author": {
    "type": "Reference",
    "settings": {
      "collections": ["authors", "people", "users"]
    }
  }
}

Regarding the fields, this is currently supported:

{
  "author": {
    "type": "Reference",
    "settings": {
      "collection": "authors",
      "fields": ["name", "title"]
    }
  }
}

How would this look with multiple collections?

eduardoboucas commented 6 years ago

Regarding validation, I see it being declared in a way that is very similar to what you posted, but on a validation block, where field-specific validation parameters could be added. Here's an example:

{
  "title": {
    "type": "String",
    "validation": {
      "regex": {
        "pattern": "^[0-9a-fA-F]{24}$"
      }
    }
  },
  "email": {
    "type": "Email",
    "validation": {
      "domains": ["dadi.co", "dadi.tech"]
    }
  },
  "author": {
    "type": "Reference",
    "validation": {
      "collections": ["authors", "people", "users"]
    }
  }
}
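A minimal sketch of how a `validation` block like this might be evaluated is shown below. The rule names (`regex`, `domains`, `collections`) come from the example; the function name, the error message wording and the choice to check each Method 2 `{collection, data}` object are assumptions.

```python
import re
from typing import Any, Dict, List


def validate_field(name: str, schema: Dict[str, Any], value: Any) -> List[str]:
    """Apply a field's "validation" block and collect error messages."""
    rules = schema.get("validation", {})
    errors = []
    if "regex" in rules:
        pattern = rules["regex"]["pattern"]
        if not re.search(pattern, str(value)):
            errors.append(f"Field '{name}' does not match {pattern}")
    if "domains" in rules:
        # Everything after the last "@" is treated as the domain
        domain = str(value).rpartition("@")[2].lower()
        if domain not in rules["domains"]:
            errors.append(f"Field '{name}' must use one of: {', '.join(rules['domains'])}")
    if "collections" in rules:
        # Accept a single Method 2 reference object or an array of them
        refs = value if isinstance(value, list) else [value]
        for ref in refs:
            collection = ref.get("collection") if isinstance(ref, dict) else None
            if collection not in rules["collections"]:
                errors.append(
                    f"Field '{name}' must be one of: {', '.join(rules['collections'])}")
    return errors


author = {"type": "Reference",
          "validation": {"collections": ["authors", "people", "users"]}}
print(validate_field("author", author, {"collection": "pages",
                                        "data": "59e227464d908ce4cdf0c8b1"}))
```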

As for fields, I had no idea we had settings.fields in the current implementation. This is not what I meant (and, to be honest, I'm not sure I'm a big fan of it existing in the settings block, because you're not configuring how the field works, you're just formatting its output).

What I meant was relying on the fieldLimiters property, which is used to limit the fields sent in a response. So your example would look something like:

{
  "settings": {
    "fieldLimiters": {
      "author.name": 1,
      "author.title": 1
    }
  }
}

... or, ideally, with the array notation (not sure if we support this):

{
  "settings": {
    "fieldLimiters": [
      "author.name",
      "author.title"
    ]
  }
}

If you're talking about getting different fields based on the referenced collection, this is where that special notation I mentioned could come in. One option would be to do something like this:

{
  "settings": {
    "fieldLimiters": [
      "author@authors.name",
      "author@authors.title",
      "author@people.age"
    ]
  }
}

... or some variation of it. Basically what I'm saying is that, in my opinion, limiting the fields should happen using the mechanism we already have in place for limiting the fields. This would have the important side effect of allowing people to customise the returned fields on a per-request basis, using the fields URL parameter, bringing us closer to how GraphQL allows requests to define the exact schema of the data for output.

e.g. http://api.somedomain.tech/1.0/test/users?fields=["title","author.name","author.title","author@people.age"]
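To show that the `author@people.age` notation can be parsed without much ceremony, here is a sketch that reads the `fields` parameter from the URL above and applies it to a document whose `author` field holds resolved `{collection, data}` references. The semantics are assumptions: `author.name` applies to every referenced collection, `author@people.age` only to references from `people`, and `"*"` is an internal key meaning "any collection".

```python
import json
from collections import defaultdict
from urllib.parse import parse_qs, urlparse


def parse_fields(url: str) -> dict:
    """Split the fields URL parameter into top-level and per-reference limits."""
    raw = parse_qs(urlparse(url).query).get("fields", ["[]"])[0]
    top = set()
    # field -> collection -> sub-field names; "*" means any collection (assumed)
    nested = defaultdict(lambda: defaultdict(set))
    for entry in json.loads(raw):
        head, _, sub = entry.partition(".")
        field, _, collection = head.partition("@")
        top.add(field)
        if sub:
            nested[field][collection or "*"].add(sub)
    return {"top": top, "nested": nested}


def apply_fields(document: dict, spec: dict) -> dict:
    """Keep only the requested fields, limiting each reference by its collection."""
    result = {}
    for field in spec["top"]:
        if field not in document:
            continue
        value = document[field]
        limits = spec["nested"].get(field)
        if limits and isinstance(value, list):
            value = [
                {"collection": ref["collection"],
                 "data": {k: v for k, v in ref["data"].items()
                          if k in limits["*"] | limits[ref["collection"]]}}
                for ref in value
            ]
        result[field] = value
    return result


url = ('http://api.somedomain.tech/1.0/test/users'
       '?fields=["title","author.name","author.title","author@people.age"]')
book = {
    "title": "Building cool APIs",
    "isbn": "0000000000",
    "author": [
        {"collection": "authors", "data": {"name": "Jane Doe", "title": "Dr", "age": 41}},
        {"collection": "people", "data": {"name": "John Doe", "title": "Mr", "age": 35}},
    ],
}
print(apply_fields(book, parse_fields(url)))
```

The same `spec` could just as well come from a schema-level default, which is what keeps the per-request and per-schema paths consistent.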

mingard commented 6 years ago

@eduardoboucas I like the move from the settings block to validation. For backwards compatibility, I assume we'd be keeping settings.collection for single collection Referencing.

I actually don't see the point in the field limiters at all, at least not at collection schema level. It makes sense to support them in queries, but I don't think we need to consider collection-specific field limitations in, for example, a datasource. I don't see why a multiple-source Reference field would need to allow author@authors.name specifically, as the payload could be filtered at template level in Web. Pseudocode:

if author.results[0].data.name && author.results[0].data.collection === 'people': do x

TL;DR

I like the validation. Fields are probably not important; we just need to make sure that settings.fields is ignored when no settings.collection is defined.
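The rule in the TL;DR is small enough to sketch directly: `settings.fields` is honoured only for a single-collection Reference field and ignored otherwise. `project_reference` is a hypothetical name, and this sketch does not special-case `_id`.

```python
from typing import Any, Dict


def project_reference(field_schema: Dict[str, Any],
                      document: Dict[str, Any]) -> Dict[str, Any]:
    """Apply settings.fields only when the field names a single collection."""
    settings = field_schema.get("settings", {})
    fields = settings.get("fields")
    if not fields or "collection" not in settings:
        # Multi-collection or unrestricted field: settings.fields is ignored
        return document
    return {key: value for key, value in document.items() if key in fields}


single = {"type": "Reference",
          "settings": {"collection": "authors", "fields": ["name", "title"]}}
multi = {"type": "Reference",
         "validation": {"collections": ["authors", "people", "users"]},
         "settings": {"fields": ["name", "title"]}}
author = {"_id": "59e227464d908ce4cdf0c8b1", "name": "Jane Doe", "title": "Dr", "age": 41}
print(project_reference(single, author))  # prints {'name': 'Jane Doe', 'title': 'Dr'}
print(project_reference(multi, author) is author)  # prints True
```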

eduardoboucas commented 6 years ago

> For backwards compatibility, I assume we'd be keeping settings.collection for single collection Referencing.

Absolutely!

> I actually don't see the point in the field limiters at all, at least not at collection schema level.

It's indeed very rarely used. I think the rationale is to offer a sensible fallback for when the URL parameter is not present in the request, much like what happens with other parameters (e.g. count, cache or includeHistory). I don't think it's about allowing a field specifically, it's about having the ability to specify a sensible default response format (e.g. to keep the payload size manageable by eliminating fields that will never be required).

But the important part is that we're able to specify the fields (and override the defaults) at query level, and, like you say, any filtering can be handled downstream using data sources or similar.
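The fallback behaviour described above can be sketched as: a `fields` value in the request wins; otherwise the schema's `fieldLimiters` default applies; otherwise every field is returned. `effective_fields` is hypothetical, and it accepts both the object notation and the array notation shown earlier in the thread, since array support was left open.

```python
import json
from typing import Any, Dict, List, Optional, Union


def to_field_list(limiters: Union[Dict[str, int], List[str]]) -> List[str]:
    """Accept object notation ({"a.b": 1}) or array notation (["a.b"])."""
    if isinstance(limiters, dict):
        return [name for name, include in limiters.items() if include]
    return list(limiters)


def effective_fields(query: Dict[str, str],
                     schema_settings: Dict[str, Any]) -> Optional[List[str]]:
    """Query-level fields win; schema fieldLimiters are only a fallback."""
    if "fields" in query:
        return to_field_list(json.loads(query["fields"]))
    if "fieldLimiters" in schema_settings:
        return to_field_list(schema_settings["fieldLimiters"])
    return None  # no limits at all: return every field


defaults = {"fieldLimiters": {"author.name": 1, "author.title": 1}}
print(effective_fields({}, defaults))  # prints ['author.name', 'author.title']
print(effective_fields({"fields": '["title"]'}, defaults))  # prints ['title']
```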

dadi / api