decaporg / decap-cms

A Git-based CMS for Static Site Generators
https://decapcms.org
MIT License

(WIP) Backend restructure - Integrations #1171

Open Benaiah opened 6 years ago

Benaiah commented 6 years ago

Integrations

What are they?

Integrations hook into a number of different places in the CMS to replace different calls, most (all?) of which would normally go to a backend. Currently the assets store (at least) requires additional work.

There are currently two integration providers:

- Algolia: wraps searching and entry retrieval for large repositories.
- Assets store: stores binary assets in a separate data store from entry data. (Currently unclear how this interacts with the media library)

### Integrations vs integration providers

Ostensibly, the integration providers are implementations of generalized APIs for a named set of integration hooks:

- `assetStore` - global
- `listEntries` - collection-specific
- `search` - collection-specific

Each of these APIs indicates that certain methods will be available on the `provider` instance. There doesn't seem to be a list of these anywhere in the code, as they're implicitly required, but I've compiled one here:

- `assetStore`
  - `upload(file, privateUpload) -> { success, url, asset }` *todo: determine the required shape of the `file` object*
  - `retrieve(query, page, privateUpload) -> fileList`
  - `delete(assetID) -> OptionPromise` (`OptionPromise` here just means that the value of the Promise doesn't appear to matter, just whether it succeeds or fails).
- `listEntries`: `listEntries(collection, page) -> entryList`
- `search`
  - `search(collections, searchTerm, page) -> entryList`
  - `searchBy(field, collection, query) -> entryList` The `field` here is expected to be a string, which doesn't seem to match how it's called in `src/actions/search.js/query`.

There are some other methods implemented by integration providers, like `getEntry` in the Algolia provider, but they don't appear to be used currently.
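Since the required methods are only implicit, one way to make the list above explicit is a small lookup table plus a check. This is a rough sketch, not code from the CMS: `REQUIRED_METHODS`, `missingMethods`, and `fakeSearchProvider` are hypothetical names, and the method names are simply transcribed from the list above.

```javascript
// Hypothetical table: required provider methods per hook, transcribed from the list above.
const REQUIRED_METHODS = {
  assetStore: ['upload', 'retrieve', 'delete'],
  listEntries: ['listEntries'],
  search: ['search', 'searchBy'],
};

// Returns the names of the methods a provider is missing for the given hook.
function missingMethods(provider, hook) {
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_METHODS, hook)) {
    throw new Error(`Unknown hook: ${hook}`);
  }
  return REQUIRED_METHODS[hook].filter(name => typeof provider[name] !== 'function');
}

// Hypothetical in-memory provider, for illustration only.
const fakeSearchProvider = {
  search: async (collections, searchTerm, page) => [],
  listEntries: async (collection, page) => [],
};

// This provider can serve `listEntries`, but lacks `searchBy` for the `search` hook.
const missingForSearch = missingMethods(fakeSearchProvider, 'search');
```

A check like this could run when the config is loaded, so that a hook assigned to a provider that cannot serve it fails early instead of at the first call.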

How they work currently

### Configuration

The integrations are added by creating an `integrations` list in the config. This looks like the following (all currently supported configuration providers and options are listed here):

``` {.yaml}
integrations:
  - provider: algolia
    # provider-specific options
    applicationID: example # algolia API application ID
    apiKey: example # algolia API key
    indexPrefix: example-cms # prefix for index names, which are in the form "${indexPrefix}${collection}"
    # hooks to attach the algolia functions to
    hooks:
      - search
      - listEntries
    # collections to apply hooks to
    collections:
      - example-entries
      - example-pages
  - provider: assetStore
    # provider-specific options
    getSignedFormURL: /example-asset-store/ # Used to generate URLs to the asset store endpoint. This is a constant, not a getter. I'm not entirely sure what this represents.
    shouldConfirmUpload: true # should confirm the upload by PUTing `{ state: "uploaded" }` to the asset URL
    # The assetStore hook is not tied to a specific collection
    hooks:
      - assetStore
```

### Redux state init

In `reducers/integrations.js`, the exported reducer updates its integrations on a `CONFIG_SUCCESS` action. Each integration is an object, with several special keys:

- `provider`: integration key.
- `hooks`: a Map that connects hook names to the `provider` handling them
- `collections`: a list of collection names to apply hooks to, or the string `*` to apply to all collections (but not the global hooks table)

Other keys are stored as provider data. The final state produced by this has two main keys:

- `providers`: provider data - a Map of each integration's `provider` to a Map of any unknown keys in the integration object.
- `hooks`: a Map of hooks

If `collections` is *not* present on the integration object, then each hook name in `hooks` is set as a key in the `state.hooks` Map, with the provider name as the value.
If `collections` *is* present on the integration object, then the hooks are stored per collection instead: each collection name is set as a key in the `state.hooks` Map, with a Map of hook names to the provider name as the value.

Since this is implemented by reducing over the configured list of integrations, the precedence of integration hooks is determined by the order of the `integrations` list in the config, so re-ordering the list in the config can cause errors if the precedence is important. This makes generating a config which includes integrations unsafe unless you strictly control the order of the generated `integrations` list.

**Collection names can collide with hook names**, since they're both used as keys to the same Map (e.g., a collection named `listEntries` would collide with the `listEntries` hook name). This includes the names in both the `collections` list in the integration object *and*, if any integration sets its `collections` to `*`, the collections listed in the main CMS config. This has two effects, depending on whether the integration config that sets the global hook comes before or after the integration config that sets the collection hooks for the colliding collection:

- If the integration configuration with the global hook comes first, the provider name set as the value of that global hook will have the collection-specific hooks set directly on its string value.
- If the integration configuring collection-specific keys for the colliding collection name comes first, the Map of collection hooks for that collection will be overwritten with the provider name of the global hook.

Since the order of the integrations is dependent on the order of their listing in the `config.yml`, bugs due to this can show up in previously-working configs if that list is reordered, even without any other changes made.

There are *two* ways to create hooks that apply to the whole CMS: either list `hooks` without `collections` in an integration, or set `*` as the integration's `collections` to apply collection-specific hooks to all configured collections.
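To make the state shape and the collision concrete, here is a minimal sketch of the behaviour described above. It is not the source of `reducers/integrations.js`: it uses plain objects rather than Immutable Maps, `buildIntegrationState` is a hypothetical name, and unlike the real reducer it records collisions instead of silently producing a broken state.

```javascript
// Hypothetical sketch, NOT the actual reducer. Builds `providers` and `hooks`
// from the `integrations` config list, in list order, and records key collisions.
function buildIntegrationState(integrations, allCollectionNames) {
  const state = { providers: {}, hooks: Object.create(null) };
  const collisions = [];

  for (const integration of integrations) {
    // Everything except the special keys is kept as provider data.
    const { provider, hooks = [], collections, ...providerData } = integration;
    state.providers[provider] = providerData;

    if (collections === undefined) {
      // Global hooks: hooks.<hook> = provider name.
      for (const hook of hooks) {
        if (typeof state.hooks[hook] === 'object') {
          collisions.push(hook); // a collection's hook table is about to be overwritten
        }
        state.hooks[hook] = provider;
      }
      continue;
    }

    // Collection hooks: hooks.<collection>.<hook> = provider name.
    const names = collections === '*' ? allCollectionNames : collections;
    for (const name of names) {
      if (typeof state.hooks[name] === 'string') {
        collisions.push(name); // a global hook already owns this key; skip it
        continue;
      }
      state.hooks[name] = state.hooks[name] || {};
      for (const hook of hooks) {
        state.hooks[name][hook] = provider;
      }
    }
  }
  return { state, collisions };
}

// Contrived config: a collection named `listEntries` collides with a global hook.
const collisionExample = buildIntegrationState(
  [
    { provider: 'assetStore', hooks: ['listEntries'] },
    { provider: 'algolia', hooks: ['search'], collections: ['listEntries', 'posts'], apiKey: 'example' },
  ],
  ['posts', 'listEntries'],
);
```

Swapping the two entries in the list changes which branch detects the collision, which is the order dependence described above.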
It's unclear what the precise roles of these different categories of hooks are. The source of the reducer described above is as follows:

### Integration provider init

The actual integration providers themselves are initialized in `src/integrations/index.js`, in [`resolveIntegrations`](https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/integrations/index.js#L5-L18), which creates the integration providers configured when called (if the integration is included more than once ). `resolveIntegrations` is called by `getIntegrationProvider` on its first call and cached (note that `getIntegrationProvider` is declared in an IIFE, so the `integrations` declaration happens immediately when the file is loaded). This caching means that integration providers are invisible to the rest of the code, and cannot be reinitialized. The `integrations` variable here is effectively a singleton, just implemented with a closure instead of a class:

`getIntegrationProvider` is called in a few places:

- `src/actions/entries.js` - called in [`loadEntries`](https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/actions/entries.js#L249)
- `src/actions/search.js` - called in [`searchEntries`](https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/actions/search.js#L115) and [`query`](https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/actions/search.js#L135)
- `src/valueObjects/AssetProxy.js` - called in [`createAssetProxy`](https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/valueObjects/AssetProxy.js#L48)
- `src/actions/mediaLibrary.js` - called in [`loadMedia`](https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/actions/mediaLibrary.js#L47) and [`deleteMedia`](https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/actions/mediaLibrary.js#L128)
Whichever of these lines runs first in a particular CMS setup determines when the integration objects are set up. For most setups this probably occurs in `loadEntries`, as this is called whenever a collection is displayed, which is the default view when logging in to the CMS.

### Hooks

As described above, hooks are configured and stored as lists of strings. The strings must match both the predefined hook name and a method on the integration provider object. A hook is called via the following process (this process is not wrapped by anything else, so every site where a hook may be called must implement the `selectIntegration` and `getIntegrationProvider` steps as described below):

- `selectIntegration` is called with the following arguments:
  - `state`: the `integrations` list from the config
  - `collection`: the name of the collection (this is set to `null` for `assetStore`, the only non-collection-specific hook)
  - `hook`: the name of the hook

  It then returns, from the integration state in the Redux store, either `hooks.<collection>.<hook>` or `hooks.<hook>`, depending on whether the collection name is set. The value returned from here will either be `null` or the name of an integrations provider (see the "Redux state init" section above for the shape of the `state.integration.hooks` Map).
- If an integration exists, `getIntegrationProvider` is called with the following arguments:
  - `interationsConfig`: the `integrations` list from the config
  - `getToken`: an async function for retrieving a token to authenticate to the backend (used only by the assetStore integration)
  - `provider`: name of the integrations provider

  It then returns the actual provider instance (initializing the provider instances with `resolveIntegrations` as necessary).
- The provider instance is called with whatever methods can be assumed to exist on the provider instance based on the API specified in the call to `selectIntegration`. The methods are listed above.

An example hook call follows:
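A minimal sketch of the three steps, including the closure-based caching from the "Integration provider init" section. Everything here is a stand-in written for illustration: `FakeAlgolia`, `PROVIDER_CLASSES`, and `searchWithHook` are hypothetical, the hook state is a plain object rather than an Immutable Map, and the real functions may differ in detail.

```javascript
// Hypothetical provider; the real ones live under src/integrations.
class FakeAlgolia {
  constructor(config) {
    this.config = config;
  }
  async search(collections, searchTerm, page) {
    return [`${this.config.indexPrefix}: results for "${searchTerm}"`];
  }
}
const PROVIDER_CLASSES = { algolia: FakeAlgolia };

// Step 1: look up hooks.<collection>.<hook>, or hooks.<hook> when there is no collection.
function selectIntegration(hooksState, collection, hook) {
  const value = collection ? (hooksState[collection] || {})[hook] : hooksState[hook];
  return value || null;
}

// Step 2: the provider instances live in a closure, are created on the first
// call, and are never rebuilt afterwards (the "singleton" noted earlier).
const getIntegrationProvider = (() => {
  let integrations = null;
  return (integrationsConfig, getToken, provider) => {
    if (!integrations) {
      integrations = {};
      for (const config of integrationsConfig) {
        const Provider = PROVIDER_CLASSES[config.provider];
        if (Provider) {
          integrations[config.provider] = new Provider(config, getToken);
        }
      }
    }
    return integrations[provider];
  };
})();

// Step 3: call the method the hook is assumed to provide.
async function searchWithHook(hooksState, integrationsConfig, collection, searchTerm) {
  const providerName = selectIntegration(hooksState, collection, 'search');
  if (!providerName) {
    return null; // no integration: the caller would fall back to the backend
  }
  const getToken = async () => 'example-token';
  const provider = getIntegrationProvider(integrationsConfig, getToken, providerName);
  return provider.search([collection], searchTerm, 0);
}
```

Because the cache is filled on the first call, passing a different config to a later `getIntegrationProvider` call has no effect in this sketch, which mirrors the "cannot be reinitialized" observation.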

Comparison of potential integration designs

### Integrations as is

Integrations are currently a very complex API, with multiple layers of setup and configuration. Parts of the API are very general and indirected (e.g., the integration/integration provider distinction), and others are tightly coupled to specific implementations of both integration providers and backends (e.g., `getToken` or the `AssetProxy` system, both of which require explicit support from backends). This leads to a situation where the API is simultaneously so flexible in principle that it's difficult to follow or implement, while so specific in operation that it requires indirect support across wide swaths of the codebase.

It's also inherently stateful - the list of provider instances is a singleton, and providers themselves are class instances which store information on the instance's properties.

Finally, the integrations API has sole responsibility for some concerns, meaning they cannot be implemented by backends. Search, for instance, is either done locally or through an integration - there's no ability for a backend to implement server-side search. Adding this to the backend API as well would introduce further API duplication between the backend and integration API.

*TODO: expand*

### Integrations as backend composition

One potential approach would be to unify backends and integrations into a single API, allowing them to be combined with normal code. For instance, an Algolia integration could be defined as a function which wraps an existing backend and calls its functions, except for `search`, `searchBy`, `getEntry`, and `listEntries`. Benefits of this could include removing `getToken` from the backend API (currently only used for integrations), and unifying the media library integration and backend APIs.

*TODO: expand*

### Integrations as middleware

Redux allows intercepting actions before they hit reducers using [middleware](https://redux.js.org/advanced/middleware).
This is very powerful, but it does allow unrestricted access to our Redux actions, essentially making our current action structure the public integrations API. *TODO: expand*
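For the backend-composition option, the wrapping idea might look roughly like the following. This is only a sketch under assumptions: `withSearchProvider` is a hypothetical helper, and the backend and provider objects here are toy stand-ins rather than the CMS's actual backend API.

```javascript
// Hypothetical helper: returns an object that behaves like `backend`, except
// that the four search-related calls are routed to `searchProvider`.
function withSearchProvider(backend, searchProvider) {
  // Object.create keeps every other backend method reachable via the prototype chain.
  return Object.assign(Object.create(backend), {
    search: (...args) => searchProvider.search(...args),
    searchBy: (...args) => searchProvider.searchBy(...args),
    getEntry: (...args) => searchProvider.getEntry(...args),
    listEntries: (...args) => searchProvider.listEntries(...args),
  });
}

// Toy stand-ins for illustration only.
const toyBackend = {
  persistEntry: entry => `backend saved ${entry}`,
  search: term => `backend searched ${term}`,
};
const toySearchProvider = {
  search: term => `provider searched ${term}`,
  searchBy: () => [],
  getEntry: () => null,
  listEntries: () => [],
};

const composedBackend = withSearchProvider(toyBackend, toySearchProvider);
```

Here `composedBackend.persistEntry` still reaches the backend while `composedBackend.search` goes to the provider, so the core code would only ever see a single backend-shaped object.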
erquhart commented 6 years ago

Great breakdown here, love it.

Couple of thoughts:

Middleware API

The middleware option doesn't have to involve exposing raw state - we can process the middleware functions however we like. I'd expect that we'd transform the state into a shape matching our published API, allow that to travel through the middleware functions, and then transform the result from there.
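As a rough illustration of that idea, the sketch below converts an internal action into a published shape, runs the middleware functions over it, and converts the result back. All names and both shapes are invented for this example (`toPublicShape`, `fromPublicShape`, `runIntegrationMiddleware`, the `EXAMPLE_SEARCH` action); none of them is an existing CMS API.

```javascript
// Hypothetical mapping from an internal action to the published shape.
function toPublicShape(action) {
  return {
    kind: action.type,
    collection: action.payload.collection,
    term: action.payload.searchTerm,
  };
}

// Hypothetical mapping back; internal-only fields are carried over untouched.
function fromPublicShape(publicAction, originalAction) {
  return {
    ...originalAction,
    payload: {
      ...originalAction.payload,
      collection: publicAction.collection,
      searchTerm: publicAction.term,
    },
  };
}

// Middleware functions only ever see (and return) the published shape.
function runIntegrationMiddleware(action, middlewares) {
  const result = middlewares.reduce(
    (current, middleware) => middleware(current),
    toPublicShape(action),
  );
  return fromPublicShape(result, action);
}

// Example middleware: trims the search term.
const trimTerm = publicAction => ({ ...publicAction, term: publicAction.term.trim() });

const processedAction = runIntegrationMiddleware(
  { type: 'EXAMPLE_SEARCH', payload: { collection: 'posts', searchTerm: '  hello ', internalFlag: true } },
  [trimTerm],
);
```

The point of the round trip is that internal details such as `internalFlag` never reach integration code, so the action structure can change without changing the published API.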

The precedence problem

You make it clear that precedence matters here, as it does in almost any plugin architecture. I'm wondering if we can construct the API in such a way that a backend/integration must declare what parts of the API it handles in order to be allowed to handle those parts - e.g., Algolia, or the CMS config, declares that it handles search requests, and is therefore given the ability to handle those requests.

This allows us to statically determine handlers for each action and where overrides occur.
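One way this could work, sketched with invented names (`exampleDeclarations`, `resolveHandlers`, and the capability strings are all assumptions for illustration): each backend or integration lists what it handles, and a single pass over the declarations yields both the final handler for each capability and a list of every override.

```javascript
// Hypothetical declarations: each backend/integration lists what it handles.
const exampleDeclarations = [
  { name: 'git-backend', handles: ['listEntries', 'search', 'persistEntry'] },
  { name: 'algolia', handles: ['search', 'listEntries'] },
];

// Later declarations override earlier ones; each override is recorded so it
// can be reported up front instead of being discovered at call time.
function resolveHandlers(declarations) {
  const handlers = Object.create(null);
  const overrides = [];
  for (const { name, handles } of declarations) {
    for (const capability of handles) {
      if (handlers[capability] !== undefined) {
        overrides.push({ capability, from: handlers[capability], to: name });
      }
      handlers[capability] = name;
    }
  }
  return { handlers, overrides };
}

const resolvedExample = resolveHandlers(exampleDeclarations);
```

In this example `search` and `listEntries` resolve to `algolia`, `persistEntry` stays with `git-backend`, and both overrides are listed explicitly, so precedence is visible before any request is made.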

This could easily be an enhancement for later.

Benaiah commented 6 years ago

@erquhart good point on middleware - if we do use that kind of design, we'd definitely want an abstraction layer.

knpwrs commented 6 years ago

Is there a PR open for this anywhere? Is there anything the community can do to help with this?

Benaiah commented 6 years ago

@knpwrs currently I'm working on getting the GitLab and BitBucket backends wrapped up, as well as working on some of the preliminary refactoring that'll be required before we move to a new backend API. You can follow along with that work here: https://github.com/netlify/netlify-cms/pull/517 (GitLab) and https://github.com/netlify/netlify-cms/pull/525 (BitBucket).

Once that work is done, the last step to prepare for combining integrations and backends will be to move the media library integrations into backend.js, so the core code will be calling a single endpoint for all backend and integration functionality. At that point, we can replace the backend API with a compatible version that includes integrations without affecting anything above src/backends/backend.js.

As for the new backend API itself, it's still very much in the design phase. The primary issue for the backend API design is here: https://github.com/netlify/netlify-cms/issues/1134. The best way to help out with that now is to add suggestions and critiques to that issue. As a quick intro, the core ideas of the backend restructure are as follows:

erquhart commented 6 years ago

@Benaiah for when you dig into this:

jnthnclrk commented 6 years ago

Is this released?

erquhart commented 6 years ago

No, this is currently in proposal stage.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

theetrain commented 4 years ago

Hello, is there a status update on this? I've been following #432 searching for a way to integrate NetlifyCMS with another media store such as S3 or GCP Storage. If there already exists an API (such as the one proposed in https://github.com/netlify/netlify-cms/pull/1602#issue-207764837) I'd be happy to help contribute documentation and leverage Netlify Functions if a backend is needed.