OpenFn / adaptors

The new home for OpenFn adaptors; re-usable connectors for the most common DPGs and DPI building blocks.
GNU General Public License v3.0

Build a collections adaptor #758

Closed: josephjclark closed this 3 weeks ago

josephjclark commented 1 month ago

Overview

Create a Collections adaptor that speaks to the Lightning Collections API.

It should probably be suffixed with the name of its backend data store, like collections-postgres or collections-lightning, so that it's easier to introduce new collection types later (like collections-redis).

The collections API is a bit unusual in that it will be loaded into jobs as a second adaptor, so it's a second-class citizen in the job code. That means everything needs to be namespaced (collections, operations and so on): collections.get rather than get. Otherwise there's a risk of clashing with the main adaptor's namespace.
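To make the clash concrete, here is a minimal sketch in which job code sees the exports of both adaptors merged into one scope. It assumes operations are functions that return `(state) => state`, in the style OpenFn jobs use; `mainAdaptor`, `collectionsAdaptor` and the stub bodies are hypothetical.

```javascript
// Hypothetical main adaptor: exports a top-level `get` operation.
const mainAdaptor = {
  get: (path) => (state) => ({ ...state, data: `main adaptor fetched ${path}` }),
};

// Hypothetical collections adaptor: everything lives under one namespace,
// so its `get` cannot shadow the main adaptor's `get`.
const collectionsAdaptor = {
  collections: {
    get: (name, key) => (state) => ({
      ...state,
      data: `collection ${name} lookup of ${key}`,
    }),
  },
};

// Simulate the job scope by spreading both adaptors' exports into one object.
// If the collections adaptor exported a bare `get`, this spread would overwrite
// the main adaptor's `get`.
const jobScope = { ...mainAdaptor, ...collectionsAdaptor };

const { get, collections } = jobScope;
console.log(get('patients')({}).data); // main adaptor fetched patients
console.log(collections.get('my-collection', 'abc')({}).data); // collection my-collection lookup of abc
```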

API

/**
 * Get one or more values from a collection. For large collections, use each.
 * query can be a string key name (including wildcards), or an object
 * { key, created_after, created_before, inserted_before, inserted_after }
 * Throws if both created_* and inserted_* filters are specified
 * If no wildcard or query object is included, the value will be written to state.data (throws if not found)
 * If a wildcard or query object is included, an array of values will be written to state.data
 */
collections.get(name, query) // writes to state.data
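A minimal in-memory sketch of the `get` semantics described above, for illustration only: it does not talk to the Lightning API. The `db` map, the item shape `{ value, created, inserted }` and the glob-style `*` wildcard are assumptions, not part of the spec.

```javascript
// Hypothetical in-memory backing store: collection name -> Map(key -> item).
// Dates are ISO strings, which compare correctly as plain strings when they
// share one format.
const db = new Map([
  ['patients', new Map([
    ['patient-001', { value: { name: 'Ada' }, created: '2024-01-01', inserted: '2024-02-01' }],
    ['patient-002', { value: { name: 'Bo' }, created: '2024-03-01', inserted: '2024-03-02' }],
    ['visit-001', { value: { patient: 'patient-001' }, created: '2024-03-05', inserted: '2024-03-06' }],
  ])],
]);

// Assumed wildcard syntax: `*` matches any run of characters.
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

function get(name, query) {
  return (state) => {
    const store = db.get(name) ?? new Map();

    // A plain key with no wildcard: write that single value or throw.
    if (typeof query === 'string' && !query.includes('*')) {
      if (!store.has(query)) throw new Error(`Key not found in ${name}: ${query}`);
      return { ...state, data: store.get(query).value };
    }

    // Otherwise treat it as a query and always write an array.
    const q = typeof query === 'string' ? { key: query } : query;
    const byCreated = q.created_after !== undefined || q.created_before !== undefined;
    const byInserted = q.inserted_after !== undefined || q.inserted_before !== undefined;
    if (byCreated && byInserted) {
      throw new Error('Filter by created or inserted, not both');
    }
    const field = byInserted ? 'inserted' : 'created';
    const after = q[`${field}_after`];
    const before = q[`${field}_before`];
    const re = patternToRegExp(q.key ?? '*');

    const data = [];
    for (const [key, item] of store) {
      if (!re.test(key)) continue;
      if (after !== undefined && !(item[field] > after)) continue;
      if (before !== undefined && !(item[field] < before)) continue;
      data.push(item.value);
    }
    return { ...state, data };
  };
}

console.log(get('patients', 'patient-001')({}).data.name); // Ada
console.log(get('patients', 'patient-*')({}).data.length); // 2
console.log(get('patients', { created_after: '2024-02-01' })({}).data.length); // 2
```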

/**
 * Upserts one or more values, as { key, value } pairs, to the named collection
 * If the server returns any errors, they will be thrown
 */
collections.set(name, data, options) // throws if errors
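A sketch of the upsert-and-throw behaviour of `set`, again against a hypothetical in-memory store rather than the real server. `serverUpsert` and its `{ upserted, errors }` response are invented for illustration, and the `options` argument is left out.

```javascript
// Hypothetical in-memory stand-in for the server: name -> Map(key -> value).
const db = new Map();

// Hypothetical bulk endpoint: writes valid pairs and reports an error for
// each pair that lacks a non-empty string key.
function serverUpsert(name, pairs) {
  const collection = db.get(name) ?? new Map();
  db.set(name, collection);
  const errors = [];
  for (const { key, value } of pairs) {
    if (typeof key !== 'string' || key === '') {
      errors.push(`invalid key ${JSON.stringify(key)}`);
      continue;
    }
    collection.set(key, value); // upsert: insert or overwrite
  }
  return { upserted: pairs.length - errors.length, errors };
}

// set takes one { key, value } pair or an array of them and throws if the
// server reports any errors. Writing { upserted } to state.data is an assumption.
function set(name, data) {
  return (state) => {
    const pairs = Array.isArray(data) ? data : [data];
    const { upserted, errors } = serverUpsert(name, pairs);
    if (errors.length > 0) {
      throw new Error(`collections.set failed for ${name}: ${errors.join('; ')}`);
    }
    return { ...state, data: { upserted } };
  };
}

set('patients', { key: 'p1', value: { name: 'Ada' } })({});
set('patients', [
  { key: 'p1', value: { name: 'Ada L.' } },
  { key: 'p2', value: { name: 'Bo' } },
])({});
console.log(db.get('patients').get('p1').name); // Ada L.
```

In this sketch the valid pairs in a failing batch are still written before the error is thrown, as they would be by a server that processes each pair independently; the issue doesn't say whether the real API behaves that way.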

/**
 * Remove one or more values from the collection
 * query can be a string key or a query object
 */
collections.remove(name, query, options) 
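A hypothetical in-memory sketch of `remove` for the string case: an exact key removes one item and a `*` pattern removes every match. Query objects are not handled here, and the glob-style wildcard and the `{ removed }` result written to `state.data` are assumptions.

```javascript
// Hypothetical in-memory store: collection name -> Map(key -> value).
const db = new Map([
  ['patients', new Map([
    ['patient-001', { name: 'Ada' }],
    ['patient-002', { name: 'Bo' }],
    ['visit-001', { patient: 'patient-001' }],
  ])],
]);

// Assumed wildcard syntax: `*` matches any run of characters.
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Removes every key matching `key` (an exact key matches only itself).
function remove(name, key) {
  return (state) => {
    const collection = db.get(name);
    if (!collection) return { ...state, data: { removed: 0 } };
    const re = patternToRegExp(key);
    let removed = 0;
    // Copy the keys first so deletion doesn't happen mid-iteration.
    for (const k of [...collection.keys()]) {
      if (re.test(k)) {
        collection.delete(k);
        removed += 1;
      }
    }
    return { ...state, data: { removed } };
  };
}

console.log(remove('patients', 'patient-*')({}).data.removed); // 2
console.log([...db.get('patients').keys()].join(', ')); // visit-001
```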

/**
 * Iterate over values in a collection that match the query
 * Query can be a wildcard string or object
 * The callback will be invoked with { key, value } on state.data
 * Or what if we use `(state, key, value)`? The first thing you need to do
 * is destructure anyway
 */
collections.each(name, query, (state) => {})
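Here is how the `(state, key, value)` variant floated above might look, as an in-memory sketch. The store, the glob-style wildcard, and the rule that a callback returning `undefined` keeps the current state are all assumptions.

```javascript
// Hypothetical in-memory store: collection name -> Map(key -> value).
const db = new Map([
  ['patients', new Map([
    ['patient-001', { name: 'Ada' }],
    ['patient-002', { name: 'Bo' }],
    ['visit-001', { patient: 'patient-001' }],
  ])],
]);

// Assumed wildcard syntax: `*` matches any run of characters.
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Callbacks may be async; they run one at a time, and whatever state each
// returns is passed into the next call.
function each(name, pattern, callback) {
  return async (state) => {
    const collection = db.get(name) ?? new Map();
    const re = patternToRegExp(pattern);
    let next = state;
    for (const [key, value] of collection) {
      if (!re.test(key)) continue;
      next = (await callback(next, key, value)) ?? next;
    }
    return next;
  };
}

each('patients', 'patient-*', (state, key, value) => ({
  ...state,
  names: [...(state.names ?? []), value.name],
}))({}).then((state) => console.log(state.names.join(', '))); // Ada, Bo
```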

Configuration

{
  collections_key: '<a JWT>',
  collections_endpoint: '<set by the worker>',
}
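One way the adaptor might turn this configuration into an HTTP request, as a sketch. The `/collections/:name/:key` URL layout, sending the JWT as a `Bearer` token, and the `buildGetRequest` helper are all assumptions, not the Lightning API.

```javascript
// Hypothetical helper: builds a request description from the worker-supplied
// configuration. URL layout and auth scheme are assumed.
function buildGetRequest(configuration, name, key) {
  const { collections_key, collections_endpoint } = configuration;
  if (!collections_key || !collections_endpoint) {
    throw new Error('collections_key and collections_endpoint must both be set');
  }
  const base = collections_endpoint.replace(/\/+$/, ''); // drop trailing slashes
  const url = `${base}/collections/${encodeURIComponent(name)}/${encodeURIComponent(key)}`;
  return {
    url,
    headers: {
      Authorization: `Bearer ${collections_key}`,
      Accept: 'application/json',
    },
  };
}

const req = buildGetRequest(
  { collections_key: 'example.jwt.token', collections_endpoint: 'http://localhost:4000/' },
  'patients',
  'patient 001',
);
console.log(req.url); // http://localhost:4000/collections/patients/patient%20001
```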

Note that this config will be set automatically by the worker. Later, Lightning may take more control.

Later work

Maybe add APIs for administering a collection

josephjclark commented 1 month ago

Stu: we should go streaming-first on this! The adaptor should use an async iterator over a stream under the hood (but pass full objects to the callback). It should also decode on the fly for get, etc.
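A sketch of the streaming-first idea, assuming the server responds with newline-delimited JSON (one `{ key, value }` object per line); that format, `decodeNdjson` and `fakeResponseBody` are hypothetical. The decoder buffers partial lines so that an object split across network chunks is only parsed once it is complete, and `each` hands whole objects to the callback.

```javascript
// Decode newline-delimited JSON from an async iterable of text chunks,
// yielding each object as soon as its line has fully arrived.
async function* decodeNdjson(chunks) {
  let buffer = '';
  for await (const chunk of chunks) {
    buffer += chunk;
    let newline;
    while ((newline = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line) yield JSON.parse(line);
    }
  }
  const rest = buffer.trim(); // a final line may lack a trailing newline
  if (rest) yield JSON.parse(rest);
}

// Stand-in for a response body: the second item is split across chunks.
async function* fakeResponseBody() {
  yield '{"key":"patient-001","value":{"name":"Ada"}}\n{"key":"pat';
  yield 'ient-002","value":{"name":"Bo"}}\n';
}

// each() built on the decoder: the stream is consumed lazily, but the
// callback only ever sees whole { key, value } objects.
function each(body, callback) {
  return async (state) => {
    let next = state;
    for await (const { key, value } of decodeNdjson(body)) {
      next = (await callback(next, key, value)) ?? next;
    }
    return next;
  };
}

each(fakeResponseBody(), (state, key) => ({
  ...state,
  seen: [...(state.seen ?? []), key],
}))({}).then((state) => console.log(state.seen.join(', '))); // patient-001, patient-002
```

With a real `fetch` response in Node, the body is a stream of bytes, so it would need decoding to text first (for example by piping it through `TextDecoderStream`) before reaching a decoder like this.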

josephjclark commented 1 month ago

Query: should it just be time series? You'd get a key by id, or get keys between two dates (or before/after one date).

Maybe allow key scanning? Pass a pattern and we'll find keys that match it.
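If queries are time-series only, one simple structure is an index kept sorted by creation time, so "keys between two dates" becomes two binary searches. This is a hypothetical sketch of that idea (`TimeIndex` is invented here), not how the Lightning backend stores data; key scanning by pattern is not covered.

```javascript
// Hypothetical time-series index: entries kept sorted by `created`
// (epoch milliseconds), so date-range queries are two binary searches.
class TimeIndex {
  constructor() {
    this.entries = []; // [{ key, created }] sorted ascending by created
  }

  // First index whose created time is > t (strict) or >= t (non-strict).
  firstIndexAfter(t, strict) {
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      const c = this.entries[mid].created;
      if (strict ? c <= t : c < t) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Insert after any entries with an equal timestamp, keeping the order stable.
  insert(key, created) {
    this.entries.splice(this.firstIndexAfter(created, true), 0, { key, created });
  }

  // Keys created strictly after `after` and strictly before `before`.
  between(after, before) {
    const start = this.firstIndexAfter(after, true);
    const end = this.firstIndexAfter(before, false);
    return this.entries.slice(start, end).map((e) => e.key);
  }
}

const idx = new TimeIndex();
idx.insert('visit-3', Date.parse('2024-03-01'));
idx.insert('visit-1', Date.parse('2024-01-01'));
idx.insert('visit-2', Date.parse('2024-02-01'));
console.log(idx.between(Date.parse('2024-01-15'), Date.parse('2024-03-01')).join(', ')); // visit-2
```

A "before one date" or "after one date" query is the same call with `-Infinity` or `Infinity` as the open bound.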

josephjclark commented 1 month ago

Probable API:

/**
 * Get one or more items from a collection. For large collections, use each.
 * query can be a string key name (including wildcards), or an object
 * { key, created_after, created_before, inserted_before, inserted_after }
 * Throws if both created_* and inserted_* filters are specified
 * If no wildcard or query object is included, the item will be written to state.data (throws if not found)
 * If a wildcard or query object is included, an array will be written to state.data
 */
collections.get(name, query) // writes to state.data

/**
 * Upserts one or more items, as { key, record } pairs, to the named collection
 * If the server returns any errors, they will be thrown
 */
collections.set(name, data, options) // throws if errors

/**
 * Remove one or more items from the collection
 * query can be a string key or a query object
 */
collections.remove(name, query, options) 

/**
 * Iterate over items in a collection that match the query
 * Query can be a wildcard string or object
 * The callback will be invoked with { key, record } on state.data
 */
collections.each(name, query, (state) => state)

josephjclark commented 1 month ago

I've just remembered that in the original doc, Taylor suggested a key generator function. I've been thinking about it all morning and I actually think it's a way better solution, so I'm going to deviate from the spec on set.
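A sketch of what a key-generator style `set` could look like. The issue doesn't give the new signature, so `set(name, keyGen, values)` is an assumption, as is the in-memory `db`. Keys are computed for the whole batch before anything is written, so one bad key fails the batch without a partial write.

```javascript
// Hypothetical in-memory store: collection name -> Map(key -> value).
const db = new Map();

// Hypothetical key-generator variant of set: the caller passes plain values
// plus a function that derives each value's key.
function set(name, keyGen, values) {
  return (state) => {
    const items = Array.isArray(values) ? values : [values];

    // Work out every key first, so an invalid key aborts before any write.
    const pairs = items.map((value) => {
      const key = keyGen(value);
      if (typeof key !== 'string' || key === '') {
        throw new Error(`keyGen returned an invalid key for ${JSON.stringify(value)}`);
      }
      return [key, value];
    });

    const collection = db.get(name) ?? new Map();
    db.set(name, collection);
    for (const [key, value] of pairs) collection.set(key, value);
    return state;
  };
}

const patients = [{ id: 1, name: 'Ada' }, { id: 2, name: 'Bo' }];
set('patients', (p) => `patient-${p.id}`, patients)({});
console.log([...db.get('patients').keys()].join(', ')); // patient-1, patient-2
```

The appeal of this shape is that job code no longer has to map every record into a `{ key, value }` pair by hand; the key rule is stated once.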