contexture

:warning: Development has moved to the contexture monorepo: This package lives in https://github.com/smartprocure/contexture/tree/master/packages/server

The Contexture DSL (Domain Specific Language) Processor

Overview

Contexture is a tool for running the Contexture DSL, which is primarily about abstracting queries/filters and results/aggregations. Each leaf node in a Contexture tree can affect other leaf nodes (e.g., acting as a filter) and has results of its own (e.g., a top N aggregation or search results) which are affected by the other nodes. Non-leaf nodes describe how leaves relate to each other, e.g. as a boolean join of and/or, and Contexture is smart enough to make sure that filters are included based on their joins - e.g., two nodes ored together won't affect each other's results, but they will if they're anded together.

The canonical example of a Contexture Node is faceted search, where you have a checkbox list that is both a filter (restricting results to the checked values) and an aggregation (showing the top n values that can be checked). Contexture allows such nodes to be nested in advanced searches with boolean joins like and/or/not.

Contexture takes as input the tree DSL and returns it hydrated with contextual results on each node's context, and uses providers for different backing data stores (like elasticsearch and mongo) to actually run the searches. This means that Contexture typically runs on the server, but it doesn't have to - you can build providers that call APIs instead of directly hitting a database. While the Contexture DSL can be built any way you'd like, it pairs well with the contexture-client, which leverages the generic structure and makes sure things update only when needed.
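
For example, a minimal serialized tree might look like the sketch below - the node types (group, facet, results) and field names here are illustrative, and what is actually available depends on the types exposed by the chosen provider:

let dsl = {
  key: 'root',
  type: 'group',
  schema: 'movies',
  join: 'and',
  children: [
    // both a filter and an aggregation: restricts results to the checked
    // values and reports the top values that can be checked
    { key: 'genre', type: 'facet', field: 'genres', values: ['Comedy'] },
    // a node that only consumes results from the other nodes
    { key: 'results', type: 'results' },
  ],
}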

Ecosystem And Resources

| GitHub | npm | Description |
| --- | --- | --- |
| contexture | contexture | The core library that executes the DSL to retrieve data |
| contexture-elasticsearch | contexture-elasticsearch | Elasticsearch provider for contexture |
| contexture-mongo | contexture-mongo | MongoDB provider for contexture |
| contexture-client | contexture-client | The client library that manages the DSL, allowing for hyper-efficient updates that run only exactly what is needed |
| contexture-react | contexture-react | React components for building contexture interfaces |
| contexture-export | contexture-export | Export searches into files or any other target |
| contexture-ec18-talk | n/a | Elasticon 2018 talk about Contexture |

Example Usage

let Contexture = require('contexture')
let provider = require('contexture-mongo')
let types = require('contexture-mongo/types')
let schemas = require('./path/to/schemas')
// a connected mongoose instance for the provider to use
let mongoose = require('mongoose')

let process = Contexture({
  schemas,
  providers: {
    mongo: provider({
      getMongooseClient: () => mongoose,
      types,
    }),
  },
})

Then later:

await process(dsl)

or

await process(dsl, {
  debug: true,
})

Process Options

Process can handle a few options:

| Option | Description |
| --- | --- |
| debug | Sends _meta as part of the response, which includes per-node request records, relevant filters, and other debug info |
| onResult | A callback that is called whenever a node finishes producing its results, which can be used, for example, to send partial results over websockets |
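
For example, here is a sketch of using both options together - the exact value passed to onResult is an assumption here, and websocket is a hypothetical transport:

await process(dsl, {
  debug: true,
  // called as each node's results come back; the callback argument shape
  // shown here is assumed for illustration
  onResult: node => websocket.send(JSON.stringify(node)),
})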

Core Concepts

Overview

Contexture will process a serialized contexture tree DSL, where each leaf node has a Schema representing what it is querying and which data Provider it uses, along with a Provider-specific Type that defines how it applies filters to other contexts and how it interacts with its Provider to get results.

Glossary

[^db]: Does not actually have to be a database - a provider could talk to an API, the file system, or even make stuff up on the fly.

[^checks]: These checks are above and beyond what the client specifies and are meant as last-minute validation - the client is intelligent enough not to send up things without values or with missing properties, but this provides an additional check in case something gets through (e.g., a terms_stats without a sort field).

[^manyproviders]: If there are multiple Providers, it will default to the first one unless a provider is also specified along with the schema on the data context itself.

Implementation Details

Process Algorithm

For each of these steps, walk the tree in a parent-first DFS traversal, with each function optionally asynchronous (by returning a promise). Along the way, intermediate data is added to contexts on an object called _meta. For each context, the type/processor combination is pulled on the fly, meaning it will use the correct local Provider and Type info even if some contexts have different schemas.[^multischema]
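
As a rough sketch (not the actual implementation), a parent-first DFS with optionally asynchronous visit functions looks like this:

// Illustrative only: visit the parent first, then recurse into children,
// awaiting each step so visit functions may return promises
let walk = async (node, visit) => {
  node._meta = node._meta || {} // intermediate data accumulates here
  await visit(node)
  for (let child of node.children || []) await walk(child, visit)
}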

Providers

All Providers must specify the following properties:

Additionally, a provider may expose config for its client (e.g. hosts or request timeout for elasticsearch).
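
For example, the elasticsearch provider might be constructed with client configuration along these lines - the option names used here (getClient in particular) are assumptions for illustration, so consult the provider's own README for its actual options:

let { Client } = require('@elastic/elasticsearch')
let elasticsearch = require('contexture-elasticsearch')

let esProvider = elasticsearch({
  // hypothetical: hand the provider a client configured with hosts and timeouts
  getClient: () => new Client({ node: 'http://localhost:9200', requestTimeout: 30000 }),
})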

Types

All Types can implement any of the following properties. All are optional:

[^multischema]: This completely solves and obviates the need for the MultiIndexGroupProcessor on the client and handles it in a much more elegant way (and in a single service call, instead of n service calls). A caveat is that it does not currently handle schemas from different providers (because filters are generated based on their context's local schema), so you can't currently mix an elasticsearch schema with a mongo schema (because it could try to call mongo with elasticsearch filters, for example).

Schemas

Schemas are named by convention based on their filename and should be in camelCase. A schema must have one or more provider-specific sets of configuration properties.
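
For example, a hypothetical schema file named movies.js might export provider-specific configuration like this - the property names under each provider key (collection, index) are assumptions for illustration:

// movies.js - becomes the "movies" schema by filename convention
module.exports = {
  mongo: { collection: 'movies' },
  elasticsearch: { index: 'movies' },
}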