DavidWells / analytics

Lightweight analytics abstraction layer for tracking page views, custom events, & identifying visitors
https://getanalytics.io
MIT License

Chain plugins to modify payload in sequence #24

Open JuroOravec opened 4 years ago

JuroOravec commented 4 years ago

Hi, is it possible to chain multiple plugins to modify the payload so that changes from one plugin are passed to another?

Example: The setup is like this:

Analytics({
    ...
    plugins: [
        // Plugins that disable tracking on certain conditions
        doNotTrackPlugin(),
        ...
        // plugins that modify the payload
        enrichAnalyticsPlugin(),
        dropNoValuePropertiesPlugin(),
        snakecasePropertiesPlugin(),
        // 3rd party analytics
        googleAnalyticsPlugin({ ... })
    ],
    ...
})

where each of the plugins that modifies the payload changes only a single specific part of it. E.g. this is a plugin which converts the property keys in payload.properties from camelCase to snake_case:

import _ from "lodash";

export function snakecasePropertiesPlugin() {
  return {
    name: "snakecase-properties",
    track: ({ payload }) => {
      return {
        ...payload,
        properties: snakeCaseProperties(payload.properties)
      };
    },
    identify: ({ payload }) => {
      return {
        ...payload,
        traits: snakeCaseProperties(payload.traits)
      };
    },
    page: ({ payload }) => {
      return {
        ...payload,
        properties: snakeCaseProperties(payload.properties)
      };
    }
  };

  function snakeCaseProperties(obj) {
    return Object.keys(obj).reduce((agg, key) => {
      agg[_.snakeCase(key)] = obj[key];
      return agg;
    }, {});
  }
}

The assumption is that the plugins that modify the payload should be run for all data that is passed to the Analytics instance, hence I did not want to use the namespacing as described in the docs.

But the result is that instead of the data being passed from plugin to plugin sequentially, every plugin is fed the initial data. So when multiple plugins modify the payload before it is sent to the 3rd party analytics, their changes overwrite one another when each returned value is merged with the result from the previous plugin, instead of compounding.
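A minimal sketch of the difference, using a simplified model of the merge rather than the library's actual code. The hooks `addSource` and `upperCaseKeys` are made up for illustration:

```javascript
// Two toy payload-modifying hooks (hypothetical, for illustration only).
const addSource = ({ payload }) => ({
  ...payload,
  properties: { ...payload.properties, source: "web" },
});
const upperCaseKeys = ({ payload }) => ({
  ...payload,
  properties: Object.fromEntries(
    Object.entries(payload.properties).map(([k, v]) => [k.toUpperCase(), v])
  ),
});

const initial = { event: "click", properties: { buttonId: 1 } };
const hooks = [addSource, upperCaseKeys];

// Assumed model of the behaviour described above: every hook sees the
// initial payload, and each result is shallow-merged over the previous one.
const merged = hooks.reduce(
  (acc, hook) => ({ ...acc, ...hook({ payload: initial }) }),
  initial
);

// Desired behaviour: each hook receives the previous hook's output.
const piped = hooks.reduce((payload, hook) => hook({ payload }), initial);

console.log(JSON.stringify(merged.properties)); // {"BUTTONID":1}
console.log(JSON.stringify(piped.properties)); // {"BUTTONID":1,"SOURCE":"web"}
```

In the merged variant the `source` property added by the first hook is lost, because the second hook never saw it.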

Is this use case supported?

I assume (haven't tested yet) that in this scenario it could be solved by namespacing all the modifying plugins, so that all their functions are namespaced, e.g. 'track:google-analytics'. But if all these changes should be applied to multiple 3rd party analytics, it would quickly get inconvenient, as the same functions would have to be namespaced for each of them. (So each plugin would need not only 'track:google-analytics', but also 'track:hubspot', for example.)

An additional issue is that with my approach the order is important (e.g. the payload should first be enriched, and only then should the value-less properties be dropped). Could that be achieved with namespacing? Or would I have to specify, for each plugin, the namespace of the plugin that should go after it? (So that in the example above, enrichAnalyticsPlugin is namespaced to dropNoValuePropertiesPlugin, which is in turn namespaced to snakecasePropertiesPlugin, and so on, so they all run sequentially.) The downside of that approach is that I cannot easily reorder the plugins, because I would have to rename the namespaces for all affected plugins.

Or is there maybe a plugin that could be given a list of other plugins, and subplugins would modify the payload sequentially? Or will I have to squash the plugins into a big one and namespace it to the desired 3rd party analytics plugin?

What's the best course of action here? Thanks.

JuroOravec commented 4 years ago

This is my workaround:

function chainPlugins(plugins: any[]) {
  if (!plugins) {
    throw TypeError("chainPlugins requires a list of plugins.");
  }
  const chainedPlugins = plugins
    .map((plugin, index, arr) => {
      const isLastPlugin = index === arr.length - 1;
      if (isLastPlugin) {
        return plugin;
      }
      const nextPlugin = arr[index + 1];
      return namespacePluginHooks(plugin, nextPlugin.name);
    })
    // Flatten results from namespacePluginHooks
    .reduce((acc, val) => acc.concat(val), []);
  return chainedPlugins;
}

function namespacePluginHooks(
  plugins: object | object[],
  namespaces: string | string[],
  hooks?: string | string[]
) {
  if (!plugins) {
    throw TypeError(
      "namespacePluginHooks requires a single or a list of plugins."
    );
  }
  const pluginsArr = Array.isArray(plugins) ? plugins : [plugins];
  if (!namespaces) {
    throw TypeError(
      "namespacePluginHooks requires a single string or a list of strings as " +
        "namespaces."
    );
  }
  const namespaceArr = Array.isArray(namespaces) ? namespaces : [namespaces];
  const hooks_ = hooks ? hooks : ["track", "identify", "page"];
  const hooksArr = Array.isArray(hooks_) ? hooks_ : [hooks_];
  // For each plugin, make a namespaced copy of each hook for each namespace
  // So for namespaces ["ga", "hubspot"], and hooks ["track", "page"],
  // assigns these namespaced hooks to the current plugin:
  // ["track:ga", "track:hubspot", "page:ga", "page:hubspot"]
  return pluginsArr.map(plugin => {
    const namespacedPlugin = namespaceArr.reduce((nsPluginOuter, namespace) => {
      return hooksArr.reduce((nsPluginInner, hook) => {
        const namespacedKey = `${hook}:${namespace}`;
        nsPluginInner[namespacedKey] = plugin[hook];
        return nsPluginInner;
      }, nsPluginOuter);
    }, {});
    return {
      ...plugin,
      ...namespacedPlugin
    };
  });
}

chainPlugins accepts a list of plugins. It namespaces their hooks so that the plugins' hooks are called sequentially.

namespacePluginHooks accepts plugin(s), namespace(s) and (optionally) hook(s). It duplicates those hooks within those plugins to the specified namespaces.
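To make the effect on a plugin object concrete, here is a condensed stand-in for namespacePluginHooks that handles a single plugin. The helper name `namespaceHooks` is hypothetical, and unlike the version above it skips hooks the plugin does not define:

```javascript
// Condensed, single-plugin stand-in for namespacePluginHooks (hypothetical).
function namespaceHooks(plugin, namespaces, hooks = ["track", "identify", "page"]) {
  const namespaced = {};
  for (const namespace of namespaces) {
    for (const hook of hooks) {
      // Skip hooks the plugin does not define, to avoid undefined entries.
      if (typeof plugin[hook] === "function") {
        namespaced[`${hook}:${namespace}`] = plugin[hook];
      }
    }
  }
  // The original, un-namespaced hooks are kept alongside the copies.
  return { ...plugin, ...namespaced };
}

const examplePlugin = {
  name: "snakecase-properties",
  track: ({ payload }) => payload,
  page: ({ payload }) => payload,
};

const result = namespaceHooks(examplePlugin, ["ga", "hubspot"], ["track", "page"]);
console.log(Object.keys(result).join(", "));
// name, track, page, track:ga, page:ga, track:hubspot, page:hubspot
```

Note that the returned object still carries the plain `track` and `page` hooks, so the plugin is not limited to the namespaced targets.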

The two functions can then be used as follows to create a pipeline of plugins:

Analytics({
    ...
    plugins: [
        // Plugins that disable tracking on certain conditions
        doNotTrackPlugin(),
        ...
        // plugins that modify the payload
        ...chainPlugins([
            enrichAnalyticsPlugin(),
            dropNoValuePropertiesPlugin(),
            ...namespacePluginHooks(
                snakecasePropertiesPlugin(),
                "google-analytics"
            )
        ]),
        // 3rd party analytics
        googleAnalyticsPlugin({ ... })
    ],
    ...
})

In the example above, enrichAnalyticsPlugin is namespaced to run before dropNoValuePropertiesPlugin which is namespaced to run before snakecasePropertiesPlugin. Because snakecasePropertiesPlugin is last in the list, it's not automatically namespaced, so we use namespacePluginHooks(plugin, namespace) to manually namespace it to run before googleAnalyticsPlugin.

DavidWells commented 4 years ago

This is an interesting use case. Let me see if I get this correct:

You want to alter the payload for all events passing into a downstream analytics provider (in this case google analytics). Is this correct?

There is the trackStart event that runs before the track function is called in the various plugins. In trackStart, you can alter the payload and that should propagate down the chain and all track calls will have the modified values from various plugins.

See this example just added https://github.com/DavidWells/analytics/blob/13ef6c531cd7378fcf119a6b27d818e9a6226dd8/examples/demo/src/utils/analytics/example-5.js

Plugins A, B and C all modify the tracking payload, and plugin D receives it with all of their modifications applied.

The payload in plugin D contains:

{foo: "bar", addOne: "hello", addTwo: "there", addThree: "now"}
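
A rough sketch of that flow, assuming the added values live under `payload.properties`. The `runTrack` function is a local stand-in for the library's dispatch, not its real implementation, and the plugin names are made up:

```javascript
// Helper that builds a plugin whose trackStart hook adds one property.
const addProperty = (name, key, value) => ({
  name,
  trackStart: ({ payload }) => ({
    ...payload,
    properties: { ...payload.properties, [key]: value },
  }),
});

const pluginA = addProperty("plugin-a", "addOne", "hello");
const pluginB = addProperty("plugin-b", "addTwo", "there");
const pluginC = addProperty("plugin-c", "addThree", "now");

// Plugin D only reads the payload in its track hook.
let seenByD = null;
const pluginD = {
  name: "plugin-d",
  track: ({ payload }) => {
    seenByD = payload.properties;
  },
};

// Stand-in dispatcher (NOT the library's implementation): pipe the payload
// through every trackStart hook, then hand the result to every track hook.
function runTrack(plugins, event, properties) {
  const payload = plugins
    .filter((p) => typeof p.trackStart === "function")
    .reduce((acc, p) => p.trackStart({ payload: acc }) || acc, { event, properties });
  plugins
    .filter((p) => typeof p.track === "function")
    .forEach((p) => p.track({ payload }));
  return payload;
}

runTrack([pluginA, pluginB, pluginC, pluginD], "buttonClicked", { foo: "bar" });
console.log(JSON.stringify(seenByD));
// {"foo":"bar","addOne":"hello","addTwo":"there","addThree":"now"}
```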

JuroOravec commented 4 years ago

Hi David,

I was thinking of a more granular/modular payload modification.

Here's a graphical explanation of the modularity I had in mind. In this example, plugins C, F and G are 3rd party analytics, and plugins A, B, D and E define some common transformations.

*data passed to analytics.track*
  |
  A => pluginA modifies every payload passed to track method using trackStart
  |
  B => pluginB modifies every payload passed to track method using trackStart
  |
  |\
  | \
  |  \
  C   |  => payload modified with A and B is passed to 3rd party analytics pluginC
      |
      D  => payload modified with A and B is modified with pluginD
      |
      E  => payload modified with A, B and D  is modified with pluginE
      |
      |\
      | \
      |  \
      |   G => payload modified with A, B, D and E is passed to 3rd party analytics pluginG
      |
      F     => payload modified with A, B, D and E is passed to 3rd party analytics pluginF

Plugins A and B modify all payloads (as you mentioned).

The functionality I was going for is that D and E modify only the payloads that are sent to F or G, with the modifications from D and E applied one after another.
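The routing in the diagram can be sketched with plain function composition. Everything here is a hypothetical stand-in: `tag` builds a transform that records which plugin touched the payload, and `destination` plays the role of a 3rd party analytics plugin:

```javascript
// Hypothetical transforms standing in for plugins A, B, D and E.
const tag = (label) => (payload) => ({
  ...payload,
  appliedBy: [...(payload.appliedBy || []), label],
});
const A = tag("A");
const B = tag("B");
const D = tag("D");
const E = tag("E");

// Run transforms left to right, each receiving the previous result.
const pipe = (...fns) => (payload) => fns.reduce((acc, fn) => fn(acc), payload);

const common = pipe(A, B); // applied to every payload
const scoped = pipe(D, E); // applied only on the way to F and G

// Stand-ins for the 3rd party analytics plugins C, F and G.
const received = {};
const destination = (name) => (payload) => {
  received[name] = payload.appliedBy;
};

function track(payload) {
  const afterCommon = common(payload);
  destination("C")(afterCommon);
  const afterScoped = scoped(afterCommon);
  destination("F")(afterScoped);
  destination("G")(afterScoped);
}

track({ event: "signup" });
console.log(JSON.stringify(received));
// {"C":["A","B"],"F":["A","B","D","E"],"G":["A","B","D","E"]}
```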

JuroOravec commented 4 years ago

Also, for completeness: the workaround I posted before didn't work as expected, so this behaviour was ultimately achieved with two functions, one which composes multiple plugin objects into a single "pipeline" plugin object, and one which scopes a plugin's methods to the specified plugin names.

Definitions

```js
export function namespacePluginHooks(
  plugins: object | object[],
  namespaces: string | string[],
  hooks?: string | string[]
) {
  if (!plugins) {
    throw TypeError(
      "namespacePluginHooks requires a single or a list of plugins."
    );
  }
  const pluginsArr = Array.isArray(plugins) ? plugins : [plugins];
  if (!namespaces) {
    throw TypeError(
      "namespacePluginHooks requires a single string or a list of strings as " +
        "namespaces."
    );
  }
  const namespaceArr = Array.isArray(namespaces) ? namespaces : [namespaces];
  const hooks_ = hooks ? hooks : ["track", "identify", "page"];
  const hooksArr = Array.isArray(hooks_) ? hooks_ : [hooks_];
  // For each plugin, make a namespaced copy of each hook for each namespace
  // So for namespaces ["ga", "hubspot"], and hooks ["track", "page"],
  // assigns these namespaced hooks to the current plugin:
  // ["track:ga", "track:hubspot", "page:ga", "page:hubspot"]
  return pluginsArr.map(plugin => {
    const namespacedPlugin = namespaceArr.reduce((nsPluginOuter, namespace) => {
      return hooksArr.reduce((nsPluginInner, hook) => {
        const namespacedKey = `${hook}:${namespace}`;
        nsPluginInner[namespacedKey] = plugin[hook];
        return nsPluginInner;
      }, nsPluginOuter);
    }, {});
    return {
      ...plugin,
      ...namespacedPlugin
    };
  });
}

/**
 * Compose multiple plugins into a single plugin object
 * whose hooks call underlying plugins' respective hooks sequentially,
 * compounding the modifications to the payload object.
 * @param {object} options Options object
 * @param {string} options.name Name of the newly-composed plugin.
 * @param {object[]} options.plugins List of plugin objects that should be
 *                                   combined together.
 */
export function composePlugins(options: { name: string; plugins: any[] }) {
  const { name, plugins } = options;
  if (!plugins) {
    throw TypeError("composePlugins requires a list of plugins.");
  }
  // Chain plugin hooks from inside out, so that the outer (upstream) hook
  // first processes the payload, and then passes the augmented value to the
  // inner (downstream) hook
  const compositePlugin = [...plugins]
    .reverse()
    .reduce((aggPlugin, upstreamPlugin) => {
      Object.keys(upstreamPlugin)
        .filter(key => key !== "name" && key !== "config")
        .forEach(key => {
          if (!aggPlugin[key]) {
            aggPlugin[key] = ({ payload }) => payload;
          }
          aggPlugin[key] = chainHooks(
            upstreamPlugin[key],
            aggPlugin[key],
            upstreamPlugin.config
          );
        });
      return aggPlugin;
    }, {});
  compositePlugin.name = name;
  return compositePlugin;
}

/**
 * Given two analytics plugin hook functions, returns a function which wraps
 * them such that the arguments are first passed to the first function, which
 * returns the updated payload, and the arguments along with the updated
 * payload are passed to the second function.
 * @param {Function} upstreamFn Hook that will be called first
 * @param {Function} downstreamFn Hook that will be called with payload
 *                                property updated from upstreamFn
 * @param {Object} config Config that should be passed to the upstream hook.
 *                        Defaults to the config of the plugin that was first
 *                        triggered by the event.
 */
function chainHooks(upstreamFn, downstreamFn, config = null) {
  function fnInner(eventCtx, ...args) {
    const currEventCtx = {
      ...eventCtx,
      config: config || eventCtx.config
    };
    const updatedEventCtx = {
      ...eventCtx,
      payload: upstreamFn.call(null, currEventCtx, ...args)
    };
    return downstreamFn.call(null, updatedEventCtx, ...args);
  }
  return fnInner;
}
```

Which could be used as follows:

Usage

```js
Analytics({
  plugins: [
    // regular plugin
    doNotTrackPlugin(),
    // scope the composed plugin so it's applied to payload when payload is sent to GA
    ...namespacePluginHooks(
      // create a composed plugin object with name "payload-pipeline"
      // whose methods sequentially call methods of children plugins
      // and pipe the payload through them
      composePlugins({
        name: "payload-pipeline",
        plugins: [
          enrichAnalyticsPlugin({ store, router }),
          dropNoValuePropertiesPlugin(),
          snakecasePropertiesPlugin()
        ]
      }),
      ["google-analytics"]
    ),
    // 3rd party plugins
    googleAnalyticsPlugin({ trackingId: gaTrackingID, autoTrack: true })
  ]
})
```

Ultimately, though, we didn't use this in production and instead chose to put all the modifications into a single function and scope that to the GA plugin. So there's no urgency to this, it's just interesting to see whether anybody has had this use case before.
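For reference, a rough sketch of what that single-function approach can look like. The transforms (`enrich`, `dropNoValue`, `snakeCaseKeys`), the `appVersion` value and the plugin name `ga-payload` are hypothetical placeholders, not our production code:

```javascript
// Hypothetical transforms; the real ones would live in application code.
const enrich = (props) => ({ ...props, appVersion: "1.0.0" });
const dropNoValue = (props) =>
  Object.fromEntries(
    Object.entries(props).filter(([, v]) => v !== undefined && v !== null && v !== "")
  );
const toSnakeCase = (key) => key.replace(/[A-Z]/g, (c) => `_${c.toLowerCase()}`);
const snakeCaseKeys = (props) =>
  Object.fromEntries(Object.entries(props).map(([k, v]) => [toSnakeCase(k), v]));

// All modifications in one function, applied in a fixed order.
const transform = (props = {}) => snakeCaseKeys(dropNoValue(enrich(props)));

// One plugin whose hooks are scoped to the "google-analytics" plugin.
const gaPayloadPlugin = {
  name: "ga-payload",
  "track:google-analytics": ({ payload }) => ({
    ...payload,
    properties: transform(payload.properties),
  }),
  "page:google-analytics": ({ payload }) => ({
    ...payload,
    properties: transform(payload.properties),
  }),
  "identify:google-analytics": ({ payload }) => ({
    ...payload,
    traits: transform(payload.traits),
  }),
};

// Calling the hook directly to show the combined effect.
const out = gaPayloadPlugin["track:google-analytics"]({
  payload: { event: "click", properties: { buttonId: 7, userName: "" } },
});
console.log(JSON.stringify(out.properties));
// {"button_id":7,"app_version":"1.0.0"}
```

Because the order lives inside `transform`, reordering the steps is a one-line change and no namespaces need renaming.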