PostHog / posthog-js-lite

Reimplementation of posthog-js to be as light and modular as possible.
https://posthog.com/docs/libraries
MIT License

Events that are captured while flush is pending remain queued until next flush #245

Open mattjennings opened 1 month ago

mattjennings commented 1 month ago

Bug description

This is perhaps the same issue as #232. I noticed it when using a flushAt of 1 and a flushInterval of 0, but it can happen with any values.

The issue is evident when events are captured in timeouts or async methods (as you'll see in the repro below). When an event is captured and should trigger a flush, the client starts one here. However, if another event is captured before that flush completes, it is stuck in the queue until the next flush happens, which is typically on the next capture, or on shutdown if called. This is usually fine, but consider a scenario where the flushInterval elapses and the process crashes: those events would be lost when I would have expected them to be sent, especially when flushInterval is 0. (Or, in my case, staring at the PostHog dashboard wondering why my events weren't showing up during development.)

How to reproduce

Run this in Node with posthog-node@4.0.1:

const { PostHog } = require('posthog-node');

const API_KEY = 'your_api_key';

const posthog = new PostHog(API_KEY, {
  host: 'https://us.i.posthog.com',
  flushAt: 1,
  flushInterval: 0,
});

posthog.debug();

// this will get flushed immediately
posthog.capture({ distinctId: '1', event: 'event_1' });

// this won't get flushed until the manual flush call below
setTimeout(() => posthog.capture({ distinctId: '1', event: 'event_2' }), 0);

// wait 9 seconds and then flush
setTimeout(() => {
  console.log('manual flush');
  posthog.flush();
}, 9000);

// keep node alive
setTimeout(() => {}, 100000);

Console output (note that event_2 is captured but not flushed until the manual flush call):

PostHog Debug capture {
  distinct_id: '1',
  event: 'event_1',
  properties: {
    '$groups': undefined,
    '$lib': 'posthog-node',
    '$lib_version': '4.0.1',
    '$geoip_disable': true
  },
  type: 'capture',
  library: 'posthog-node',
  library_version: '4.0.1',
  timestamp: '2024-08-12T19:54:09.338Z',
  uuid: '01914827-bc3b-7ec6-a3cc-0644d7aff445'
}
PostHog Debug capture {
  distinct_id: '1',
  event: 'event_2',
  properties: {
    '$groups': undefined,
    '$lib': 'posthog-node',
    '$lib_version': '4.0.1',
    '$geoip_disable': true
  },
  type: 'capture',
  library: 'posthog-node',
  library_version: '4.0.1',
  timestamp: '2024-08-12T19:54:09.377Z',
  uuid: '01914827-bc61-7f44-ae9d-612c1157e518'
}
PostHog Debug flush [
  {
    distinct_id: '1',
    event: 'event_1',
    properties: {
      '$groups': undefined,
      '$lib': 'posthog-node',
      '$lib_version': '4.0.1',
      '$geoip_disable': true
    },
    type: 'capture',
    library: 'posthog-node',
    library_version: '4.0.1',
    timestamp: '2024-08-12T19:54:09.338Z',
    uuid: '01914827-bc3b-7ec6-a3cc-0644d7aff445'
  }
]
manual flush
PostHog Debug flush [
  {
    distinct_id: '1',
    event: 'event_2',
    properties: {
      '$groups': undefined,
      '$lib': 'posthog-node',
      '$lib_version': '4.0.1',
      '$geoip_disable': true
    },
    type: 'capture',
    library: 'posthog-node',
    library_version: '4.0.1',
    timestamp: '2024-08-12T19:54:09.377Z',
    uuid: '01914827-bc61-7f44-ae9d-612c1157e518'
  }
]

This can also be reproduced by capturing after awaiting a promise instead of using a setTimeout:

async function main() {
  posthog.capture({ distinctId: '1', event: 'event_1' })
  await fetch('https://example.com')
  posthog.capture({ distinctId: '1', event: 'event_2' })
}

main()

Additional context

I believe a simple fix would be to check whether the queue length is still >= flushAt when the flush promise resolves, and if so, flush again.
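A rough, self-contained sketch of that idea (this is not posthog-node's actual internal code; the class and field names are made up for illustration):

class EventQueue {
  constructor(flushAt, sendBatch) {
    this.flushAt = flushAt;     // minimum number of queued events before flushing
    this.sendBatch = sendBatch; // async (events) => void, e.g. the HTTP request
    this.queue = [];
    this.pendingFlush = null;
  }

  capture(event) {
    this.queue.push(event);
    if (this.queue.length >= this.flushAt) this.flush();
  }

  flush() {
    if (this.pendingFlush) return this.pendingFlush; // a flush is already in flight
    const batch = this.queue.splice(0, this.queue.length);
    if (batch.length === 0) return Promise.resolve();
    this.pendingFlush = this.sendBatch(batch).finally(() => {
      this.pendingFlush = null;
      // re-check: anything captured while the request was in flight
      // triggers another flush instead of waiting for the next capture
      if (this.queue.length >= this.flushAt) this.flush();
    });
    return this.pendingFlush;
  }
}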

marandaneto commented 1 month ago

> However, if another event is captured before that flush completes, it is stuck in the queue until the next flush happens, which is typically on the next capture, or on shutdown if called

That is by design: we batch events instead of sending serial requests.

> Those events would be lost when I would have expected them to be sent, especially when flushInterval is 0. (Or, in my case, staring at the PostHog dashboard wondering why my events weren't showing up during development.)

True, but this usually only happens during development because of process crashes, hot reloading, etc. It is unlikely to happen in production unless your service is constantly crashing.

@mattjennings could you, for example, call await posthog.flush() or await posthog.shutdown() before the process crashes (using a signal handler if possible)?

> I believe a simple fix would be to check whether the queue length is still >= flushAt when the flush promise resolves, and if so, flush again.

This would make the batching option largely useless, since it would always send whatever is left before the time interval or the minimum batch size is reached, losing the benefit of fewer network requests.

I'd say this is not a bug but a feature, though I do understand the bad UX during development. I wonder if you can come up with more ideas; I think a signal handler would be a good one, a last chance to flush everything pending before crashing.

Something like https://nodejs.org/api/process.html#event-beforeexit, https://nodejs.org/api/process.html#event-uncaughtexceptionmonitor, or https://nodejs.org/api/process.html#event-uncaughtexception. Not sure which one is most suitable.
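For example, something along these lines, assuming the posthog client from the repro above (the choice of hooks and exit codes here is just an illustration, not an official recommendation):

async function flushAndExit(code) {
  // shutdown() flushes any queued events and tears the client down
  await posthog.shutdown();
  process.exit(code);
}

// give the client a last chance to send pending events on shutdown or crash
process.on('SIGINT', () => flushAndExit(0));
process.on('SIGTERM', () => flushAndExit(0));
process.on('uncaughtException', (err) => {
  console.error(err);
  flushAndExit(1);
});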

mattjennings commented 1 month ago

My workaround was to set up a setInterval(() => posthog.flush(), ms) to guarantee flushes regardless of when an event was captured.
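Roughly like this (the 5-second interval is just an example value I picked here):

const flushTimer = setInterval(() => posthog.flush(), 5000);
flushTimer.unref(); // don't keep the process alive just for this timer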

I think there are two concerns here: one is the expected behaviour when flushInterval is 0, and the other is when flushInterval actually triggers a flush.

In the case of flushInterval being 0, I assumed it would send each event immediately (once flushAt was met), as per this note:

> Note: When using PostHog in an AWS Lambda function or a similar serverless function environment, make sure you set flushAt to 1 and flushInterval to 0. Also, remember to always call await posthog.shutdown() at the end to flush and send all pending events.

So maybe this could use a bit more clarification about what a flushInterval of 0 means, and/or that flushing runs after a capture rather than on a recurring interval. My understanding had been that flushAt was evaluated after each capture and flushInterval was a recurring timer that cleaned up whatever was left in the queue.

Alternatively, an option to disable batching altogether is probably what someone would want for lambda/dev environments.
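For reference, the pattern the docs note above recommends looks roughly like this (the handler body, event names, and API key are placeholders):

const { PostHog } = require('posthog-node');

const posthog = new PostHog('your_api_key', {
  host: 'https://us.i.posthog.com',
  flushAt: 1,       // flush after every captured event
  flushInterval: 0, // don't rely on a timer
});

exports.handler = async (event) => {
  posthog.capture({ distinctId: 'user_1', event: 'lambda_invoked' });
  // ... the actual function work would go here ...
  await posthog.shutdown(); // flush and send all pending events before the runtime freezes
  return { statusCode: 200 };
};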

As for the crashing concern: if you're able to run it in beforeExit then you can await posthog.shutdown(), but Node.js will not await exit handlers, so if posthog.shutdown() needs to await anything before flushing, it will get lost. But this is admittedly an edge case, and only something I would care about if I had an unflushed queue sitting for longer than flushInterval that could have been sent before the exit happened.