fix: large old replay data splitting

we see a non-zero number of MessageTooLarge errors

these all come (so far, from inspecting a sample of the data) from old clients that didn't have batching code and would sometimes send a lot of data in one go

around 1 in 5 of them have many hundreds of items to process, and 1 in 25 has tens of thousands of items

we already have code that should be splitting these out into individual events

but clearly it's not working

and we don't really want one API call to generate 10k kafka messages 🙈

so this PR

changes how we check the headroom: we're clearly undercounting, and this might help

i looked at how the data is going to be sent to kafka and tried to copy that, so that we count the bytes of a similar byte array instead of counting characters. JS in the browser uses UTF-16 strings and kafka/python uses UTF-8, so maybe there's some silliness happening here
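to make the character vs byte difference concrete, here's a minimal sketch (not the code in this PR). it assumes the payload is serialized as JSON with `ensure_ascii=False` and then UTF-8 encoded before it goes to kafka; `serialized_size_in_bytes` and `fits_in_headroom` are hypothetical helper names. any non-ASCII character makes the byte count larger than the character count, so a headroom check based on characters undercounts

```python
import json
from typing import Any


def serialized_size_in_bytes(data: Any) -> int:
    # hypothetical helper: serialize the way we assume the producer does
    # (JSON, then UTF-8) and measure the resulting bytes, not characters
    return len(json.dumps(data, ensure_ascii=False).encode("utf-8"))


def fits_in_headroom(data: Any, headroom_bytes: int) -> bool:
    # compare the encoded byte size against the remaining headroom
    return serialized_size_in_bytes(data) <= headroom_bytes


if __name__ == "__main__":
    event = {"text": "héllo 🙈"}
    as_text = json.dumps(event, ensure_ascii=False)
    print(len(as_text))  # characters (code points): 19
    print(serialized_size_in_bytes(event))  # UTF-8 bytes: 23
```

"é" takes 2 bytes and "🙈" takes 4 bytes in UTF-8, and in the browser `"🙈".length` is 2 (UTF-16 code units) while Python's `len` gives 1, so three different ways of "counting" give three different answers. with Python's default `ensure_ascii=True`, `json.dumps` escapes non-ASCII characters as `\uXXXX` sequences, so characters and bytes match but the payload grows; either way, measuring the exact bytes that will be sent is the safest check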

splits the list instead of exploding it

the final case in the processing, when the non-full snapshots won't fit into the headroom, sends every item from the list individually

instead, we now keep splitting the list in two and checking the size of each half. in theory this means the majority case is that we'll split into one or two messages, each with many events
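a minimal sketch of the halving approach, assuming each chunk is sent as one message whose size is roughly that of the serialized list. `split_to_fit` and `serialized_size_in_bytes` are hypothetical names, not the functions in this PR. a list that fits is returned as a single chunk; otherwise it's split in half and each half is checked again, so a single item that is still too big comes back on its own for the caller to handle

```python
import json
from typing import Any, List


def serialized_size_in_bytes(data: Any) -> int:
    # hypothetical helper: UTF-8 byte size of the JSON-serialized data
    return len(json.dumps(data, ensure_ascii=False).encode("utf-8"))


def split_to_fit(items: List[Any], headroom_bytes: int) -> List[List[Any]]:
    """Recursively halve `items` until each chunk fits in `headroom_bytes`.

    A single item that is still too large is returned on its own; the
    caller decides what to do with it (e.g. drop or report it).
    """
    if not items:
        return []
    # stop when the whole chunk fits, or when it can't be split any further
    if len(items) == 1 or serialized_size_in_bytes(items) <= headroom_bytes:
        return [items]
    middle = len(items) // 2
    # check each half independently; only oversized halves get split again
    return split_to_fit(items[:middle], headroom_bytes) + split_to_fit(
        items[middle:], headroom_bytes
    )


if __name__ == "__main__":
    items = [{"n": i} for i in range(8)]  # 80 bytes when serialized
    chunks = split_to_fit(items, 40)
    print([len(chunk) for chunk in chunks])  # [4, 4]
```

with eight small items (80 bytes serialized) and 40 bytes of headroom this gives two chunks of four items, where exploding the list would have sent eight messages. the trade-off is that items can be serialized once per level of splitting, which is a small cost compared to one kafka message per item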

PostHog / posthog #23454