thesmallestduck closed this 8 years ago
Hold off on this for a second. I think I need to use the _to_text
util function before I do the split.
okay, can confirm this works for me locally
@thesmallestduck I do not have the bandwidth to pull this down and test it out. I trust you to merge this when it's working for you as expected and won't break things for other users who happen to pull down this version.
Do we need to update documentation around this? Or make sure users apply this library version to a specific version of Mixpanel's API?
I will test with python3 before merging (it is working with 2.7).
Mixpanel has not changed the version number on its API, but they did update the docs to include this recommendation to fetch the entire response as long ago as September 2015 (according to the Wayback Machine). This PR is currently required for bulk export to work on any sufficiently large data export. As implemented in master without this PR, this lib's bulk export does not play well with Mixpanel's production API.
No doc changes on our part should be necessary.
works with python 3.5.1
why
It seems that Mixpanel's raw export behavior changed, and we need to be more rigorous when fetching raw events. Mixpanel's raw event file format is one JSON document per line. It used to be the case that Mixpanel would send the data across the wire one JSON document at a time, so each chunk we received was a complete document. This allowed us to lazily interpret each chunk as a document while we were still pulling data down. Unfortunately, Mixpanel has recently changed its behavior and now chunks the response independently of JSON document boundaries. Their recommendation is to pull the entire response before attempting to interpret the data: https://mixpanel.com/docs/api-documentation/exporting-raw-data-you-inserted-into-mixpanel
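To illustrate the failure, here is a minimal sketch using made-up sample events and two hypothetical helpers (`chunked` and `parse_per_chunk` are not part of this library). It only mimics chunks that ignore line boundaries; the real chunk sizes on the wire are not known.

```python
import json

# Two made-up events in a one-JSON-document-per-line format.
body = (
    b'{"event": "signup", "properties": {"time": 1}}\n'
    b'{"event": "login", "properties": {"time": 2}}\n'
)


def chunked(data, size):
    """Yield fixed-size slices, mimicking chunks that ignore line boundaries."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def parse_per_chunk(chunks):
    """Treat every chunk as one complete JSON document (the old assumption)."""
    return [json.loads(chunk.decode("utf-8")) for chunk in chunks]


try:
    parse_per_chunk(chunked(body, 20))
    failed = False
except ValueError:
    # json.loads raises ValueError (JSONDecodeError on Python 3.5+)
    # because the first 20-byte chunk ends in the middle of a document.
    failed = True
```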
what
This PR modifies the get_export endpoint so that, rather than attempting to load each response chunk as a JSON document, it pulls the entire response content down, splits it on newlines, and interprets those lines as JSON one at a time.
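A rough sketch of that approach, not the actual diff: `iter_export_events` is a hypothetical name, the sample chunks are made up, and the decode step stands in for whatever text-conversion helper the library uses.

```python
import json


def iter_export_events(content, encoding="utf-8"):
    """Yield one parsed event per non-empty line of a fully downloaded body."""
    # Assumption: the body arrives as bytes and needs decoding before the split.
    if isinstance(content, bytes):
        content = content.decode(encoding)
    for line in content.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip the empty string left by the trailing newline
        yield json.loads(line)


# Chunks that cut across document boundaries, as described above.
chunks = [
    b'{"event": "signup", "prop',
    b'erties": {"time": 1}}\n{"event"',
    b': "login", "properties": {"time": 2}}\n',
]

# Join everything first, then parse line by line.
events = list(iter_export_events(b"".join(chunks)))
```

The trade-off is that the whole export sits in memory before the first event is parsed, so the lazy per-chunk behavior is lost in exchange for correctness.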