cooncesean / mixpanel-query-py

The Python interface to fetch data from Mixpanel.
MIT License
29 stars 17 forks

Raw Export Bug Fix #22

Closed thesmallestduck closed 8 years ago

thesmallestduck commented 8 years ago

why

It seems that Mixpanel's raw export behavior changed, and we need to be more rigorous when fetching raw events. Mixpanel's raw event file format is one JSON document per line. It used to be the case that Mixpanel sent data across the wire one JSON document at a time, which allowed us to lazily interpret each chunk as a document while we were pulling data down. Unfortunately, Mixpanel has recently changed its behavior and now chunks the response independently of JSON document boundaries, so a single chunk may contain a partial document. Their recommendation is to pull the entire response before attempting to interpret the data: https://mixpanel.com/docs/api-documentation/exporting-raw-data-you-inserted-into-mixpanel
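
To illustrate the failure mode, here is a minimal sketch that simulates it without touching the network. The sample events, the helper names, and the chunk size of 40 bytes are all made up for illustration; none of this is taken from the library's code or from Mixpanel's API.

```python
import json

# Two hypothetical events in the raw export format: one JSON document per line.
payload = (
    b'{"event": "signup", "properties": {"distinct_id": "a1"}}\n'
    b'{"event": "login", "properties": {"distinct_id": "b2"}}\n'
)


def fixed_size_chunks(data, size):
    """Yield slices of `data` that ignore line boundaries, like a raw stream."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def count_parse_failures(chunks):
    """Try to parse each chunk as a standalone JSON document; count failures."""
    failures = 0
    for chunk in chunks:
        try:
            json.loads(chunk.decode("utf-8"))
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            failures += 1
    return failures


chunks = list(fixed_size_chunks(payload, 40))
failures = count_parse_failures(chunks)
```

With these inputs no chunk lines up with a document boundary, so parsing chunk by chunk cannot work.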

what

This PR modifies the get_export endpoint so that, rather than attempting to load each response chunk as a JSON document, it pulls the entire response content down, splits it on newlines, and interprets the resulting lines as JSON one at a time.
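
The approach can be sketched roughly as follows. This is an illustrative sketch, not the code from the diff: parse_raw_export is a hypothetical name, and it takes the already-downloaded response body as bytes so that no HTTP client details are assumed.

```python
import json


def parse_raw_export(body, encoding="utf-8"):
    """Parse a fully downloaded raw export body (one JSON document per line).

    `body` is the complete response content as bytes. The function name and
    signature are hypothetical and not part of mixpanel-query-py.
    """
    text = body.decode(encoding)  # decode to text before splitting
    events = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:  # skip the empty string left by a trailing newline
            continue
        events.append(json.loads(line))
    return events


# Chunks that ignore document boundaries are joined before any parsing happens.
chunks = [b'{"event": "signup"}\n{"eve', b'nt": "login"}\n']
events = parse_raw_export(b"".join(chunks))
```

The trade-off is that the whole response sits in memory before the first event is available, so the lazy, document-by-document interpretation described above is lost.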

thesmallestduck commented 8 years ago

hold off on this for a second, I think I need to use the _to_text util function before I do the split

thesmallestduck commented 8 years ago

okay, can confirm this works for me locally

cooncesean commented 8 years ago

@thesmallestduck I do not have the bandwidth to pull this down and test it out. I trust you to merge this when it's working for you as expected and won't break any other users if they happen to pull down this version.

Do we need to update documentation around this? Or make sure users apply this library version to a specific version of Mixpanel's API?

thesmallestduck commented 8 years ago

I will test with python3 before merging (it is working with 2.7).

Mixpanel has not changed the version number of its API, but they did update the docs to include this recommendation to fetch the entire response as long ago as September of 2015 (according to the Wayback Machine). This PR is currently required for raw export to work for seemingly any large data export. As currently implemented in master without this PR, this lib's bulk export does not play well with Mixpanel's production API.

No doc changes on our part should be necessary.

thesmallestduck commented 8 years ago

works with python 3.5.1