Closed guillaume-chech closed 2 years ago
💡 Investigating further, I narrowed the problem down to how the Export API response is parsed using response.text.splitlines().
It seems that some lines are "cut" randomly in the middle.
To isolate the problem, I ran the connector locally and added some good old dirty prints to identify where the JSON parsing was failing, printing the index and the record_line.
I compared the printed record_line with a JSON file exported manually with a curl HTTP request for the very same day of data.
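The debugging approach described above can be sketched roughly as follows. This is a minimal reconstruction, not the connector's actual code: `find_bad_lines` and the sample payload are hypothetical, and only the `index` and `record_line` names come from the description.

```python
import json


def find_bad_lines(text: str) -> list[tuple[int, str]]:
    """Return (index, record_line) for every line that fails JSON parsing."""
    bad = []
    for index, record_line in enumerate(text.splitlines()):
        try:
            json.loads(record_line)
        except json.JSONDecodeError:
            # The "dirty print": show which line broke and what it contained.
            print(index, repr(record_line))
            bad.append((index, record_line))
    return bad


# Hypothetical payload: the second record was cut in two by a stray line break.
sample = '{"event": "A"}\n{"event": "B", "msg": "sous-sols\ndu Palais"}\n'
bad = find_bad_lines(sample)
```

Both halves of the cut record are reported, which makes it easy to line them up against a manually exported file.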
I found that when some text properties include a line break, as in the example below, splitlines() would, rightfully, split the JSON item into 2 lines, which is unintended IMO.
Here is the example; see the Message property, which contains a line break after the word sous-sols:
{
"event":"Tap Newsfeed Item",
"properties":{
"time":1647880119,
"distinct_id":"1797f72ffd5edc-05690f19e13ab08-1b05575d-4a574-1797f72ffd6134f",
"$app_build_number":"309",
"$app_version_string":"9.3.4",
"$carrier":"",
"$city":"Lyon",
"$device_id":"16733740-E75C-419A-8AEC-38E21AB12D1D",
"$distinct_id_before_identity":"583325",
"$had_persisted_distinct_id":false,
"$insert_id":"bd7d438843772e57",
"$lib_version":"1.3.4",
"$manufacturer":"Apple",
"$model":"iPhone12,3",
"$mp_api_endpoint":"api.mixpanel.com",
"$mp_api_timestamp_ms":1647880119550,
"$os":"iOS",
"$os_version":"15.3.1",
"$radio":"LTE",
"$region":"Rhône",
"$screen_height":812,
"$screen_width":375,
"$user_id":"583325",
"$wifi":false,
"App Installed":true,
"App Version":"9.3.4",
"Call To Action":"Accéder à l'évènement",
"Device ID":"F50E0003-3465-4BE7-8B86-B82F13661000",
"Message":"~Aphrodizia est de retour pour la PFW 🔥~\n... et investira les immenses sous-sols
du Palais de Tokyo avec Sébastien Léger, qui présentera son modular live show 💯 Rendez-vous demain soir !",
"Notification Type":"Sélectionné pour toi",
"Platform OS":"ios",
"Platform Version":"15.3.1",
"User Id":"583325",
"event":"Tap Newsfeed Item",
"mp_country_code":"FR",
"mp_lib":"react-native",
"mp_processing_time_ms":1647880119593
}
}
💡 Using the native Response.iter_lines() method seems to solve the issue.
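One plausible explanation for why this helps, assuming the stray line break is a Unicode separator such as U+2028 (the thread does not confirm which character it was): `str.splitlines()` splits on a wider set of characters than `bytes.splitlines()`, which only treats `\n`, `\r` and `\r\n` as line endings. As far as I can tell, `Response.iter_lines()` works on raw bytes unless `decode_unicode=True` is passed. The sketch below uses a hypothetical payload and makes no network call.

```python
import json

# Hypothetical newline-delimited export body. The first record's "Message"
# contains U+2028 (LINE SEPARATOR), which JSON allows unescaped in a string.
raw = (
    '{"event": "A", "properties": {"Message": "sous-sols\u2028du Palais"}}\n'
    '{"event": "B", "properties": {"Message": "ok"}}\n'
).encode("utf-8")

# What response.text.splitlines() effectively does: decode, then split as str.
text_lines = raw.decode("utf-8").splitlines()

# What a bytes-level split does: only \n, \r and \r\n count as line endings.
byte_lines = raw.splitlines()


def count_parseable(lines) -> int:
    """Count how many lines are valid standalone JSON documents."""
    ok = 0
    for line in lines:
        try:
            json.loads(line)
            ok += 1
        except json.JSONDecodeError:
            pass
    return ok


print(len(text_lines), count_parseable(text_lines))
print(len(byte_lines), count_parseable(byte_lines))
```

Under that assumption, the text-level split produces an extra, unparseable fragment, while the bytes-level split keeps each record on one line.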
@marcosmarxm Sorry for the ping. If you think it's not relevant, please point me to the right procedure for fixing issues.
I'd like to open a pull request to solve the issue, but I fear my software engineering skill set is a bit weak to do that without proper guidance. Is there a checklist I need to follow to propose this solution?
Hi @guillaume-chech :) Thank you very much for going into the debugging yourself. If you found the bug you definitely have the skills to fix it 👍 . The procedure is quite simple:
Hi 👋 @alafanechere Thanks for the motivational response. Here it is: I hope it follows what you expected. Unfortunately, I could not run the acceptance tests, and I'm not sure how to perform every item of the PR checklist. Happy review!
Closing this as it looks like the PR was merged.
Environment
Current Behavior
When retrieving export objects from Mixpanel, if the response contains malformed JSON, the whole sync job crashes. This is a problem because Mixpanel only allows retrieving data at full-day granularity. So it's impossible to skip just a fraction of a day (the 1 or 2 minutes where the data is faulty); the only option is to skip a whole day, which is not really acceptable in terms of data loss.
Expected Behavior
We would expect the connector to first retry, and offer the possibility to ignore malformed events.
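As a rough illustration of the "ignore malformed events" part of this request, a parser could take a flag that decides whether a bad line aborts the sync or is skipped. The names used here (`parse_export_lines`, `skip_malformed`) are hypothetical and not part of the connector, and retrying the HTTP request is left out.

```python
import json
from typing import Iterable, Iterator


def parse_export_lines(lines: Iterable[str], skip_malformed: bool = False) -> Iterator[dict]:
    """Yield one dict per JSON line; optionally skip lines that do not parse."""
    for index, line in enumerate(lines):
        if not line.strip():
            continue  # ignore blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            if not skip_malformed:
                raise  # strict mode: one bad line fails the whole batch
            print(f"skipping malformed line {index}")


# Hypothetical input: the middle record is truncated.
lines = ['{"event": "A"}', '{"event": "B", "msg": "cut', '{"event": "C"}']
events = list(parse_export_lines(lines, skip_malformed=True))
```

Keeping strict mode as the default means data loss only happens when the user explicitly opts in.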
Logs
Steps to Reproduce
Unsure how to reproduce this, as I could not find in the logs which events are faulty.
Are you willing to submit a PR?
No, I'm not capable of such a thing.
Related issue: https://github.com/airbytehq/airbyte/issues/11008