adein / hangouts_to_sms

Google Hangouts SMS/MMS to XML converter
MIT License

Large JSON files cause a memory error #1

Open adein opened 7 years ago

adein commented 7 years ago

Large Hangouts JSON files cause Python to crash with a MemoryError while parsing. This happens because json.load reads the entire file into memory and decodes it before any parsing starts, so the whole file has to fit in memory at once.

Short-term solution: Edit the file in a text editor that supports large files and remove any Hangouts/non-SMS conversations.

Long-term solution: Change the JSON parsing to use a third-party stream-based parser instead of the standard-library json module.
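
A minimal sketch of what the stream-based approach could look like, using ijson as one possible third-party parser. The function name iter_conversations and the top-level key "conversations" are assumptions for illustration only; the real key in a Hangouts export and the structure hangouts_parser.py expects may differ. If ijson is not installed, the sketch falls back to json.load, which has the same memory problem described above.

```python
import json

try:
    import ijson  # third-party streaming parser: pip install ijson
except ImportError:
    ijson = None


def iter_conversations(path, key="conversations"):
    """Yield one conversation at a time from the array stored under `key`.

    `key` is a hypothetical name for the top-level array; adjust it to
    match the actual export file.
    """
    if ijson is not None:
        # Binary mode lets ijson read and decode the file incrementally.
        with open(path, "rb") as f:
            # "<key>.item" selects each element of the array under <key>.
            yield from ijson.items(f, key + ".item")
    else:
        # Fallback: loads the whole file, so large inputs can still fail.
        with open(path, "r", encoding="utf-8") as f:
            yield from json.load(f).get(key, [])
```

With ijson available, only one conversation is held in memory at a time, so non-SMS conversations can be skipped as they stream past. Note that ijson returns non-integer numbers as Decimal by default, which may matter when comparing output against the json.load version.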

Error trace:

Traceback (most recent call last):
  File "hangouts_to_sms.py", line 15, in <module>
    conversations, self_gaia_id = hangouts_parser.parse_input_file(HANGOUTS_JSON_FILE, YOUR_PHONE_NUMBER)
  File "..\hangouts_to_sms-master\hangouts_parser.py", line 23, in parse_input_file
    data = json.load(data_file, object_hook=lambda d: Namespace(**d))
  File "..\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "..\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 349, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
MemoryError

catskul commented 7 years ago

I have a branch that uses ijson, which might be a candidate for a pull request.

I have to verify that it spits out the same output as the current version before I submit it.

Will try tonight.
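
One way the "same output" check could be done, sketched with the standard library only: canonicalize both XML files and compare the results, so attribute order and indentation differences do not count as mismatches. The function name same_xml and the file names in the usage note are hypothetical, and ET.canonicalize requires Python 3.8 or newer.

```python
import xml.etree.ElementTree as ET


def same_xml(path_a, path_b):
    """Return True if two XML files are equivalent after canonicalization."""
    # C14N sorts attributes; strip_text drops surrounding whitespace,
    # including whitespace-only text between elements.
    canon_a = ET.canonicalize(from_file=path_a, strip_text=True)
    canon_b = ET.canonicalize(from_file=path_b, strip_text=True)
    return canon_a == canon_b
```

For example, same_xml("output_master.xml", "output_ijson.xml") would compare the output of the current version against the output of the ijson branch for the same input file.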

gbrown2036 commented 6 years ago

Thanks, catskul. Keep us posted.