David-Byrne / Hangons

Web app to parse and save your Hangouts.json file into a more friendly format.
https://www.davidbyrne.io/Hangons/
MIT License
49 stars 10 forks source link

500mb hangouts.json #1

Closed faddat closed 4 years ago

faddat commented 7 years ago

Crashes the webapp

David-Byrne commented 7 years ago

Thanks for the feedback 😃

The problem is most likely the size of your file, my Hangouts.json file is 'only' 56MB and Hangons still (mostly) works with that size.

Do you remember if the browser crashed when it was still reading in the file (first loading bar would've been greyed out) or if it was when it was manipulating the data (you'd have chosen what file format you wanted at this stage, second loading bar would've been greyed out)?

Also, can you tell me your:

mxxcon commented 7 years ago

I have a 700MB file and in Chrome 60 the app crashes about 2 seconds after I pick the file.

In Firefox 54.0.1 after selecting my file it sits for about 30 seconds chewing cpu and then I get

The chosen file cannot be parsed The Hangouts.json file contains data which cannot be read. If you could copy the error code: err.message and report it here I will try fix it.

Firefox's console showed this

InternalError: allocation size overflow Stack trace: readFile/reader.onload@http://www.davidbyrne.io/Hangons/hangons.js:31:17

In MS Edge 40 after selecting my file it sits for about 5sec then I get a popup that the page is not responding and then Edge kills and reloads that tab.

David-Byrne commented 7 years ago

It's a good thing Firefox displayed the error and didn't just crash like the others, it gives us something to work with. Line 31 is where I parse the json data from Hangouts.json into an object, so it looks like the tab just runs out of memory and crashes. There isn't much I can do really as it's a browser issue and this isn't the typical use case they were designed for. Thanks for the detailed report, it's very much appreciated and I'm sorry Hangons isn't working for your backup.

mxxcon commented 7 years ago

I don't know JavaScript, but could this be converted into a cli tool to run outside of a browser's restrictions? Or somehow change the process so that it doesn't try to work with the whole file at once but kinda stream it one conversation_id at a time or however else it's structured? While looking around I also found this https://stackoverflow.com/questions/20690203/dealing-with-a-json-object-too-big-to-fit-into-memory. Would that be something useful in this case?

David-Byrne commented 7 years ago

It could probably be converted to run as a Node.js script without changing too much of the core logic, you're still dealing with the V8 JS engine though so not sure if that would yield much improvement.

The problem with JSON is you have to read it in as one big chunk, as opposed to CSV or similar which can be read line by line. The memory issue happens before any of my logic is run so there's nothing I can do there to split it up.

The module mentioned in that thread looks interesting enough, seems to be Node.js specific though so not something I can leverage for the time being.