dreamyguy / gitlogg

💾 🧮 🤯 Parse the 'git log' of multiple repos to 'JSON'
MIT License

Parse JSON through a read/write stream, so we get around the 268MB string size limitation #11

Closed: dreamyguy closed 7 years ago

dreamyguy commented 7 years ago

...because that's the limit for strings in V8, a limit that NodeJS inherits (NodeJS runs JavaScript on Google's V8 engine).
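
To make the number concrete: 2^28 is 268,435,456, which lines up with the "268MB" figure. Below is a minimal sketch of a size check, assuming a NodeJS version that exposes `buffer.constants.MAX_STRING_LENGTH` (the value differs between NodeJS versions). `fitsInOneString` is a hypothetical helper, not part of Gitlogg, and it assumes roughly one byte per character, which only holds for mostly-ASCII content.

```javascript
const { constants } = require('buffer');

// Hypothetical helper: could a file of `sizeInBytes` be read into ONE string?
// Assumption: about one byte per character (mostly-ASCII log output).
function fitsInOneString(sizeInBytes, limit = constants.MAX_STRING_LENGTH) {
  return sizeInBytes <= limit;
}

// Assumed limit for the NodeJS version used in this issue: about 2^28 characters.
const observedLimit = 2 ** 28; // 268435456

// File sizes in bytes, assuming decimal megabytes (1 MB = 1e6 bytes).
console.log(fitsInOneString(267.9e6, observedLimit)); // true
console.log(fitsInOneString(268.8e6, observedLimit)); // false

// The limit of the NodeJS version running this script:
console.log(constants.MAX_STRING_LENGTH);
```

A check like this only tells you when the single-string approach will fail; it does not remove the limit.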

Here is an issue about the vague error message NodeJS gives at runtime, but however nice a more meaningful message would be, the V8 limitation would still be there.

This is pretty lame, as the whole point of Gitlogg is to parse the git log of multiple repositories to JSON, however big they are.

Does anyone know a way to bypass that limitation, or to parse the information in gitlogg.tmp more effectively, through a smarter stream?

I came across this problem while attempting to parse the git log for https://github.com/LibreOffice/core. I read about the error and went on deleting a bunch of lines until I got it to work. The 268 MB file-size limitation was confirmed...

308,374 lines - failed    478.8 MB
174,000 lines - failed    268.8 MB
173,500 lines - worked    267.9 MB

dreamyguy commented 7 years ago

Hm.. what was I thinking... The problem here is not the buffer limitation, but that the data has to be streamed in chunks. That's one of the things NodeJS is known for...

I'll be taking a swing at it, but help would still be appreciated, as we'd have to stream more than just a string, and I haven't done that before. Learning FTW!
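
For anyone picking this up, here is a rough sketch of the chunked approach: read the input line by line and write the JSON array piece by piece, so no single string ever holds the whole file. It is not Gitlogg's actual implementation. `parseLine`, `convertToJson`, and the tab-separated `hash / author / subject` layout are assumptions for illustration; the real gitlogg.tmp format may differ. It also assumes a NodeJS version that supports `for await` on `readline` interfaces.

```javascript
const fs = require('fs');
const readline = require('readline');
const { once } = require('events');

// Hypothetical line parser: assumes one commit per line, tab-separated fields.
function parseLine(line) {
  const [hash, author, subject] = line.split('\t');
  return { hash, author, subject };
}

// Stream `inputPath` line by line and write a JSON array to `outputPath`.
// Resolves with the number of records written.
async function convertToJson(inputPath, outputPath) {
  const rl = readline.createInterface({
    input: fs.createReadStream(inputPath, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  const out = fs.createWriteStream(outputPath);
  let count = 0;

  out.write('[\n');
  for await (const line of rl) {
    if (line.trim() === '') continue; // skip blank lines
    const chunk = (count > 0 ? ',\n' : '') + JSON.stringify(parseLine(line));
    // Respect backpressure: wait for 'drain' when the write buffer is full.
    if (!out.write(chunk)) {
      await once(out, 'drain');
    }
    count += 1;
  }
  out.end('\n]\n');
  await once(out, 'finish'); // wait until everything has been flushed
  return count;
}
```

Usage would look something like `convertToJson('gitlogg.tmp', 'gitlogg.json')`, where the output file name is also an assumption. Note that whatever consumes the resulting file has to stream it as well, since reading it back into one string runs into the same limit.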