dreamyguy / gitlogg

💾 🧮 🤯 Parse the 'git log' of multiple repos to 'JSON'
MIT License

Parallelize creation of gitlogg.tmp #8

Closed Inventitech closed 7 years ago

Inventitech commented 7 years ago

This starts the processes that generate gitlogg.tmp in parallel and joins them once they finish. Since the pipelined commands do not run in parallel and can cause a very heavy CPU load, this change drastically speeds up the analysis of multiple large repositories at the same time on multi-core systems.
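As a rough illustration of the start-in-parallel-then-join pattern described above (not the actual diff from this pull request), a minimal shell sketch could look like the following. The `process_repo` function, the repository names and the per-repository `.part` files are hypothetical stand-ins; a real run would call `git log` inside each repository instead of `printf`.

```shell
#!/bin/sh
# Hypothetical stand-in for the per-repository log extraction.
# $1 = repository name, $2 = output file for that repository.
process_repo() {
  printf 'log-of-%s\n' "$1" > "$2"
}

workdir="$(mktemp -d)"
repos="repo-a repo-b repo-c"   # assumed names, for illustration only

# Start one background job per repository. Each job writes to its own
# part file so that concurrent writers never interleave their output.
for repo in $repos; do
  process_repo "$repo" "$workdir/$repo.part" &
done

# Join: block until every background job has finished.
wait

# Concatenate the parts in a fixed order into the combined file.
for repo in $repos; do
  cat "$workdir/$repo.part"
done > "$workdir/gitlogg.tmp"
```

Writing to separate part files and merging after `wait` is a design choice of this sketch; it keeps the combined file's ordering deterministic regardless of which job finishes first.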

dreamyguy commented 7 years ago

Hi @Inventitech, I just pulled your changes locally, resolved the conflicts and tested them on two sets of repositories:

It worked fine for the first set, but I did not notice any significant improvement in script completion time. The new console output, which shows the repo currently being written to gitlogg.tmp, showed that all 8 repos were queued at once. The total run finished within the average completion time, neither significantly faster nor slower.

It didn't go so well for the second set. The console showed that 106 of the 470 repos were queued for processing, and then it stopped responding. I registered a large spike on my CPUs every time I tried to run gitlogg-generate-log.sh with your changes (numbers 1, 2 & 3). I stopped the process manually:

[Image: gitlogg-pull-request]

The master range on the graph is when I ran gitlogg-generate-log.sh again without your changes.
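One possible reading of the spike, which is an assumption and not something confirmed in this thread, is that launching one background job per repository with no upper bound oversubscribes the CPUs once the repository count gets large. A hedged sketch of capping the number of concurrent jobs with `xargs -P` (available in GNU and BSD xargs, though not required by POSIX) might look like this; the repository names, the `max_jobs` value and the `printf` stand-in for the real `git log` work are all hypothetical.

```shell
#!/bin/sh
workdir="$(mktemp -d)"
max_jobs=4   # assumed cap; tune to the number of cores

# Emit ten hypothetical repository names, one per line.
i=1
while [ "$i" -le 10 ]; do
  echo "repo-$i"
  i=$((i + 1))
done |
  # Run at most $max_jobs workers at a time, one repository per worker.
  # Inside the worker, $1 is the repository name and $2 the work directory.
  xargs -P "$max_jobs" -I {} \
    sh -c 'printf "log-of-%s\n" "$1" > "$2/$1.part"' _ {} "$workdir"
```

`xargs` only returns after all workers have exited, so it also serves as the join step.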

This is too bad; I really wanted it to work, as I'd appreciate any speed boost. I'm closing this for now, for tidiness' sake. Run some more tests with a good number of repos, and I'll look into it again.

BTW remember to pull the latest changes, I think you'll like them. 😃