andymeneely / chromium-history

Scripts and data related to Chromium's history

Speed up code review parsing with big JSON files #178

Closed andymeneely closed 9 years ago

andymeneely commented 9 years ago

Currently, we have ~150k files and ~150k directories of JSON files for our code reviews. Parsing these code reviews into CSVs, which then get dumped into the database, takes about 2-3 hours. Unfortunately, file IO is a huge part of this. For the bug data, we discovered that parsing goes very fast if we load the data 10 MB at a time and parse it that way. So, here's the big refactoring:

Keep the verifies running - we don't need any new verifies for this task.

This is also a big task, so I'll do the development in a separate branch to avoid breaking the daily build. I would also like to take this on.
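To make the "10 MB at a time" idea concrete, here is a rough sketch in Python. It is an illustration under assumptions, not the project's actual parser: it assumes the code reviews have first been consolidated into a newline-delimited JSON file (one review per line), and the names `iter_batches`, `reviews_to_csv`, `issue`, and `subject` are hypothetical.

```python
import csv
import json
from typing import Iterator, List

# ~10 MB per batch, mirroring what worked for the bug data.
CHUNK_BYTES = 10 * 1024 * 1024


def iter_batches(path: str, chunk_bytes: int = CHUNK_BYTES) -> Iterator[List[dict]]:
    """Yield lists of parsed records from a newline-delimited JSON file.

    Assumes one JSON object per line (a hypothetical consolidated format).
    """
    with open(path, "r", encoding="utf-8") as f:
        while True:
            # readlines(hint) returns whole lines totalling roughly `hint` in size,
            # so a record is never split across two batches.
            lines = f.readlines(chunk_bytes)
            if not lines:
                break
            yield [json.loads(line) for line in lines if line.strip()]


def reviews_to_csv(json_path: str, csv_path: str, fields: List[str],
                   chunk_bytes: int = CHUNK_BYTES) -> int:
    """Write the chosen fields of every record to a CSV; return the record count."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(fields)  # header row
        for batch in iter_batches(json_path, chunk_bytes):
            # Missing keys become empty cells rather than raising.
            writer.writerows([rec.get(k) for k in fields] for rec in batch)
            count += len(batch)
    return count
```

The point of the sketch is that one large sequential read replaces ~150k small file opens, which is where the IO time goes. A call such as `reviews_to_csv("reviews.jsonl", "reviews.csv", ["issue", "subject"])` would stream the whole file while holding only one batch in memory.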