gilliek opened 9 years ago
Mmh, what is preventing us from storing the intermediate JSON in a compressed format, then uncompressing it and streaming it to srcanlzr directly? Like, why does it need to be handled by srcanlzr?
Nothing is preventing us from doing so. The main advantage of doing it directly in Go is performance, IMHO. The bzip2 package of the Go standard library (http://golang.org/pkg/compress/bzip2/) provides a reader that implements the io.Reader interface, and the JSON decoder can read JSON directly from any io.Reader. That way, the bzip2 data is uncompressed on the fly while the JSON decoder consumes it, so decompression and decoding happen at the same time.
Besides, it only takes a few lines of code to implement that option. Since everything comes from the standard library, it does not require extra testing. So I see no reason not to implement it :)
Fair enough.
It'll be interesting to micro-benchmark using something like bzcat foo.json.bz2 | srcanlzr ... vs having srcanlzr handle it all through bzip2 from the standard library using the reader interface. Just out of curiosity. :)
Yeah for sure :)
I bet that the pure Go version will be faster. Even if the Go standard library implementation is much slower than bzcat(1), in the end the bzcat solution has to read the bzipped file and write out the uncompressed JSON, which srcanlzr then has to read again, instead of just reading the bzipped file once :)
Since we are dealing with a huge amount of data, it is very slow to re-parse all the projects with the source code parsers every time we update the source analyzer. Thus, it makes sense to store the intermediate JSON. However, the JSON files are very large and take up a lot of disk space, so it would be useful to compress them.