While writing some boilerplate to allow c4f-stat to output statistics, it occurred to me that I'm hardcoding a lot of specific analyses (such as 'give me all mutations that were hit, but not killed'), and also that the stats persister almost never has the stats I need for a given paper.
While I won't be able to do this any time soon, I wonder if it's worth replacing the stats persister (which constantly stores specific views on analyses) with a SQLite database that just logs each analysis in full every time it observes one, then offers the stats in the form of SQL queries. This would have several advantages:
analyses get grouped together in one place;
we can do complex aggregation over analyses (such as 'give me the minimum, maximum, and other counts of observations', something I needed to do for a paper but had to do by manually walking the filesystem);
as we add new stats, we can reuse old data, instead of needing to rerun experiments;
we can refactor the stats persister to instead perform queries over the analysis database, while retaining the same outward API onto it.
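To make the idea concrete, here is a minimal sketch of what 'log everything, query later' could look like. It is in Python only because its standard library bundles SQLite; the table and column names (`mutation_obs`, `cycle`, `mutant`, `hits`, `kills`) are hypothetical and are not c4t's actual analysis layout.

```python
import sqlite3

# Hypothetical schema: one row per mutant per logged analysis.
SCHEMA = """
CREATE TABLE mutation_obs (
    cycle  INTEGER NOT NULL,  -- which analysis this row was logged from
    mutant INTEGER NOT NULL,  -- mutant identifier
    hits   INTEGER NOT NULL,  -- times the mutant was hit in that analysis
    kills  INTEGER NOT NULL   -- times the mutant was killed in that analysis
);
"""

def open_db(rows):
    """Create an in-memory database and log the given observations in full."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO mutation_obs VALUES (?, ?, ?, ?)", rows)
    return db

def hit_not_killed(db):
    """'Give me all mutations that were hit, but not killed', over all analyses."""
    cur = db.execute(
        """
        SELECT mutant FROM mutation_obs
        GROUP BY mutant
        HAVING SUM(hits) > 0 AND SUM(kills) = 0
        ORDER BY mutant
        """
    )
    return [mutant for (mutant,) in cur]

def hit_summary(db):
    """Minimum, maximum, and number of per-analysis hit observations."""
    return db.execute(
        "SELECT MIN(hits), MAX(hits), COUNT(*) FROM mutation_obs"
    ).fetchone()

# Made-up observations from two analyses.
db = open_db([(1, 10, 3, 0), (1, 11, 0, 0), (2, 10, 1, 0), (2, 12, 4, 2)])
print(hit_not_killed(db))
print(hit_summary(db))
```

Under this layout a new statistic is just a new query over rows that were already logged, which is what would let old data be reused.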
The main disadvantage is the massive dependency this would introduce. SQLite is usually a cgo dependency. Perhaps we could use other SQL or NoSQL databases, but I really don't want to make c4t dependent on having a database set up.
Even the analysis stage sheds a lot of information, e.g. by aggregating compilations down to min/mean/max duration slots. A database with its own aggregation setup makes sense here, I think.
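As a rough illustration of pushing that aggregation into the database, the sketch below stores raw compilation durations and derives the min/mean/max slots at query time. Again this is Python purely for its bundled SQLite, and the `compilation` table and its columns are assumptions made up for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical table: one row per compilation, with nothing aggregated away.
db.execute(
    """
    CREATE TABLE compilation (
        compiler    TEXT    NOT NULL,
        subject     TEXT    NOT NULL,
        duration_ms INTEGER NOT NULL
    )
    """
)
# Made-up durations for two compilers.
db.executemany(
    "INSERT INTO compilation VALUES (?, ?, ?)",
    [("gcc", "a", 100), ("gcc", "b", 300), ("clang", "a", 200)],
)

def duration_slots(db):
    """Per-compiler min/mean/max durations, computed on demand from raw rows."""
    return db.execute(
        """
        SELECT compiler, MIN(duration_ms), AVG(duration_ms), MAX(duration_ms)
        FROM compilation
        GROUP BY compiler
        ORDER BY compiler
        """
    ).fetchall()

for row in duration_slots(db):
    print(row)
```

Because the raw durations are kept, other aggregates (counts, percentiles, per-subject breakdowns) could be added later without rerunning anything.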