saleiro opened this issue 5 years ago
Not that this is set in stone, but experiments are designed to answer some question. Isn't "do complex models perform better than simple models" a question worth answering, and thus dependent on the grid?
What do you see an "experiment" as being? I'm not sure what an 'experiment' is if you remove the grid from it. A set of matrices? I'm not totally against changing anything but I truly don't understand what you expect the experiment to represent with this change.
I had essentially the same thought as @thcrock on this. I can see several versions of this:
I think of an experiment as the configuration of the system at run time [with the exception of the data state, for most projects]. We will very often run several sets of similar experiments, varying one or two elements in each, because the exponential increase in computation time from running all combinations of all config elements is unjustified.
Currently, we can use the comment fields to create user-defined groups of experiments. We can also use (a little awkwardly) the experiment_matrices table to identify experiments that have the exact same set of matrices (and therefore differ only in their models and perhaps scoring settings). And the whole config is stored as JSON, which can be pulled out and used to find experiments that agree on some set of config elements.
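For what it's worth, the JSON route is already scriptable. Here's a rough sketch of grouping experiments by one config section — the table and column names (`triage_metadata.experiments`, `experiment_hash`, `config`) are my guesses at the schema, so adjust to whatever the results schema actually uses:

```python
import json
from collections import defaultdict

from sqlalchemy import create_engine, text

# Placeholder connection string -- point this at your own results database.
engine = create_engine("postgresql://user:pass@localhost/triage_db")

# Group experiments by the sub-config we care about, e.g. temporal_config.
groups = defaultdict(list)
with engine.connect() as conn:
    rows = conn.execute(
        text("SELECT experiment_hash, config FROM triage_metadata.experiments")
    )
    for experiment_hash, config in rows:
        if isinstance(config, str):  # depending on the driver, jsonb may come back as text
            config = json.loads(config)
        # Serialize with sorted keys so equal sub-configs compare equal.
        key = json.dumps(config.get("temporal_config", {}), sort_keys=True)
        groups[key].append(experiment_hash)

# Experiments sharing a temporal_config land in the same bucket.
for temporal_config, hashes in groups.items():
    print(len(hashes), "experiments share:", temporal_config)
```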
But perhaps a more robust way of doing this is to associate each experiment with hashes (assigned unique numerical ids... #668) of the configurations of each of its components, as we do with the cohort, label, and subset components. So we have a time configs table with all of the time configs we've used, a learner grids table with all of the learner grids, etc. And then the experiments table has a column for each component that stores the hash or #. You could then easily and dynamically find experiments that match/differ on any number of the components, depending on your needs. This may not be justified over just using the JSON, though.
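To make that concrete, here is a minimal sketch of per-component hashing. The `component_hash` helper and the toy config are made up for illustration; this is not triage's actual hashing code:

```python
import hashlib
import json

def component_hash(component_config) -> str:
    """Hypothetical helper: hash one component of the experiment config.

    Sorting keys makes the hash stable under dict-ordering differences.
    """
    serialized = json.dumps(component_config, sort_keys=True, default=str)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()

# Toy config with two of the components mentioned above.
experiment_config = {
    "temporal_config": {"label_timespans": ["1y"], "model_update_frequency": "1y"},
    "grid_config": {"sklearn.tree.DecisionTreeClassifier": {"max_depth": [2, 10]}},
}

# One hash per component; the experiments table would get a column (or id) for each.
component_hashes = {
    name: component_hash(cfg) for name, cfg in experiment_config.items()
}
print(component_hashes)
# Two experiments that differ only in grid_config would then agree on every other
# component hash, which makes "same experiment, different grid" directly queryable.
```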
Some of those examples are just begging to implement iteration where currently there is none. The second example, iterating over label definitions, is already an issue ( https://github.com/dssg/triage/issues/445 ), and was even brought up recently. In general I think we want to do this. One of the complaints about some old systems when Triage was being started was that whenever a team came up with a question they wanted to answer, they had to run several config files. Doing cross-product iteration all of the time isn't always the best way to do experiments, as @ecsalomon noted, but Triage should support it when it is reasonable (like, say, multiple label definitions).
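For illustration only (triage doesn't do this today — it's roughly what #445 asks for), iterating over label definitions could be as simple as expanding one base config into several:

```python
import copy

def expand_label_definitions(base_config, label_definitions):
    """Yield one experiment config per candidate label definition.

    Purely illustrative sketch -- not an existing triage feature.
    """
    for name, label_config in label_definitions.items():
        config = copy.deepcopy(base_config)
        config["label_config"] = label_config
        # Tag the variant in whichever comment field the project uses.
        config["model_comment"] = f"label definition: {name}"
        yield config

base_config = {
    "temporal_config": {"model_update_frequency": "1y"},  # elided for brevity
    "grid_config": {"sklearn.dummy.DummyClassifier": {"strategy": ["prior"]}},
}
label_definitions = {
    "any_inspection_failure": {"name": "any_failure", "query": "..."},
    "serious_inspection_failure": {"name": "serious_failure", "query": "..."},
}

for config in expand_label_definitions(base_config, label_definitions):
    print(config["model_comment"])
```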
I'm also totally in support of Erika's last-paragraph suggestion of adding more ids (and metadata tables to hold them) for smaller components, to allow closer inspection without trying to redefine the experiment hash away from the current simple "the full experiment config, hashed" definition.
We often want to change the grid config for the same "experiment", but Triage assigns it a new experiment hash. This is counter-intuitive. The grid is already encoded in the model group's "model_type" and "hyperparameters", so there is no need to encode it in the experiment hash as well.
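A toy illustration of the complaint — the hashing below is just a stand-in for "hash the full config", not Triage's exact implementation:

```python
import copy
import hashlib
import json

def whole_config_hash(config) -> str:
    """Stand-in for 'hash the entire experiment config' (illustration only)."""
    return hashlib.md5(json.dumps(config, sort_keys=True).encode("utf-8")).hexdigest()

base = {
    "temporal_config": {"model_update_frequency": "1y"},
    "grid_config": {"sklearn.tree.DecisionTreeClassifier": {"max_depth": [2, 10]}},
}

# Same project, only the grid widened.
wider = copy.deepcopy(base)
wider["grid_config"]["sklearn.tree.DecisionTreeClassifier"]["max_depth"] += [50, None]

# Whole-config hashing treats this as a brand-new experiment...
print(whole_config_hash(base) == whole_config_hash(wider))  # False

# ...even though each (model_type, hyperparameters) pair is already recorded per
# model group, so excluding grid_config from the hash would keep the runs together.
print(
    whole_config_hash({k: v for k, v in base.items() if k != "grid_config"})
    == whole_config_hash({k: v for k, v in wider.items() if k != "grid_config"})
)  # True
```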