bholt / experiments-new

Library for running and tabulating experiments and parameter sweeps.

Database-based experiments #10

Open bholt opened 11 years ago

bholt commented 11 years ago

A more reliable way to architect persistent experiments, etc., would be to insert the experiments to be run into the database first, then run the batch script with just the database and the record ID. When the results come in, the script can fill in the existing record rather than racing with other jobs to insert a new one.

There are several problems with this:

So this is not ready for implementation; I'm just brainstorming ideas.
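A minimal sketch of the insert-first, fill-in-later flow described above. It assumes a hypothetical in-memory `ExperimentTable` class standing in for a real database table; the method names (`insert_pending`, `fill_results`, `pending_ids`) and the `runtime_s: 1.5` result are placeholders for illustration, not part of this library.

```ruby
# Hypothetical in-memory stand-in for a database table of experiment records.
# A real version would use an actual database; this only illustrates the flow.
class ExperimentTable
  def initialize
    @rows = {}
    @next_id = 1
  end

  # Step 1: insert a pending record for each experiment before anything runs.
  # Returns the new record's ID, which is all the batch script would receive.
  def insert_pending(params)
    id = @next_id
    @next_id += 1
    @rows[id] = { params: params, status: :pending, results: nil }
    id
  end

  # Step 2: when the job finishes, update the existing record in place
  # instead of inserting a new row, so concurrent jobs never race on insertion.
  def fill_results(id, results)
    row = @rows.fetch(id) # raises KeyError for an unknown ID
    row[:results] = results
    row[:status] = :done
    row
  end

  def find(id)
    @rows.fetch(id)
  end

  # Experiments that were scheduled but have no results yet.
  def pending_ids
    @rows.select { |_, row| row[:status] == :pending }.keys
  end
end

table = ExperimentTable.new
ids = [{ nnode: 2, ppn: 4 }, { nnode: 3, ppn: 4 }].map { |p| table.insert_pending(p) }

# A batch job would be launched with only the record ID; here we fake its result.
table.fill_results(ids.first, { runtime_s: 1.5 }) # placeholder value
table.pending_ids # => [2]
```

A side benefit of this design is that "what is still pending" becomes a simple query over the table.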

bholt commented 11 years ago

We could use Sequel::Model instances to represent experiment records, and then perhaps use the various hooks they provide to instrument each part of the process of running an experiment.
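To illustrate the hook idea without depending on the Sequel gem, here is a rough plain-Ruby sketch. Sequel::Model itself provides lifecycle hooks such as `before_save` and `after_save`; the `ExperimentRecord` class and its `before_run`/`after_run` hooks below are hypothetical names that only mimic that pattern around running an experiment.

```ruby
# Hypothetical plain-Ruby stand-in for a Sequel::Model-backed experiment record.
class ExperimentRecord
  attr_reader :params, :results, :log

  def initialize(params)
    @params = params
    @results = nil
    @log = [] # instrumentation events recorded by the hooks
  end

  # Hook methods: a subclass could override these to time or log each phase.
  def before_run
    @log << [:before_run, params]
  end

  def after_run
    @log << [:after_run, results]
  end

  # Wraps the actual experiment (passed as a block) with the hooks.
  def run
    before_run
    @results = yield(params)
    after_run
    self
  end
end

rec = ExperimentRecord.new({ nnode: 2, ppn: 4 })
rec.run { |p| { nproc: p[:nnode] * p[:ppn] } }
rec.log.map(&:first) # => [:before_run, :after_run]
```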

bmyerz commented 11 years ago

I think a second potentially compelling reason to do this is that a query language can be a more powerful way to _specify_ which experiments to run. Full or partial cross products, specific full or partial experiment assignments, and parameters defined as expressions of other parameters can all be expressed in a relational model.

Some simple examples in Datalog:

basic cross product

    p_nNode(2)
    p_nNode(3)
    p_ppn(4)
    p_ppn(5)
    exper(x, y) :- p_nNode(x), p_ppn(y)
    Ans(x, y) :- exper(x, y)
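For comparison, the relation this cross-product program computes can be written in plain Ruby with `Array#product`. This is only a sketch of the resulting set of experiments; the variable names are copied from the Datalog, not taken from the existing DSL.

```ruby
# Parameter domains, mirroring the p_nNode and p_ppn facts.
p_nnode = [2, 3]
p_ppn = [4, 5]

# exper(x, y) :- p_nNode(x), p_ppn(y) is a plain cross product.
exper = p_nnode.product(p_ppn)
exper # => [[2, 4], [2, 5], [3, 4], [3, 5]]
```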

expression

    p_nNode(12)
    p_ppn(3)
    p_ppn(4)
    p_nProc(x) :- p_nNode(y), p_ppn(z), x==y*z
    exper(x, y, z) :- p_nNode(x), p_ppn(y), p_nProc(z), z==x*y
    Ans(x, y, z) :- exper(x, y, z)
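A plain-Ruby sketch of the same derived parameter, assuming the intent is nProc = nNode × ppn for each run. Computing nProc per (nNode, ppn) pair keeps every tuple consistent; drawing it as an independent third dimension would also produce mismatched tuples such as (12, 4, 36).

```ruby
p_nnode = [12]
p_ppn = [3, 4]

# nProc is derived from each (nNode, ppn) pair rather than varied independently.
exper = p_nnode.product(p_ppn).map { |nnode, ppn| [nnode, ppn, nnode * ppn] }
exper # => [[12, 3, 36], [12, 4, 48]]
```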

specify (tie) 2 of 3

    p_method('steal')
    p_method('none')
    exper(12, 2, z) :- p_method(z)
    exper(3, 4, z) :- p_method(z)
    Ans(x, y, z) :- exper(x, y, z)

Of course, having a concise way to specify these (as in our Ruby DSL) is useful. So far, I've been thinking that our current DSL could be extended to support a union of params hashes, which would cover some of this (the OR of the Horn clauses).
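A rough sketch of what a "union of params hashes" could mean, assuming each hash maps a parameter name to a list of values. The helpers `expand` and `expand_union` are hypothetical and not part of the existing DSL. Each hash expands to its own cross product, and the union of those products is the OR of the corresponding Horn clauses. The usage encodes the "specify (tie) 2 of 3" example: (nNode, ppn) is tied to either (12, 2) or (3, 4), while the method varies freely.

```ruby
# Hypothetical helper: expand one params hash ({name => [values]}) into its
# full cross product, as a list of {name => value} assignments.
def expand(params)
  keys = params.keys
  first, *rest = params.values
  first.product(*rest).map { |combo| keys.zip(combo).to_h }
end

# Hypothetical helper: the union (OR) of several cross products, deduplicated.
def expand_union(*param_hashes)
  param_hashes.flat_map { |h| expand(h) }.uniq
end

method_values = ['steal', 'none']
runs = expand_union(
  { nnode: [12], ppn: [2], method: method_values },
  { nnode: [3], ppn: [4], method: method_values }
)
runs.size # => 4
```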

bholt commented 11 years ago

I think we'll have to sit down together and chat about this for me to understand what you're talking about. I agree that being able to express a query for what you want to run makes just as much sense as querying to see what was run. It's not clear to me that this Datalog approach is easier than what we have now. And the "union" of params may be possible now with a tiny amount of work.