dmarx opened 7 years ago
New tables:
DATASETS
FIELDS
FIELD_STATS
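Here's a minimal sketch of what the three tables might look like, assuming SQLite. Every column name is a hypothetical placeholder, not something from the existing schema. FIELD_STATS uses a long layout with one row per (field, statistic), so adding a new stat doesn't require a schema change.

```python
import sqlite3

# Hypothetical schema for DATASETS, FIELDS and FIELD_STATS.
# Column names are illustrative assumptions only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS datasets (
    dataset_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS fields (
    field_id    INTEGER PRIMARY KEY,
    dataset_id  INTEGER NOT NULL REFERENCES datasets(dataset_id),
    name        TEXT NOT NULL,
    role        TEXT CHECK (role IN ('feature', 'target', 'id', 'other')),
    UNIQUE (dataset_id, name)
);

-- Long layout: one row per (field, statistic) pair.
CREATE TABLE IF NOT EXISTS field_stats (
    field_id    INTEGER NOT NULL REFERENCES fields(field_id),
    stat_name   TEXT NOT NULL,
    stat_value  REAL,
    PRIMARY KEY (field_id, stat_name)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables if they don't already exist."""
    conn.executescript(SCHEMA)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    ds_id = conn.execute(
        "INSERT INTO datasets (name) VALUES (?)", ("abt_v1",)
    ).lastrowid
    field_id = conn.execute(
        "INSERT INTO fields (dataset_id, name, role) VALUES (?, ?, ?)",
        (ds_id, "age", "feature"),
    ).lastrowid
    conn.executemany(
        "INSERT INTO field_stats (field_id, stat_name, stat_value) VALUES (?, ?, ?)",
        [(field_id, "mean", 41.5), (field_id, "null_frac", 0.02)],
    )
    print(conn.execute("SELECT stat_name, stat_value FROM field_stats").fetchall())
```

The tradeoff with the long layout is that a wide "one column per stat" view needs a pivot at query time.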
Stats we want for features/targets:
Stats we want for modeling coefficients:
The framework above can probably be generalized sufficiently to supplant the RESULTS framework I've already got in the schema.
Maybe this doesn't need to be so insanely generalized. We can have a couple of results tables, and we can also have separate tables for tracking stats on the data and stats on coefs. I feel like model coefs lend themselves well to combining with the data stats. Modeling results should be fine, but the text thing... I dunno. Also, I need to figure out how best to add tasks to this schema: we have tasks for base tables, models, scores, and evals, but not for upstream tables.
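One way to keep coefficient stats in their own table and still combine them with the data stats is to key both on the same field id, so combining them is just a join. This is a sketch under that assumption. The `coef_stats` table, its columns, and `coef_with_data_stats` are all hypothetical, and `fields`/`field_stats` are minimal stand-ins.

```python
import sqlite3

# Minimal stand-ins plus a separate, hypothetical coef_stats table.
# Sharing field_id is what lets coef stats join onto data stats.
SCHEMA = """
CREATE TABLE fields (
    field_id   INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE field_stats (
    field_id   INTEGER NOT NULL REFERENCES fields(field_id),
    stat_name  TEXT NOT NULL,
    stat_value REAL,
    PRIMARY KEY (field_id, stat_name)
);
CREATE TABLE coef_stats (
    model_id   INTEGER NOT NULL,
    field_id   INTEGER NOT NULL REFERENCES fields(field_id),
    stat_name  TEXT NOT NULL,  -- e.g. 'estimate', 'std_err'
    stat_value REAL,
    PRIMARY KEY (model_id, field_id, stat_name)
);
"""


def coef_with_data_stats(conn, model_id, coef_stat="estimate", data_stat="std"):
    """Return (field name, coef stat value, data stat value) rows for one model.

    The data stat is LEFT JOINed, so it comes back as None if it was never logged.
    """
    query = """
        SELECT f.name, c.stat_value, d.stat_value
        FROM coef_stats AS c
        JOIN fields AS f ON f.field_id = c.field_id
        LEFT JOIN field_stats AS d
               ON d.field_id = c.field_id AND d.stat_name = ?
        WHERE c.model_id = ? AND c.stat_name = ?
        ORDER BY f.name
    """
    return conn.execute(query, (data_stat, model_id, coef_stat)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO fields VALUES (?, ?)", [(1, "age"), (2, "income")])
conn.executemany("INSERT INTO field_stats VALUES (?, ?, ?)",
                 [(1, "std", 12.0), (2, "std", 30000.0)])
conn.executemany("INSERT INTO coef_stats VALUES (?, ?, ?, ?)",
                 [(7, 1, "estimate", 0.5), (7, 2, "estimate", 0.0001)])
rows = coef_with_data_stats(conn, model_id=7)
# Example use: coef * feature std gives a rough per-std-deviation effect size.
print([(name, coef * std) for name, coef, std in rows])
```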
Let's break this up into a few separate tasks.
ABTs (analytical base tables) are fairly well-formed already, so I think that's a good place to start. Logging model coefficients is probably non-trivial and might not be something we want to automate (e.g., we wouldn't want to log the coefficients of a deep NN).
Yeah, I think logging data on ABTs makes sense. More generally, automating data profiling on ABTs probably isn't a terrible idea either.
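Automated profiling could be as simple as running one aggregate query per ABT column and emitting rows in a (field, stat_name, stat_value) shape. This sketch assumes the ABT lives in a SQLite table. `profile_table` and the particular stats it computes are hypothetical choices, not existing project code.

```python
import sqlite3
from typing import List, Optional, Tuple

StatRow = Tuple[str, str, Optional[float]]


def profile_table(conn: sqlite3.Connection, table: str) -> List[StatRow]:
    """Return (column, stat_name, stat_value) rows for every column of `table`.

    Assumes `table` and its column names are trusted identifiers: they are
    interpolated into the SQL because SQLite can't bind identifiers as parameters.
    """
    # PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk)
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    stats: List[StatRow] = []
    for col in columns:
        # COUNT(DISTINCT ...) ignores NULLs; AVG only sees numeric values,
        # so a text column gets a NULL (None) mean instead of a bogus number.
        n_rows, n_null, n_distinct, mean = conn.execute(
            f'''
            SELECT COUNT(*),
                   SUM("{col}" IS NULL),
                   COUNT(DISTINCT "{col}"),
                   AVG(CASE WHEN typeof("{col}") IN ('integer', 'real') THEN "{col}" END)
            FROM "{table}"
            '''
        ).fetchone()
        stats.append((col, "n_rows", float(n_rows)))
        # SUM over an empty table is NULL, hence the `or 0`.
        stats.append((col, "null_frac", (n_null or 0) / n_rows if n_rows else None))
        stats.append((col, "n_distinct", float(n_distinct)))
        stats.append((col, "mean", mean))
    return stats


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE abt (age INTEGER, segment TEXT)")
    conn.executemany("INSERT INTO abt VALUES (?, ?)", [(30, "a"), (50, "b"), (None, "a")])
    for row in profile_table(conn, "abt"):
        print(row)
```

Returning plain rows keeps the profiler decoupled from whichever stats table ends up storing them.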
Some reasons this might be valuable:
Should probably rename the database. It's called "modeling_results" right now; maybe just call it "project_db" or something like that.