Open NickSeagull opened 6 years ago
@ixxie wants to write about reproducibility with Jupyter and Nix. I've added him to the DH members' list, so he should see this soon as well.
Hmmm, I am not sure this is quite relevant here; my goal is more to create easily reproducible infrastructure as code, i.e. to let anybody deploy a data science platform relatively easily. Reproducibility of individual computations is also of great interest, and Nix can help with it, but I don't know much about that atm (would be willing to look into it some time!).
FWIW, it seems a bit far-fetched to be able to specify a simple decision tree recipe for doing data science; the way I would approach this is to think of it as a bipartite graph: list some problems (e.g. tokenization, classification, clustering, etc.) and some algorithms (CRFs, RNNs, HDBSCAN), then link each problem to the algorithms that can address it.
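To make the bipartite-graph idea a bit more concrete, here is a minimal sketch in Python. The problem and algorithm names come from the comment above, but the specific links between them are my own illustrative assumptions (nobody in this thread has proposed a mapping), and the helper names `build_bipartite`, `algorithms_for` and `problems_for` are hypothetical.

```python
from collections import defaultdict

# Illustrative edges only: the thread lists these problems and algorithms,
# but which algorithm is linked to which problem is an assumption made here.
EDGES = [
    ("tokenization", "CRF"),
    ("tokenization", "RNN"),
    ("classification", "RNN"),
    ("clustering", "HDBSCAN"),
]


def build_bipartite(edges):
    """Index (problem, algorithm) edges from both sides of the graph."""
    problem_to_algos = defaultdict(set)
    algo_to_problems = defaultdict(set)
    for problem, algo in edges:
        problem_to_algos[problem].add(algo)
        algo_to_problems[algo].add(problem)
    return dict(problem_to_algos), dict(algo_to_problems)


PROBLEM_TO_ALGOS, ALGO_TO_PROBLEMS = build_bipartite(EDGES)


def algorithms_for(problem):
    """Return the algorithms linked to a problem, sorted by name."""
    return sorted(PROBLEM_TO_ALGOS.get(problem, ()))


def problems_for(algo):
    """Return the problems an algorithm is linked to, sorted by name."""
    return sorted(ALGO_TO_PROBLEMS.get(algo, ()))
```

Unlike a decision tree, this lets one algorithm sit under several problems (here `RNN` is linked to both tokenization and classification), and a reader can start from either side: `algorithms_for("clustering")` or `problems_for("RNN")`.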
Some short things this should talk about:
- Data preparation
- Algorithm selection
- Ways to present/interpret results