DataHaskell / docs

:books: - Documentation site
http://datahaskell.org/docs

Define a "standard" process to follow when solving simple data science problems #26

Open NickSeagull opened 6 years ago

Drezil commented 6 years ago

Some short things this should talk about:

ocramz commented 6 years ago

@ixxie wants to write about reproducibility with Jupyter and Nix. I've added him to the DH members' list, so he should see this soon as well.

ixxie commented 6 years ago

Hmmm, I am not sure my work is quite relevant to this issue; my goal is more to create easily reproducible infrastructure as code, i.e. to allow anybody to deploy a data science platform relatively easily. Reproducibility of individual computations is also of great interest, and Nix can help with that, but I don't know much about it atm (I would be willing to look into it some time!).

FWIW, it seems a bit far-fetched to specify a simple decision-tree recipe for doing data science. I would approach this as a bipartite graph instead: list some problems (e.g. tokenization, classification, clustering) and some algorithms (e.g. CRFs, RNNs, HDBSCAN), then link each problem to the algorithms that can address it.
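To make the bipartite-graph idea concrete, here is a rough Haskell sketch using only `containers`. The type names, the `edges` list, and the helpers `algorithmsFor` and `problemsFor` are all hypothetical, invented for illustration. The specific problem-to-algorithm links are assumptions rather than a vetted catalogue.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- One side of the graph: problems (only the ones named in the thread).
data Problem = Tokenization | Classification | Clustering
  deriving (Show, Eq, Ord)

-- The other side: algorithms (again, only the ones named in the thread).
data Algorithm = CRF | RNN | HDBSCAN
  deriving (Show, Eq, Ord)

-- Edges always join a Problem to an Algorithm, so the graph is
-- bipartite by construction. These particular links are illustrative
-- assumptions, not recommendations.
edges :: [(Problem, Algorithm)]
edges =
  [ (Tokenization, CRF)
  , (Tokenization, RNN)
  , (Classification, CRF)
  , (Classification, RNN)
  , (Clustering, HDBSCAN)
  ]

-- Index the edge list from the problem side.
byProblem :: Map.Map Problem (Set.Set Algorithm)
byProblem = Map.fromListWith Set.union [(p, Set.singleton a) | (p, a) <- edges]

-- Index the same edge list from the algorithm side.
byAlgorithm :: Map.Map Algorithm (Set.Set Problem)
byAlgorithm = Map.fromListWith Set.union [(a, Set.singleton p) | (p, a) <- edges]

-- Which algorithms are linked to a given problem?
algorithmsFor :: Problem -> Set.Set Algorithm
algorithmsFor p = Map.findWithDefault Set.empty p byProblem

-- Which problems is a given algorithm linked to?
problemsFor :: Algorithm -> Set.Set Problem
problemsFor a = Map.findWithDefault Set.empty a byAlgorithm

main :: IO ()
main = do
  print (Set.toList (algorithmsFor Clustering))
  print (Set.toList (problemsFor RNN))
```

Keeping a single edge list and deriving both lookup directions from it means a new problem or algorithm only needs new edges, whereas a decision tree would have to be restructured.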