dssg / deploybot

A series of Chef Recipes to deploy the DSSG stack.
MIT License

Use Drake for data workflows #1

Open jpvelez opened 10 years ago

jpvelez commented 10 years ago

@hunterowens and @matthewgee: I've recently started using Drake to organize data projects, and it's amazing.

You specify an input and an output data file, and then a series of shell commands that get you from one to the other. It handles dependencies, so if some data in your pipeline has changed, it recomputes everything downstream.
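
To make the "recomputes everything downstream" part concrete, here is a minimal Python sketch of the general make-style idea: compare file timestamps and rebuild anything that depends on something newer. This is not Drake's actual implementation or syntax, and Drake's own rules may differ. The `Step` type, the `steps_to_rerun` function, the file names, and the shell commands are all hypothetical and only there for illustration.

```python
from typing import Dict, List, NamedTuple


class Step(NamedTuple):
    output: str        # file this step produces
    inputs: List[str]  # files this step reads
    command: str       # shell command that would build the output (not run here)


def steps_to_rerun(steps: List[Step], mtimes: Dict[str, float]) -> List[str]:
    """Return the outputs that are out of date, upstream first.

    `steps` must be listed in dependency order (upstream before downstream).
    `mtimes` maps file name -> modification time; missing files are absent.
    """
    stale = set()
    rerun = []
    for step in steps:
        out_time = mtimes.get(step.output)
        needs_build = (
            out_time is None                          # output does not exist yet
            or any(i in stale for i in step.inputs)   # an upstream step will be rebuilt
            # an input is newer than the output (a missing input counts as newer)
            or any(mtimes.get(i, float("inf")) > out_time for i in step.inputs)
        )
        if needs_build:
            stale.add(step.output)
            rerun.append(step.output)
    return rerun


# Hypothetical pipeline: raw.csv -> clean.csv -> features.csv -> model.pkl
PIPELINE = [
    Step("clean.csv", ["raw.csv"], "python clean.py raw.csv > clean.csv"),
    Step("features.csv", ["clean.csv"], "python featurize.py clean.csv > features.csv"),
    Step("model.pkl", ["features.csv"], "python train.py features.csv model.pkl"),
]

# raw.csv was edited (t=50) after clean.csv was last built (t=20),
# so every step downstream of raw.csv has to run again.
example_mtimes = {"raw.csv": 50, "clean.csv": 20, "features.csv": 30, "model.pkl": 40}
print(steps_to_rerun(PIPELINE, example_mtimes))
```

If only `features.csv` had changed, the same function would report just `model.pkl`, which is the point: you touch one file and only the work that depends on it gets redone.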

Not only does it help with development, deployment (not to production, but running things locally), and getting collaborators up and running quickly, but it's also the best form of project-level documentation I've run into: "Here's how all these files and scripts fit together into a coherent workflow or set of analyses." You guys should get everyone to use it, to save us the headaches we had at the end of last summer, when we couldn't make sense of all the stuff in the repos.

jpvelez commented 10 years ago

For any kind of production data workflow (get data from here, munge it, train a model, run a Hadoop job), use Luigi: https://github.com/spotify/luigi