Closed dcrankshaw closed 9 years ago
If data/news-classifier-from-tomer is decently sized (more than a couple of megabytes), I would recommend keeping it out of the git repo if you can.
If that's not an option, we'll need to make a note to rewrite history and remove the file from the git history once we no longer need it around.
This PR has a few components. I rewrote all of the cluster management scripts to use Fabric, a nice Python library that makes running shell commands (especially remotely) really easy. I also added a bunch of functionality to set up and run benchmarks and then store the results in MongoDB. This has made benchmarking pretty painless.
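For anyone who hasn't used this combination, here is a minimal sketch of the general pattern (run a command remotely, time it, store the result as a document). It is not the code in this PR. The host name, database name, collection name, and the helper names `run_benchmark` and `store_result` are all hypothetical, and it uses the Fabric 2+ `Connection` API, which may differ from the Fabric version the scripts here were written against.

```python
import time
from datetime import datetime, timezone


def run_benchmark(run_command, command, host):
    """Run `command` through `run_command` and return a result document.

    `run_command` is any callable that takes a shell command string and
    returns its output, so the timing logic does not depend on Fabric.
    """
    start = time.perf_counter()
    output = run_command(command)
    elapsed = time.perf_counter() - start
    return {
        "host": host,
        "command": command,
        "output": output,
        "elapsed_seconds": elapsed,
        "recorded_at": datetime.now(timezone.utc),
    }


def store_result(collection, result):
    """Insert one result document; `collection` must provide insert_one()."""
    return collection.insert_one(result)


def main():
    # Imported here so the helpers above can be used without these packages.
    from fabric import Connection      # Fabric 2+ API (assumption)
    from pymongo import MongoClient

    host = "benchmark-node.example.com"  # hypothetical host
    conn = Connection(host)

    def remote(cmd):
        # hide=True keeps Fabric from echoing output; .stdout is the captured text.
        return conn.run(cmd, hide=True).stdout

    result = run_benchmark(remote, "uptime", host)

    # Hypothetical database and collection names.
    client = MongoClient("mongodb://localhost:27017")
    store_result(client["velox_benchmarks"]["results"], result)
```

Calling `main()` needs a reachable SSH host and a local MongoDB instance. Passing the runner and the collection in as arguments keeps the timing and storage logic easy to exercise without either.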
The second component adds the newsgroups pipeline written by @tomerk as another model for Velox.
This is sort of a monster PR, but a lot of it is config files and management scripts. There's not much change to the Velox codebase.