DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0

Cleanup examples Notebooks #134

Closed bbengfort closed 7 years ago

bbengfort commented 7 years ago

@rebeccabilbro - so part of the user study (which I've just started) is to create a notebook in examples/username so that we can have a bunch of example pull requests. As I was looking in this directory, perhaps it is time to clean up a bit? This might also be a strategy to solve our notebook merge conflicts!

So the notebooks/dirs that I think are up for clean up are:

This will leave the primary notebooks:

As the top level examples. I'm not sure about text.ipynb; either we can merge it into examples or leave it as a top level example, it's up to you.

Then we can have directories for each user with GitHub usernames, and that's also where we can put our prototype work to avoid merge conflicts.

What do you think?

bbengfort commented 7 years ago

And to that end, we should also put a README.md in that directory that explains what download.py is, what the examples are, and how to contribute your own examples.

rebeccabilbro commented 7 years ago

My comments inline:

"As I was looking in this directory, perhaps it is time to clean up a bit? This might also be a strategy to solve our notebook merge conflicts!"

Agree.

"As the top level examples. I'm not sure about text.ipynb; either we can merge it into examples or leave it as a top level example, it's up to you."

Let's integrate it into examples.ipynb.

"Then we can have directories for each user with GitHub usernames, and that's also where we can put our prototype work to avoid merge conflicts. What do you think?"

Agree.

"And to that end, we should also put a README.md in that directory that explains what download.py is, what the examples are, and how to contribute your own examples."

Agree.

rebeccabilbro commented 7 years ago

TODOs:

rebeccabilbro commented 7 years ago

@bbengfort just confirming that for the user studies, we should create a directory within the yellowbrick/examples directory that is named with our GitHub username, e.g.

examples
├── download.py
├── examples.ipynb
├── rebeccabilbro
│   ├── userstudy.ipynb
│   └── data.txt
├── bbengfort
│   ├── userstudy.ipynb
│   └── data.txt
└── ndanielsen
    ├── userstudy.ipynb
    └── data.txt

Is that kind of what you were thinking?

bbengfort commented 7 years ago

@rebeccabilbro yep, that's exactly what I was thinking.

Most of the changes I've made to this directory are in my pull request. I'll get my user study done and complete the PR so those changes take effect. I can also deal with TODOs 1 and 2.

NealHumphrey commented 7 years ago

@bbengfort and @rebeccabilbro I know I'm late to the party on this one (powder day in Tahoe...), but one question I have on this is how do these notebooks connect/relate to the examples in /docs/examples? It looks like examples/examples.ipynb is the same/similar to /docs/examples/examples.rst. How do you get from one to the other and when? It still seems like the folder structure with the notebooks makes sense re: user studies, and the /examples folder can be a staging area for more polished examples.

Second question, @rebeccabilbro your mock folder structure includes a bunch of data.txt. Will you want people to include their data files in their user study commits? I would tend towards asking people to gitignore these and describe how to get the data in their example code; the data files can add bloat to the repository, both for cloning purposes and eventually b/c there is a limit on repo size; especially true if most of the examples don't end up as presentation-ready versions like those in the root. On the flip side it's nice not to depend on external maintenance of data sources, and if most come from places like UCI Machine Learning repo the data is relatively small.

rebeccabilbro commented 7 years ago

@NealHumphrey the docs will lag behind the examples because we are manually generating that part of the .rst documentation from the .ipynb examples using nbconvert:

jupyter nbconvert --to FORMAT notebook.ipynb
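For anyone who wants to script that step, here is a minimal sketch that builds the same command with `rst` as the format. The helper name `nbconvert_command` and the choice of `docs/examples` as the output directory are assumptions for illustration, not something already in the repo; `--to` and `--output-dir` are standard nbconvert options.

```python
import subprocess


def nbconvert_command(notebook, fmt="rst", output_dir=None):
    """Build the argument list for `jupyter nbconvert` (hypothetical helper)."""
    cmd = ["jupyter", "nbconvert", "--to", fmt, str(notebook)]
    if output_dir is not None:
        # Write the converted file somewhere other than next to the notebook.
        cmd += ["--output-dir", str(output_dir)]
    return cmd


def convert(notebook, fmt="rst", output_dir=None):
    """Run the conversion; requires jupyter and nbconvert to be installed."""
    subprocess.run(nbconvert_command(notebook, fmt, output_dir), check=True)


# Example (assumed paths):
# convert("examples/examples.ipynb", output_dir="docs/examples")
```

The conversion is still manual in the sense that someone has to run it and review the resulting .rst, so the docs can still lag behind the notebook.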

re: including the data together with the user study commits, I think there are 2 reasons why we'd want this.

That said, I actually did .gitignore my dataset when I committed my user study because it's huge and available on UCI; instead I simply included instructions for how to get it.
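Those instructions could also live as a small cell at the top of the user study notebook. This is only a sketch of the idea: `fetch_if_missing` is a hypothetical helper (it is not what download.py does), and the URL in the usage comment is a placeholder rather than a real dataset location.

```python
import urllib.request
from pathlib import Path


def fetch_if_missing(url, dest):
    """Download url to dest unless dest already exists; return dest as a Path."""
    dest = Path(dest)
    if dest.exists():
        # Keep the local copy; the file is gitignored, so it never gets committed.
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest))
    return dest


# Example (placeholder URL, assumed path):
# fetch_if_missing("https://example.com/data.txt", "examples/rebeccabilbro/data.txt")
```

With something like this, the data file stays out of the commit while the notebook still documents exactly where the data comes from.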

bbengfort commented 7 years ago

@NealHumphrey - just to add on to what @rebeccabilbro said: the examples notebook is basically an executable version of the examples tutorial in the documentation. The documentation does lag behind, though, while examples.ipynb keeps up with development.

Please also see examples/README.md for more information on how we're planning to structure this gallery.