amueller / scipy-2016-sklearn

Scikit-learn tutorial at SciPy2016
Creative Commons Zero v1.0 Universal

No SMS data #63

Closed rasbt closed 8 years ago

rasbt commented 8 years ago

I also can't find the SMS dataset in the repo or fetch_data.py :(

amueller commented 8 years ago

at some point I asked "are you sure the fetch_data.py fetches the right datasets" ;)

amueller commented 8 years ago

https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection Should I work on that?
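For reference, a minimal sketch of what fetching this dataset could look like. This is not the actual `fetch_data.py` from the repo. The direct-download URL, the archive member name `SMSSpamCollection`, and the function names are assumptions for illustration; only the dataset landing page above comes from this thread. The parser assumes one message per line in the form `label<TAB>text`.

```python
import io
import zipfile
from urllib.request import urlopen

# ASSUMPTION: direct-download location of the zipped dataset; the thread only
# links the landing page, and this URL may have moved since.
SMS_ZIP_URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "00228/smsspamcollection.zip"
)


def parse_sms_lines(lines):
    """Split 'label<TAB>text' lines into parallel lists of labels and texts."""
    labels, texts = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        label, _, text = line.partition("\t")
        labels.append(label)
        texts.append(text)
    return labels, texts


def fetch_sms_spam(url=SMS_ZIP_URL, member="SMSSpamCollection"):
    """Download the zip archive and parse the (assumed) member file in memory."""
    with urlopen(url) as response:
        payload = response.read()
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        raw = archive.read(member).decode("utf-8")
    return parse_sms_lines(raw.splitlines())
```

Calling `fetch_sms_spam()` needs network access. `parse_sms_lines` can be tried offline on any list of tab-separated lines.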

rasbt commented 8 years ago

Sorry, I think that was before we used SMS ... I am not sure ...

but I am pretty sure that I was able to run all notebooks at some point. Maybe it was there as CSV, and I remember that we added the dataset directory to .gitignore (I typically use a different machine when I am at home, so I think I still had it there as a leftover, sry :()

rasbt commented 8 years ago

yeah the dataset above looks good

amueller commented 8 years ago

hm... but adding it to .gitignore shouldn't change anything if it was already checked in? hm. maybe it was there..

amueller commented 8 years ago

yeah both titanic and smsspam were in the dataset folder

rasbt commented 8 years ago

I am curious what happened to them (and when), hmmm...

amueller commented 8 years ago

It's 466k, I'll add it.

amueller commented 8 years ago

In the old repo the datasets folder was already in .gitignore. If I moved everything over and didn't force-add it, the files are gone. I should have added a new remote and pushed to it. Or maybe I did that? I don't remember. But I'll fix it now.
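The .gitignore behaviour discussed here can be checked with a small sketch, assuming `git` is on the PATH. The helper names are hypothetical. It relies on `git ls-files --error-unmatch` (is the path in the index?), `git check-ignore --no-index` (does an ignore pattern match?), and `git add -f` (add despite an ignore pattern). A file that is already tracked stays tracked after it is ignored, but an ignored file copied into a fresh repo is skipped by a plain `git add` unless it is force-added.

```python
import subprocess


def _git(repo_dir, *args):
    """Run a git command inside repo_dir and capture its output."""
    return subprocess.run(
        ["git", *args],
        cwd=repo_dir,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


def is_tracked(repo_dir, path):
    """True if path is in the index (exit code 0 from ls-files --error-unmatch)."""
    return _git(repo_dir, "ls-files", "--error-unmatch", "--", path).returncode == 0


def is_ignored(repo_dir, path):
    """True if an ignore pattern matches path, whether or not it is tracked."""
    result = _git(repo_dir, "check-ignore", "-q", "--no-index", "--", path)
    return result.returncode == 0


def force_add(repo_dir, path):
    """Stage path even though an ignore pattern matches it."""
    _git(repo_dir, "add", "-f", "--", path).check_returncode()
```

For example, with a hypothetical `datasets/smsspam.csv` in a repo whose .gitignore contains `datasets/`, `is_ignored` returns `True` and `is_tracked` returns `False` until `force_add` is called.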

rasbt commented 8 years ago

let me check my backups

rasbt commented 8 years ago

Just found them ... :) Shall I add titanic or are you already working on that?

[Screenshot, 2016-07-11, 7:58 pm]

amueller commented 8 years ago

I added both in PRs