amueller closed this 8 years ago
Oh, just commented on #15 regarding the dataset. I think it may be better to stick with the small dataset in the intro notebook 14 and use the big IMDb one for #28 (out of core).
I have a parsed CSV of the dataset here: https://github.com/rasbt/python-machine-learning-book/tree/master/code/datasets/movie
Wouldn't it be better to read it from there via the fetch_data.py script? The original is basically a hierarchical directory structure of 50,000 files, which may take too long to parse.
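To illustrate the trade-off being discussed: parsing the per-file layout once and then reading a single CSV replaces tens of thousands of file opens with one sequential read. This is a minimal sketch using only the standard library. It assumes a `<root>/<label>/*.txt` layout with a `pos` folder for positive reviews, and the column names `review` and `sentiment`. `parse_reviews_to_csv` and `read_reviews_csv` are hypothetical helpers, not the contents of fetch_data.py or of the linked CSV.

```python
import csv
import os
import tempfile


def parse_reviews_to_csv(root_dir, out_csv, pos_label="pos"):
    """Hypothetical helper: collapse a <root_dir>/<label>/*.txt tree into one
    CSV with the (assumed) columns review, sentiment. Returns the row count."""
    n_rows = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["review", "sentiment"])
        # Each subdirectory name is treated as the class label.
        for label in sorted(os.listdir(root_dir)):
            label_dir = os.path.join(root_dir, label)
            if not os.path.isdir(label_dir):
                continue
            for name in sorted(os.listdir(label_dir)):
                with open(os.path.join(label_dir, name), encoding="utf-8") as f:
                    # 1 for the positive folder, 0 for everything else.
                    writer.writerow([f.read(), int(label == pos_label)])
                n_rows += 1
    return n_rows


def read_reviews_csv(csv_path):
    """Load the parsed CSV in one sequential read instead of one open() per review."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [(row["review"], int(row["sentiment"])) for row in csv.DictReader(f)]


if __name__ == "__main__":
    # Tiny made-up tree standing in for the real 50,000-file layout.
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "train")
        for label, text in [("pos", "Loved it."), ("neg", "Too long, too dull.")]:
            os.makedirs(os.path.join(root, label))
            with open(os.path.join(root, label, "0.txt"), "w", encoding="utf-8") as fh:
                fh.write(text)
        out_csv = os.path.join(tmp, "reviews.csv")
        parse_reviews_to_csv(root, out_csv)
        print(read_reviews_csv(out_csv))
```

The parsing cost is paid once; afterwards every notebook run only touches the single CSV file.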
There is the load_files function, which does that. It takes a couple of seconds, but that's not too bad, I think.
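A minimal sketch of how scikit-learn's `sklearn.datasets.load_files` reads a one-folder-per-class layout like the IMDb directory tree. The tiny tree built here is made up, and `make_toy_tree` is a hypothetical helper. Passing `categories=["neg", "pos"]` is an assumption about usage: on the real `aclImdb/train` directory it would skip the unlabeled `unsup` folder.

```python
import os
import tempfile

from sklearn.datasets import load_files


def make_toy_tree(root):
    """Hypothetical helper: write a tiny pos/neg tree, one review per file."""
    reviews = {"pos": ["A wonderful film.", "Great acting."],
               "neg": ["Dull and too long."]}
    for label, texts in reviews.items():
        os.makedirs(os.path.join(root, label), exist_ok=True)
        for i, text in enumerate(texts):
            with open(os.path.join(root, label, f"{i}.txt"), "w", encoding="utf-8") as fh:
                fh.write(text)


with tempfile.TemporaryDirectory() as tmp:
    make_toy_tree(tmp)
    # Each subfolder name becomes a category (sorted, so neg=0, pos=1);
    # encoding="utf-8" returns str instead of bytes.
    reviews = load_files(tmp, categories=["neg", "pos"],
                         encoding="utf-8", shuffle=False)

print(reviews.target_names)
print(reviews.target)
```

The returned Bunch holds the texts in `data`, integer labels in `target`, and folder names in `target_names`.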
Okay, nice! I'll use this one then!
I think we can close this. We are using the SMS spam dataset for the text-classification intro and the IMDb dataset for out-of-core learning (as per #35).
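Out-of-core learning, as planned for the IMDb data, means fitting on mini-batches streamed from disk rather than loading the whole corpus into memory. This is a minimal sketch that assumes scikit-learn's `HashingVectorizer` with `SGDClassifier.partial_fit`. That pairing is one common way to do it, not necessarily what the tutorial uses. `stream_batches` is a hypothetical stand-in for reading a large file chunk by chunk, and the six-document corpus is made up.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Stateless vectorizer: nothing is fitted, so each mini-batch is transformed
# on its own and the feature space has a fixed size (n_features).
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
clf = SGDClassifier(random_state=0)  # default hinge loss, i.e. a linear SVM


def stream_batches(docs, labels, batch_size):
    """Hypothetical stand-in for reading a large file lazily, chunk by chunk."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size], labels[start:start + batch_size]


# Made-up toy corpus (1 = positive, 0 = negative).
docs = ["a great film", "dull and boring", "great acting, great story",
        "boring plot", "a wonderful film", "a dull, boring mess"]
labels = np.array([1, 0, 1, 0, 1, 0])

# partial_fit must be told every class up front, since a single batch
# may not contain all of them.
all_classes = np.array([0, 1])
for batch_docs, batch_labels in stream_batches(docs, labels, batch_size=2):
    clf.partial_fit(vectorizer.transform(batch_docs), batch_labels,
                    classes=all_classes)

print(clf.predict(vectorizer.transform(["great story", "boring mess"])))
```

Because the vectorizer keeps no vocabulary, only the current batch and the fixed-size weight vector need to be in memory at any time.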
14 Application: IMDB Movie Review Sentiment Analysis