Closed by dchiu911 9 years ago
That's a great dataset!
One that I found interesting when I took this course was the Global Terrorism Database, which has tons of information about over 100,000 acts of terrorism worldwide since 1970.
I hope more students share cool datasets they find. Good idea starting this issue!
Here is some excellent (and already reasonably tidy) NHL player stats data. There is some fun to be had combining the data for each player, which is currently spread across several Excel workbooks.
I think a good data set to look into is the airline on-time performance data, available by year here. Considering that, for example, the data for 2007 alone is over 100MB, it would help illustrate the problems that arise when exploring larger data sets.
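To make the size issue concrete, here is a minimal sketch of one way to summarize a large CSV without loading it all into memory: read one row at a time and keep only running totals. The column names `UniqueCarrier` and `ArrDelay`, the `NA` marker for missing values, and the helper name `mean_delay_by_carrier` are assumptions for illustration, and the sample rows are made up, not taken from the real data.

```python
import csv
import io
from collections import defaultdict


def mean_delay_by_carrier(lines, carrier_col="UniqueCarrier", delay_col="ArrDelay"):
    """Stream rows one at a time, keeping only running totals per carrier.

    Column names are assumed for illustration; adjust them to the real file.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(lines):
        value = row[delay_col]
        if value in ("", "NA"):  # skip rows with a missing delay
            continue
        totals[row[carrier_col]] += float(value)
        counts[row[carrier_col]] += 1
    return {carrier: totals[carrier] / counts[carrier] for carrier in totals}


# Tiny made-up sample standing in for a real yearly file.
sample = io.StringIO("UniqueCarrier,ArrDelay\nAA,10\nAA,20\nWN,NA\nWN,5\n")
result = mean_delay_by_carrier(sample)
print(result)
```

For a real file you would pass an open file handle instead of the in-memory sample, for example `open(path, newline="")` with whatever path you downloaded the data to. Memory use then depends on the number of carriers rather than the number of rows.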
An interesting dataset can be found on the MovieLens website. It collects movie preferences and ratings. Three datasets of different sizes are available (100K, 1M, and 10M ratings). There will be some merging and cleaning to do, since the data are saved in separate files.
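As a rough idea of what that merging might look like, here is a small sketch that joins ratings to movie titles on a shared movie ID and averages the ratings per title. The record layouts (`(movie_id, title)` and `(user_id, movie_id, rating)`), the helper name `mean_rating_by_title`, and the sample values are all hypothetical; the real files have their own delimiters and extra columns that would need parsing first.

```python
from collections import defaultdict


def mean_rating_by_title(movies, ratings):
    """Join ratings to titles on movie_id, then average ratings per title."""
    titles = {movie_id: title for movie_id, title in movies}
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _user_id, movie_id, rating in ratings:
        title = titles.get(movie_id)
        if title is None:  # rating refers to a movie missing from the lookup
            continue
        sums[title] += rating
        counts[title] += 1
    return {title: sums[title] / counts[title] for title in sums}


# Made-up records for illustration only.
movies = [(1, "Movie A"), (2, "Movie B")]
ratings = [(10, 1, 4.0), (11, 1, 5.0), (10, 2, 3.0), (12, 99, 1.0)]
merged = mean_rating_by_title(movies, ratings)
print(merged)
```

Dropping ratings with no matching movie is one cleaning choice among several; you might instead want to count or report them.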
Here is a dataset on the current Ebola outbreak.
Here's a massive list of datasets. I've got others like this up my sleeve that I need to produce here.
A bitty bundle of research quality datasets by Hilary Mason
100+ Interesting Data Sets for Statistics
The home of the U.S. Government’s open data
1001 DATASETS AND DATA REPOSITORIES ( LIST OF LISTS OF LISTS )
This blog announces obsessively detailed instructions for analyzing publicly available survey data with free tools: the R language, the survey package, and (for big data) sqlsurvey + monetdb.
See the "data repositories" section of the visualization design resources curated by the InfoVis Group in UBC Computer Science
Some public data sources gathered by the Data Incubator:
http://blog.thedataincubator.com/2014/10/data-sources-for-cool-data-science-projects-part-1/
You can get data on the Ebola outbreak from DataMarket:
https://blog.datamarket.com/2014/10/15/ebola-data-on-datamarket/
Maybe we can share data sets here that interest us, and see how much they interest others as well! Also it would be good to comment on whether the data size is too big or too small.
I'm kind of a basketball fanatic so I'd love to analyze certain statistics. Of course, this is only one man's opinion. The 2013-2014 NBA season totals of some basic stats can be seen here as an example.