STAT545-UBC / Discussion


Interesting Datasets #39

Closed dchiu911 closed 9 years ago

dchiu911 commented 9 years ago

Maybe we can share data sets here that interest us, and see how much they interest others as well! Also, it would be good to comment on whether the data size is too big or too small.

I'm kind of a basketball fanatic so I'd love to analyze certain statistics. Of course, this is only one man's opinion. The 2013-2014 NBA season totals of some basic stats can be seen here as an example.

daattali commented 9 years ago

That's a great dataset!

One that I found interesting when I took this course was the Global Terrorism Database, which has tons of information about over 100,000 terrorist acts worldwide since 1970.

I hope more students share cool datasets they find. Good idea starting this issue!

jkbooth commented 9 years ago

Here is some excellent (and already reasonably tidy) NHL player stats data. There is some fun to be had combining the data for each player, which is currently in several Excel workbooks.

andresesanch commented 9 years ago

I think a good data set to look into is the airline on-time performance data, available by year here. Considering that, for example, the data for 2007 is over 100MB, it would help illustrate the problems that arise when exploring larger data sets.
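
One way around the size problem is to stream the file row by row instead of loading it all at once. Here is a minimal sketch in Python, assuming a CSV with a carrier column and an arrival-delay column where missing values are written as `NA`. The column names `UniqueCarrier` and `ArrDelay`, the function name, and the sample rows are assumptions for illustration, not taken from the actual files.

```python
import csv
import io
from collections import defaultdict


def mean_delay_by_carrier(lines, carrier_col="UniqueCarrier", delay_col="ArrDelay"):
    """Stream rows one at a time and return {carrier: mean delay}.

    Only running totals are kept in memory, so the file size does not matter.
    Column names are assumptions; adjust them to match the real header.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(lines):
        value = row[delay_col]
        if value in ("", "NA"):  # skip missing delays
            continue
        totals[row[carrier_col]] += float(value)
        counts[row[carrier_col]] += 1
    return {carrier: totals[carrier] / counts[carrier] for carrier in totals}


# Made-up sample standing in for a real file handle,
# e.g. open("2007.csv", newline="") (hypothetical file name).
sample = io.StringIO(
    "UniqueCarrier,ArrDelay\n"
    "AA,10\n"
    "AA,NA\n"
    "AA,20\n"
    "WN,-5\n"
)
result = mean_delay_by_carrier(sample)
print(result)
```

The same function should work on an open file object, since `csv.DictReader` accepts any iterable of lines.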

ChiaraDG commented 9 years ago

An interesting dataset can be found on the MovieLens website. The dataset collects movie preferences and ratings. Three datasets of different sizes are available (100K, 1M, 10M ratings). There will be some merging and cleaning to do since the data are saved in different files.
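
To give a rough idea of the merging step, here is a small Python sketch that joins a ratings table to a movies table on a shared movie ID and averages the ratings per title. The `::` separator, the field order, and the sample rows are all assumptions made up for illustration; check the README that ships with each download for the real layout.

```python
from collections import defaultdict

# Made-up stand-ins for the contents of two separate files (hypothetical layout).
movies_text = "1::Movie A\n2::Movie B\n"
ratings_text = "10::1::4\n11::1::5\n10::2::3\n"  # user_id::movie_id::rating


def parse(text, sep="::"):
    """Split each non-empty line into a list of fields."""
    return [line.split(sep) for line in text.splitlines() if line]


# Lookup table: movie_id -> title
titles = {movie_id: title for movie_id, title in parse(movies_text)}

# Join each rating to its title via movie_id, then collect ratings per title.
scores = defaultdict(list)
for user_id, movie_id, rating in parse(ratings_text):
    scores[titles[movie_id]].append(int(rating))

mean_rating = {title: sum(vals) / len(vals) for title, vals in scores.items()}
print(mean_rating)
```

With real data, the two strings would be replaced by the contents of the downloaded files, and a data-frame library would probably be more convenient than plain dictionaries.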

BernhardKonrad commented 9 years ago

Here is a dataset on the current Ebola outbreak.

jennybc commented 9 years ago

Here's a massive list of datasets. I've got others like this up my sleeve that I need to produce here.

NYU Health Sciences Library Data Catalog

jennybc commented 9 years ago

A bitty bundle of research quality datasets by Hilary Mason

https://bitly.com/bundles/hmason/1

jennybc commented 9 years ago

100+ Interesting Data Sets for Statistics

http://rs.io/100-interesting-data-sets-for-statistics/

jennybc commented 9 years ago

The home of the U.S. Government’s open data

https://www.data.gov

jennybc commented 9 years ago

1001 DATASETS AND DATA REPOSITORIES ( LIST OF LISTS OF LISTS )

https://dreamtolearn.com/doc/2HDNJH3XJU6CVGKZ7SDM4MCSW

jennybc commented 9 years ago

this blog announces obsessively-detailed instructions to analyze publicly-available survey data with free tools - the r language, the survey package, and (for big data) sqlsurvey + monetdb.

http://www.asdfree.com/p/about-faq.html

jennybc commented 9 years ago

See the "data repositories" section of the visualization design resources curated by the InfoVis Group in UBC Computer Science

jennybc commented 9 years ago

Some public data sources gathered by the Data Incubator:

http://blog.thedataincubator.com/2014/10/data-sources-for-cool-data-science-projects-part-1/

jennybc commented 9 years ago

You can get data on the Ebola outbreak from DataMarket:

https://blog.datamarket.com/2014/10/15/ebola-data-on-datamarket/