awesomedata / awesome-public-datasets

A topic-centric list of HQ open datasets.
https://awesomedataworld.slack.com
MIT License

Requests for new public datasets contributions. (PR preferred) #1

Open caesar0301 opened 9 years ago

abetusk commented 9 years ago

  • The Personal Genome Project (http://www.personalgenomes.org/ and https://my.pgp-hms.org/public_genetic_data)
  • 1000 Genomes (http://www.1000genomes.org/ and http://www.1000genomes.org/data)
  • UCSC Public Data (http://hgdownload.soe.ucsc.edu/downloads.html)

caesar0301 commented 9 years ago

Added! A pull request is encouraged to record your kind contribution. :+1:

surrealcristian commented 9 years ago

Are you interested in adding some links from Argentina's government?

caesar0301 commented 9 years ago

Can you make a Pull Request about your data?

surrealcristian commented 9 years ago

Of course. I'll send it tonight.

zippeurfou commented 9 years ago

What about soccer? There are a lot of sources for it.

caesar0301 commented 9 years ago

Nice source. Added under Sport. :+1:

westurner commented 9 years ago

Pandas Remote Data DataFrame API wrappers: http://pandas.pydata.org/pandas-docs/dev/remote_data.html

  • Yahoo! Finance
  • Google Finance
  • St. Louis FED (FRED)
  • Kenneth French’s data library
  • World Bank

dettmering commented 9 years ago

Transcripts of all debates in the German parliament (Bundestag) as txt files: http://www.bundestag.de/plenarprotokolle

rtbarber commented 9 years ago

U.S. Department of Education:

LGInform commented 9 years ago

Hi, would you be able to add LG Inform to your awesome-public-datasets list? It holds publicly available data about local authorities and fire and rescue services in England: http://lginform.local.gov.uk/search Thanks, Alex

caesar0301 commented 9 years ago

@rtbarber NCES added! LGInform added!

LGInform commented 9 years ago

Great, thanks. Have a good day.

Kind Regards

Alex

JEFworks commented 9 years ago

Some additional biology-related public datasets worth considering:

  • ExAC - http://exac.broadinstitute.org/ (exome sequencing data for 60,706 unrelated individuals, including 1000 Genomes)
  • OMIM - http://www.omim.org/ (database of phenotype-genotype relationships)
  • dbSNP - http://www.ncbi.nlm.nih.gov/SNP/ (database of single nucleotide polymorphisms and other short genetic variations)
  • dbGaP - http://www.ncbi.nlm.nih.gov/gap (database of genotypes and phenotypes)

PanArnaud commented 9 years ago

A French flora recognition system: http://identify.plantnet-project.org/en/

znurgl commented 9 years ago

@PanArnaud Where is the public dataset on this page?

PanArnaud commented 9 years ago

It's a search engine. That may not be appropriate... http://identify.plantnet-project.org/en/base/tree

znurgl commented 9 years ago

That's not a dataset. You can't download it as a CSV (for example) or access it via a public API.

PanArnaud commented 9 years ago

I understand. Sorry for the inconvenience

Xaviju commented 9 years ago

Obvious internet stuff: http://thecatapi.com

LauR3y commented 9 years ago

Belgium also has open data: http://data.gov.be/

cofiem commented 9 years ago

The Macaulay Library: archive of wildlife sounds and videos http://macaulaylibrary.org/

caesar0301 commented 9 years ago

@cofiem It seems that these data are not free?

caesar0301 commented 9 years ago

Partially free for some datasets.

cofiem commented 9 years ago

@caesar0301 Unfortunately yes, you're right: the data are neither free nor in a machine-readable form, as far as I can see :disappointed:

gabriel-almeida commented 9 years ago

I found this collection of datasets for (Context-Aware) Recommender Systems: http://students.depaul.edu/~yzheng8/DataSets.html

Maybe it's a good idea to talk to the author before publishing it.

caesar0301 commented 9 years ago

I have contacted the author to ask for permission, and he said yes. I will merge this category into the list manually.

vipints commented 9 years ago

Thanks for the detailed list of many awesome datasets! A few good data sources are missing on the biology side:

  • GTEx http://www.gtexportal.org/
  • ESP (Exome Sequencing Project) https://esp.gs.washington.edu/drupal/
  • ExAC (Exome Aggregation Consortium) http://exac.broadinstitute.org/
  • UK10K http://www.uk10k.org/

wumpus commented 8 years ago

I see you have the Internet Archive's ArchiveIt! service listed as a search engine; it's really a self-serve web archiver.

Other Internet Archive datasets: https://openlibrary.org/developers/dumps -- metadata for books

danfruehauf commented 8 years ago

Integrated Marine Observing System (IMOS) - roughly 30TB of ocean measurements: https://imos.aodn.org.au

Or directly on the S3 bucket: http://imos-data.s3-website-ap-southeast-2.amazonaws.com/

zippeurfou commented 8 years ago

There is a nice Quora topic about it where you could find other sources as well.

scriptin commented 8 years ago

Hi there! I have data on Japanese kanji usage frequency, also available as a user-friendly page. Does it satisfy the requirements?

sitsofe commented 8 years ago

Storage block traces (OSI licensed).

caesar0301 commented 8 years ago

@wumpus Thanks for your suggestion. The archives may fit the PublicDomains category. The OL dump has also been added.

caesar0301 commented 8 years ago

@danfruehauf IMOS added! :+1:

thomasbrand commented 8 years ago

Hello,

An international economic database is being built here: http://widukind.cepremap.org/ and all the source code (with Python and R clients) is available here: https://github.com/Widukind.

Thanks!

certifiedloud commented 8 years ago

Aviation weather data: https://aviationweather.gov/adds/dataserver

DataStrategist commented 8 years ago

Is this project still alive? Two big ones:

caesar0301 commented 8 years ago

Added!