UVA-DSI / Open-Data-Lab

an initiative to provide infrastructure for reproducible workflows around open data
GNU General Public License v3.0

Identify datasets for potential inclusion in the ODL #28

Open Daniel-Mietchen opened 5 years ago

Daniel-Mietchen commented 5 years ago

One way to start looking into this would be to check open resources like

On that basis, and with the ODL inclusion criteria from #18 in mind, we could then decide whether to go for datasets that score high, low, or average on those scales.

Daniel-Mietchen commented 5 years ago

Another potential candidate: http://retractiondatabase.org/ — described by some as "antediluvian".

Daniel-Mietchen commented 5 years ago

Another one: https://orcid.org/blog/2018/10/24/2018-public-data-file .

Daniel-Mietchen commented 5 years ago

Datasets and code involved in projects for which there is a bug bounty, e.g. https://rubenarslan.github.io/posts/2018-10-26-on-making-mistakes-and-my-bug-bounty-program/ .

Daniel-Mietchen commented 5 years ago

allofplos, as per https://github.com/PLOS/allofplos

Daniel-Mietchen commented 5 years ago

https://doi.org/10.5061%2Fdryad.n5g39d7 - probably the most comprehensive public dataset about Hemimastigophora to date

Daniel-Mietchen commented 5 years ago

"Teaching data science with real world datasets" https://twitter.com/emcandre/status/1068139908836012032

Daniel-Mietchen commented 5 years ago

Gaia star catalog data, as per http://sci.esa.int/gaia/60192-gaia-creates-richest-star-map-of-our-galaxy-and-beyond/

Daniel-Mietchen commented 5 years ago

Here is some inspiration from the kinds of data and related services hosted at IDigInfo's data portal: