awesomedata / awesome-public-datasets

A topic-centric list of HQ open datasets.
https://awesomedataworld.slack.com
MIT License

Add Enigma Public & archive.org's data #304

Closed: eveah closed 6 years ago

eveah commented 6 years ago

Overview

tfmorris commented 6 years ago

My original main comment (and the reason for the thumbs down) seems to have gotten lost in the ether. To reiterate:

While "enigma" is a pretty browser and aggregator, it doesn't qualify for inclusion as an awesome dataset because -- wait for it -- it's not a dataset!

The fact that the archive.org entry got duplicated, and that re-alphabetizing the Internet Archive entries in the first place is a silly waste of time, is just a distraction from the main issue. Enigma has no place on the list.

eveah commented 6 years ago

Hi @tfmorris, thanks for the clarification on the line issue. I still have an objection to your main point - you could say the same about the majority of the listings in the Public Domain category. It is certainly true of the Amazon Public Datasets link, the Reddit Datasets thread, Datamob when it existed, the Google public data directory, etc. It is also true of a few of the listings in the Search Engine category, such as datahub.io, and of Quandl in the Finance category. The list contains many links both to single datasets and to points of access to multiple datasets. That's what makes it awesome. Why the distinction here?

eveah commented 6 years ago

Hey @caesar0301, would you be able to chime in here? I'm unclear on the distinction of why my submission is out of place whereas the definitely awesome Amazon Public Datasets, Reddit datasets, data.world, etc, do belong?

If you think it could indeed fit in, I'm happy to open another PR without the separate addition of the Internet Archive data-specific link.

caesar0301 commented 6 years ago

Hi @eveah, Archive.org-like items are a good fit for this awesome list, because they address a real, painful need around future data loss, as discussed in #262.

I also reviewed the "enigma" project. It is actually a project similar to awesome data. However, it seems to be still under construction, or more of its data sources are banned by the account registration. I suggest it open its registration before entering the public open data domain. Thanks for your keen contribution.

eveah commented 6 years ago

OK, thanks for the review and the response @caesar0301!

Just one clarifying question: what do you mean by data sources banned by the account registration? You don't need to register to engage with any of the datasets. You do need to register to use the API - but that is the same as data.world, which is in the Public Domain section...

caesar0301 commented 6 years ago

Hi @eveah, I went through the candidate site again and reviewed some of Enigma Public's declarations. In my view, they are making a promising and encouraging effort to make data more useful to mankind. I said "banned" because, in my first review, both clicking "dig deeper" and downloading a specific dataset gave me a login request.

I also found that Enigma Public actually holds a large portion of datasets behind its search engine. It deserves a place on the Public Domain list. Could you open a new MR to add this item? Thanks.

@tfmorris I think we should be more open with the Public Domain section. This doesn't hurt our original goal of helping people find high-quality data as easily as possible. Meanwhile, we should always stay alert and reject ad and spam posts to keep this section clean.

eveah commented 6 years ago

Thank you @caesar0301! Opened PR #348!