awesomedata / awesome-public-datasets

A topic-centric list of HQ open datasets.
https://awesomedataworld.slack.com
MIT License

Validate pull requests with Travis #130

Closed: awesome-bot closed this issue 8 years ago

awesome-bot commented 8 years ago

Hello, I wrote a tool that validates README links: it checks that each URL is valid and that no URL is duplicated. It can be run whenever someone submits a pull request.
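To illustrate the duplicate check described above, here is a minimal sketch in Python. It is not how awesome_bot itself works (that tool is a separate project); the regular expression, the function names `extract_urls` and `find_duplicates`, and the sample README lines with their link labels are all assumptions made up for this example.

```python
import re
from collections import Counter

# Assumed pattern: an http(s) URL runs until whitespace or a closing
# bracket/quote. Real READMEs may need a more careful parser.
URL_PATTERN = re.compile(r'https?://[^\s)\]>"]+')


def extract_urls(markdown_text):
    """Return every http(s) URL found in the text, in order of appearance."""
    return URL_PATTERN.findall(markdown_text)


def find_duplicates(urls):
    """Return the URLs that appear more than once, in first-seen order."""
    counts = Counter(urls)
    return [url for url in counts if counts[url] > 1]


# Made-up README excerpt; the link labels are hypothetical.
sample = (
    "- [CMU StatLib](http://lib.stat.cmu.edu/datasets/)\n"
    "- [StatLib again](http://lib.stat.cmu.edu/datasets/)\n"
    "- [Stats4Stem](http://www.stats4stem.org/data-sets.html)\n"
)

urls = extract_urls(sample)
print(find_duplicates(urls))
```

Checking that each URL actually responds would need network requests on top of this, which is the part that can behave differently from one CI run to the next.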

It is currently being used by

Examples

If you are interested, connect this repo to https://travis-ci.org/ and add a .travis.yml file to the project.

See https://github.com/dkhamsing/awesome_bot for options and more information. Feel free to leave a comment :smile:

caesar0301 commented 8 years ago

Thanks @awesome-bot! I also added it to my other project, pcaptools.

awesome-bot commented 8 years ago

Cool! I reviewed the results and there are some issues. Some links may have to be white listed. Let me know if you have any questions.

caesar0301 commented 8 years ago

I added some of the links with issues to the white list. But I found that the bot returned inconsistent results when I ran Travis multiple times, and most of the runs failed. Some links reported as failures are accessible when I check them manually. I don't know why, and I am confused.

awesome-bot commented 8 years ago

Ok let me take a look

awesome-bot commented 8 years ago

So I am reviewing https://travis-ci.org/caesar0301/awesome-public-datasets/builds/98612497 which is actually passing :white_check_mark:

Is that not ok?

Sites that are returning 404 should actually be removed from the README:

  1. 404 http://lib.stat.cmu.edu/datasets/
  2. 404 http://archive.org/details/2011-05-calufa-twitter-sql
  3. 404 http://www.datawrangling.com/some-datasets-available-on-the-web
  4. 404 http://www.stats4stem.org/data-sets.html

Other links should be white listed if you know they work but the bot is somehow misreporting them.
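The triage suggested above (404s are removal candidates, known-good links go on the white list) could be sketched roughly as follows. This is only an illustration in Python, not awesome_bot's own logic; the `triage` function, the bucket names, and the two `example.com` URLs are hypothetical.

```python
def triage(results, white_list):
    """Split (status, url) pairs into 'remove', 'ignored', and 'review' buckets."""
    buckets = {"remove": [], "ignored": [], "review": []}
    for status, url in results:
        if url in white_list:
            # Known to work when checked manually, so the report is ignored.
            buckets["ignored"].append(url)
        elif status == 404:
            # Not found: candidate for removal from the README.
            buckets["remove"].append(url)
        elif status >= 400:
            # Any other error status needs a manual look.
            buckets["review"].append(url)
    return buckets


results = [
    (404, "http://lib.stat.cmu.edu/datasets/"),
    (404, "http://www.stats4stem.org/data-sets.html"),
    (503, "http://example.com/flaky-dataset"),  # hypothetical misreported link
    (200, "http://example.com/ok"),             # hypothetical healthy link
]
white_list = {"http://example.com/flaky-dataset"}

buckets = triage(results, white_list)
print(buckets)
```

The white list is checked first so that a link the maintainer has verified by hand is never flagged, whatever status the bot saw.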

Does that make sense?

caesar0301 commented 8 years ago

Thanks for checking. I still cannot explain my previous observations; maybe they resulted from network instability in the Travis services. I will keep using awesome_bot and report back to you if there are any other problems.

I think it's better to keep these 404 links in the list, because I am not sure whether they were removed permanently or temporarily. Moreover, others may find alternative sources for the data behind these 404 links.

awesome-bot commented 8 years ago

Cool. FYI, I think 404s are permanent, but yeah, maybe the projects still exist, just at a different URL: https://en.m.wikipedia.org/wiki/HTTP_404