HackerCollective / resources

Collective dump of resources.
MIT License

Add a script to check for broken links #54

Closed. jamietanna closed this 9 years ago.

jamietanna commented 9 years ago

Reports links that can no longer be reached, so they can be amended or removed.

frankcash commented 9 years ago

I'm wondering whether Bash is the way to go. I could probably write a Python script with some threading to save us some time. What are your thoughts, @bryangarza?

bryangarza commented 9 years ago

Maybe generate the list of links with this Bash script, and then call a Python (or other) script for the HTTP requests. The bottleneck is the HTTP requests, right? How fast does this script find all of the links, @jamietanna?

Edit: thanks for making this!
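The split Bryan suggests (gather the links first, then make the requests) can be sketched in Python as well. This is a minimal sketch assuming the resource list is Markdown text; the regular expression and the names `URL_PATTERN`, `extract_links`, and `extract_links_from_file` are hypothetical and are not taken from the PR's Bash script.

```python
import re

# Hypothetical heuristic (not a full URL grammar): an http(s) URL runs until
# whitespace, a bracket or parenthesis, an angle bracket, or a quote.
URL_PATTERN = re.compile(r"""https?://[^\s()\[\]<>"']+""")


def extract_links(text):
    """Return the unique URLs found in text, in order of first appearance."""
    # Trailing punctuation usually ends the sentence, not the URL.
    urls = (m.group(0).rstrip(".,;:!?") for m in URL_PATTERN.finditer(text))
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(urls))


def extract_links_from_file(path):
    """Read a Markdown file (e.g. README.md) and extract its links."""
    with open(path, encoding="utf-8") as handle:
        return extract_links(handle.read())
```

URLs that legitimately contain parentheses (common on Wikipedia) would be cut short by this pattern, so it trades completeness for simplicity.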

frankcash commented 9 years ago

That's a good suggestion too, but I'd rather keep it contained in one script (e.g. tests.py or test.hs, rather than checklinks.sh calling foo.py). We could then have Travis CI run it too, or something like that. @jamietanna @bryangarza @bltsandwich1

Once again, thanks for making this!
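For Travis CI to run the checker, the script only needs to exit with a non-zero status when links are broken, since a failing command is what marks a build as failed. Here is a minimal sketch of that reporting end, assuming the checker has already produced `(url, status)` pairs for the failing links (with `status` as `None` when no response arrived); `format_report`, `exit_code`, and `main` are hypothetical names.

```python
import sys


def format_report(broken):
    """Render (url, status) pairs as a two-column 'URL / Error code' table."""
    if not broken:
        return "No broken links found."
    width = max(len("URL"), *(len(url) for url, _ in broken))
    lines = [f"{'URL':<{width}}  Error code"]
    for url, status in broken:
        # None means the request never got an HTTP response at all.
        code = "no response" if status is None else str(status)
        lines.append(f"{url:<{width}}  {code}")
    return "\n".join(lines)


def exit_code(broken):
    """Return 1 (fails the CI build) if anything is broken, else 0."""
    return 1 if broken else 0


def main(broken):
    """Print the report, then exit with the status Travis CI checks."""
    print(format_report(broken))
    sys.exit(exit_code(broken))
```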


jamietanna commented 9 years ago

@bryangarza, it returns the list of URLs instantly, so, as you say, the bottleneck is really the HTTP requests: for me they take about 10 minutes, though that will vary with your connection. Threading could be a good idea, @frankcash. I'd be happy to whip up a script that does this instead of the Bash one; it would also be much easier for people to understand in the future.
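Since finding the URLs is instant and the HTTP requests dominate, a thread pool is a natural fit. A sketch using Python 3's `concurrent.futures` (the script in this PR targeted Python 2); `find_broken` is a hypothetical name, `check` stands for any function that fetches one URL and returns its status code (or `None` when nothing came back), and treating every 4xx/5xx code as broken is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def find_broken(urls, check, workers=16):
    """Run check(url) for every URL on a thread pool.

    check is any callable returning an HTTP status code, or None when no
    response arrived. Returns (url, status) pairs for broken links, in the
    same order as urls.
    """
    urls = list(urls)
    # The work is I/O-bound: threads spend most of their time waiting on the
    # network, and socket I/O releases the GIL, so those waits overlap.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(check, urls))
    return [(url, status) for url, status in zip(urls, statuses)
            if status is None or status >= 400]
```

Because the requests overlap, total time tends toward that of the slowest servers rather than the sum of every request; the pool size of 16 is an arbitrary default worth tuning.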

There are some false positives, possibly due to cURL's User-Agent string, although I haven't looked into it in depth. They may cause spurious failures on Travis if we can't reduce them, but otherwise that's a great shout.

No worries, I'm glad to be able to help out!
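One way to test the User-Agent theory behind the false positives is to send a browser-like `User-Agent` header and compare the results. A sketch with Python 3's `urllib.request`; the UA string, `build_request`, and `status_with_ua` are hypothetical, and some servers may still reject automated requests for other reasons.

```python
import http.client
import urllib.error
import urllib.request

# Hypothetical browser-like UA string. Some sites answer 403 to default
# agents such as "curl/x.y" or "Python-urllib/3.x" even when the page exists.
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")


def build_request(url, user_agent=BROWSER_UA):
    """Create a GET request that identifies itself with user_agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_with_ua(url, user_agent=BROWSER_UA, timeout=10):
    """Return the final HTTP status for url (after redirects), or None."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the status is on err.code.
        return err.code
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, TLS or protocol error.
        return None
```

Re-checking only the links that failed with the default agent keeps the extra requests to a minimum.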

frankcash commented 9 years ago

Yeah, I'm going to say we need to use one standardized language. If you want to take this and do it in Python, Elixir, Go, or Haskell, I don't really care which you choose. If not, I can do it when I get around to it.

frankcash commented 9 years ago

I created an issue for this.

jamietanna commented 9 years ago

@frankcash @bryangarza I've implemented a multithreaded Python 2 script to do this. If you're able to have a look and let me know whether any changes are needed, I'd appreciate it!

jamietanna commented 9 years ago

So far, the following URLs are returning an error status code:

URL                                                       Error code
http://truepcgaming.com                                   403
http://intelfinity.com/Android%20Dev%20CHeat%20Sheet.pdf  404
https://wiki.videolan.org/LibVLC                          404

frankcash commented 9 years ago

That's weird: I checked those links and they all still seem to be active. Thank you for your work, I appreciate it :100: