Closed jamietanna closed 9 years ago
I'm wondering if Bash is the way to go; I could probably write a Python script with some threading to save us some time. What are your thoughts, @bryangarza?
Maybe generate the list of links with this bash script, and then call a Python/other script for the HTTP requests. The bottleneck is the HTTP requests, right? How fast does this script find all of the links, @jamietanna?
edit: Thanks for making this!
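For reference, the "generate the list of links" step suggested above could also be done in Python rather than Bash. This is only a minimal sketch, assuming the links appear as plain `http(s)://` URLs in Markdown text; `URL_PATTERN` and `extract_links` are hypothetical names, not taken from the script in this PR.

```python
import re

# Hypothetical pattern: an http(s) URL runs until whitespace or a common
# Markdown delimiter. Unusual links may need a proper Markdown parser.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_links(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = set()
    links = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

Because this step does no network I/O it should stay near-instant, so the HTTP requests remain the only part worth parallelising.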
That is a good suggestion too, but as far as I'm concerned I'd rather keep it contained to one script (i.e. `tests.py` or `test.hs` instead of `checklinks.sh` which calls `foo.py`). We could then have TravisCI run it too or something. @jamietanna @bryangarza @bltsandwich1
Once again, thanks for making this!
@bryangarza it returns the list of URLs instantly, so the bottleneck really is the HTTP requests, as you say. For me they take about 10 minutes, but that will vary with your connection. Threading could be a good idea, @frankcash; I'd be happy to whip something up to do this instead of a Bash script, which also means it'll be way easier for people to understand in the future.
There are some false positives, potentially due to cURL's User-Agent (UA), although I've not looked into it in any depth. That may cause problems with Travis if we're unable to reduce them, but otherwise that's a great shout.
No worries, I'm glad to be able to help out!
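If the false positives really do come from the User-Agent (not confirmed in this thread), one thing to try is sending a browser-like UA and retrying with GET when a server rejects HEAD. The sketch below uses only the standard library; `BROWSER_UA`, `build_request` and `status_of` are hypothetical names, and the UA string is an arbitrary assumption.

```python
import urllib.error
import urllib.request

# Assumed value: any browser-like string; not taken from the thread.
BROWSER_UA = "Mozilla/5.0 (compatible; link-checker-sketch)"


def build_request(url, method="HEAD"):
    """Build a request that carries a browser-like User-Agent header."""
    return urllib.request.Request(
        url, headers={"User-Agent": BROWSER_UA}, method=method
    )


def status_of(url, timeout=10):
    """Return the HTTP status for url, or None if no response was received.

    HEAD is tried first because it is cheap; GET is the fallback because
    some servers answer HEAD with an error even though the page exists.
    """
    for method in ("HEAD", "GET"):
        try:
            request = build_request(url, method)
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.status
        except urllib.error.HTTPError as err:
            if method == "GET":
                return err.code  # both attempts failed; report the GET status
        except (urllib.error.URLError, OSError):
            return None  # DNS failure, timeout, refused connection, ...
    return None
```

Some sites block automated clients regardless of UA, so a 403 may still deserve a manual check before a link is removed.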
Yeah, I'm going to say we need to use one standardized language. If you want to take this and do it in Python, Elixir, Go, or Haskell, I don't really care which you choose. If not, I can do it when I get around to it.
@frankcash @bryangarza I've implemented a multithreaded Python 2 script to do this. If you're able to have a look and let me know whether any changes are needed, I'd appreciate it!
So far I can see that the following URLs are returning an error status code:

| URL | HTTP status code |
|---|---|
| http://truepcgaming.com | 403 |
| http://intelfinity.com/Android%20Dev%20CHeat%20Sheet.pdf | 404 |
| https://wiki.videolan.org/LibVLC | 404 |
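For anyone reading later, the threaded approach can be sketched roughly as below. This is not the script from this PR: it is a Python 3 illustration using `concurrent.futures`, and `check_links`, `fetch_status` and `max_workers=16` are assumptions. The function that performs the request is passed in, so the threading can be tried without touching the network.

```python
from concurrent.futures import ThreadPoolExecutor


def check_links(urls, fetch_status, max_workers=16):
    """Check urls concurrently and return {url: status} for failing links.

    fetch_status is a caller-supplied function (hypothetical here) that maps
    a URL to an HTTP status code, or to None if no response was received.
    A link counts as failing when the status is None or 400 and above.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with urls.
        statuses = list(pool.map(fetch_status, urls))
    return {
        url: status
        for url, status in zip(urls, statuses)
        if status is None or status >= 400
    }


if __name__ == "__main__":
    # Stubbed statuses for illustration only; no requests are made.
    stub = {"http://example.com/ok": 200, "http://example.com/gone": 404}
    print(check_links(stub, stub.get))
```

Threads suit this job because each worker spends nearly all of its time waiting on the network, so the total run time should move towards that of the slowest few requests rather than the sum of all of them.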
That's weird: I checked those links and they all still seem active. But thank you for your work, I appreciate it :100:
Reports the links that can't be found any more, so they can be amended/removed.