jjtroberts closed this issue 7 years ago.
There is an internal check that only visits http and https links, but tel: (and probably mailto:) links are not correctly identified. As a result, the current domain is prepended to the tel link, and pylinkvalidator then treats it as a normal http link.
Shouldn't be too hard to fix. Thanks for reporting this bug!
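The check described above (follow only http and https links) could look roughly like the sketch below. This is a hypothetical illustration, not pylinkvalidator's code: it reads the scheme with a regular expression that follows the RFC 3986 scheme syntax *before* joining, so a tel: link is skipped instead of being resolved against the page URL. The names `link_scheme` and `resolve_crawlable` are made up, and the sketch uses Python 3's `urllib.parse`, whereas this thread concerns Python 2.7.

```python
import re
from urllib.parse import urljoin

# RFC 3986 scheme syntax: a letter, then letters, digits, "+", "-" or ".",
# terminated by ":". Reading the scheme this way does not depend on how a
# given Python version's urlsplit treats "tel:" followed only by digits.
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")

# Only these schemes point at pages worth fetching (hypothetical policy).
CRAWLABLE_SCHEMES = {"http", "https"}


def link_scheme(href):
    """Return the lower-cased scheme of href, or None for a relative link."""
    match = SCHEME_RE.match(href.strip())
    return match.group(1).lower() if match else None


def resolve_crawlable(base_url, href):
    """Return the absolute URL to crawl, or None if href must be skipped."""
    href = href.strip()
    scheme = link_scheme(href)
    if scheme is not None and scheme not in CRAWLABLE_SCHEMES:
        # tel:, mailto:, javascript:, ... are skipped instead of being joined
        # to the page URL, which is how URLs such as
        # http://www.theosborn.org/tel:18006732926 end up being crawled.
        return None
    return urljoin(base_url, href)
```

With this sketch, `resolve_crawlable("http://www.theosborn.org", "tel:18006732926")` returns `None`, while `"/about/"` resolves to `http://www.theosborn.org/about/`.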
I take back what I said: this is already supported in pylinkvalidator, and tel and mailto links should not be crawled. Which version of pylinkvalidator did you try? There is even a unit test with a mailto link and a tel link. I saw you also opened an issue on pylinkchecker, but the two codebases are now quite different.
I was using pylinkchecker until I opened the issue there and started reading some of the comments on other issues. That's how I found your fork. I'm using version 0.3 of pylinkvalidate:
[root@plgdinfra02 scripts]# pylinkvalidator/pylinkvalidator/bin/pylinkvalidate.py --version
pylinkvalidate.py 0.3
Here's what it outputs:
pylinkvalidate.py -P -N -w 5 http://www.theosborn.org
Starting crawl...
200 - http://www.theosborn.org (1 of 88 - 1%)
404 - http://www.theosborn.org/tel:18006732926 (2 of 88 - 2%)
404 - http://www.theosborn.org/tel:19149258000 (3 of 88 - 3%)
404 - http://www.theosborn.org/tel:18007216695 (4 of 88 - 5%)
404 - http://www.theosborn.org/tel:18002524793 (5 of 88 - 6%)
404 - http://www.theosborn.org/tel:18005108895 (6 of 88 - 7%)
404 - http://www.theosborn.org/tel:12032921546 (7 of 88 - 8%)
200 - http://www.theosborn.org/about/ (8 of 88 - 9%)
200 - http://www.theosborn.org/news/ (9 of 88 - 10%)
200 - http://www.theosborn.org/events/ (10 of 88 - 11%)
200 - http://www.theosborn.org/careers/ (11 of 88 - 12%)
200 - http://www.theosborn.org/giving/ (12 of 88 - 14%)
200 - http://www.theosborn.org/westchester-county-retirement-community/ (13 of 88 - 15%)
200 - http://www.theosborn.org/location/ (14 of 88 - 16%)
200 - http://www.theosborn.org/activities/ (15 of 88 - 17%)
200 - http://www.theosborn.org/dining/ (16 of 88 - 18%)
200 - http://www.theosborn.org/faq/ (17 of 88 - 19%)
200 - http://www.theosborn.org/map/ (18 of 88 - 20%)
200 - http://www.theosborn.org/testimonials/ (19 of 88 - 22%)
200 - http://www.theosborn.org/miriams-attic/ (20 of 88 - 23%)
200 - http://www.theosborn.org/westchester-county-senior-apartments/ (21 of 88 - 24%)
200 - http://www.theosborn.org/westchester-county-independent-living/ (22 of 88 - 25%)
200 - http://www.theosborn.org/westchester-county-senior-housing/ (23 of 88 - 26%)
200 - http://www.theosborn.org/westchester-county-senior-care/ (24 of 88 - 27%)
200 - http://www.theosborn.org/westchester-county-assisted-living/ (25 of 88 - 28%)
200 - http://www.theosborn.org/westchester-county-memory-care/ (26 of 88 - 30%)
200 - http://www.theosborn.org/westchester-county-skilled-nursing/ (27 of 88 - 31%)
200 - http://www.theosborn.org/westchester-county-rehabilitation/ (28 of 88 - 32%)
200 - http://www.theosborn.org/westchester-county-hospice/ (29 of 88 - 33%)
200 - http://www.theosborn.org/home-care/ (30 of 88 - 34%)
200 - http://www.theosborn.org/westchester-county-respite/ (31 of 88 - 35%)
200 - http://www.theosborn.org/home-care-westchester-county-ny/ (32 of 88 - 36%)
200 - http://www.theosborn.org/home-care-fairfield-county-ct/ (33 of 88 - 38%)
200 - http://www.theosborn.org/leadership/ (34 of 88 - 39%)
200 - http://www.theosborn.org/resources/ (35 of 88 - 40%)
200 - http://www.theosborn.org/ohc-faq/ (36 of 88 - 41%)
200 - http://www.theosborn.org/gallery/ (37 of 88 - 42%)
200 - http://www.theosborn.org/gallery/photo-gallery/ (38 of 88 - 43%)
200 - http://www.theosborn.org/gallery/virtual-tour/ (39 of 88 - 44%)
200 - http://www.theosborn.org/contact-us/ (40 of 88 - 45%)
200 - http://www.theosborn.org/directions/ (41 of 88 - 47%)
200 - http://www.theosborn.org/privacy-policy/ (42 of 88 - 48%)
200 - http://www.theosborn.org/ (43 of 88 - 49%)
200 - http://www.theosborn.org/westchester-county-assisted-living-apartments/ (44 of 88 - 50%)
200 - http://www.theosborn.org/accreditation/ (45 of 88 - 51%)
200 - http://www.theosborn.org/history/ (46 of 88 - 52%)
200 - http://www.theosborn.org/event/summer-outdoor-concert-series-tuesdays-7-pm/ (47 of 88 - 53%)
200 - http://www.theosborn.org/contact-us-download/ (48 of 88 - 55%)
200 - http://www.theosborn.org/gallery/photo-gallery/ (48 of 87 - 55%)
200 - http://www.theosborn.org/additional-services/ (49 of 87 - 56%)
200 - http://www.theosborn.org/scholarship/ (50 of 87 - 57%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/images/the-osborn.jpg (51 of 87 - 59%)
200 - http://www.theosborn.org/wp-content/uploads/2013/01/osb-home_new.jpg (52 of 87 - 60%)
200 - http://www.theosborn.org/wp-content/uploads/2013/01/osb-gallerycallout_new.jpg (53 of 87 - 61%)
200 - http://www.theosborn.org/wp-content/uploads/2013/01/broshures.jpg (54 of 87 - 62%)
404 - http://www.theosborn.org/tel:9149258000 (55 of 87 - 63%)
200 - http://www.theosborn.org/event/families-managing-dementia-related-decline/ (56 of 87 - 64%)
200 - http://www.theosborn.org/sitemap/ (57 of 87 - 66%)
200 - http://www.theosborn.org/wp-content/uploads/2014/10/FB-f-Logo__white_50.png (58 of 87 - 67%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/images/ada.png (59 of 87 - 68%)
200 - http://www.theosborn.org/privacy-policy/ (59 of 86 - 69%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/images/eho.png (60 of 86 - 70%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/images/carf-ccac.png (61 of 86 - 71%)
200 - http://www.theosborn.org/wp-content/plugins/gd-worker/public/js/gd-worker-public.js?ver=1.0.0 (62 of 86 - 72%)
200 - http://www.theosborn.org/wp-content/plugins/gravity-forms-auto-placeholders/modernizr.placeholder.min.js?ver=1.2 (63 of 86 - 73%)
200 - http://www.theosborn.org/wp-content/plugins/gravity-forms-auto-placeholders/scripts.js?ver=1.2 (64 of 86 - 74%)
200 - http://www.theosborn.org/wp-content/plugins/slide-in/js/wdsi.js?ver=1.2 (65 of 86 - 76%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/scripts/jquery.matchHeight-min.js?ver=1.0 (66 of 86 - 77%)
200 - http://www.theosborn.org/wp-includes/js/jquery/jquery-migrate.min.js?ver=1.2.1 (67 of 86 - 78%)
200 - http://www.theosborn.org/wp-includes/js/jquery/ui/core.min.js?ver=1.11.4 (68 of 86 - 79%)
200 - http://www.theosborn.org/wp-includes/js/jquery/ui/widget.min.js?ver=1.11.4 (69 of 86 - 80%)
200 - http://www.theosborn.org/wp-includes/js/jquery/ui/accordion.min.js?ver=1.11.4 (70 of 86 - 81%)
200 - http://www.theosborn.org/wp-includes/js/wp-embed.min.js?ver=ffa3821e3d07f071d4c8934f4e0a1c62 (71 of 86 - 83%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/scripts/jquery.flexslider-min.js (72 of 86 - 84%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/scripts/jquery.magnific-popup.min.js (73 of 86 - 85%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/scripts/osborn.js (74 of 86 - 86%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/favicon.ico (75 of 86 - 87%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/styles/magnific-popup.css (76 of 86 - 88%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/style.css?v=2.0 (77 of 86 - 90%)
200 - http://www.theosborn.org/wp-content/plugins/gd-worker/public/css/gd-worker-public.css?ver=1.0.0 (78 of 86 - 91%)
200 - http://www.theosborn.org/wp-content/plugins/slide-in/css/wdsi.css?ver=1.2 (79 of 86 - 92%)
200 - http://www.theosborn.org/wp-content/plugins/ultimate-branding/ultimate-branding-files/modules/custom-admin-bar-files/css/general.css?ver=1.0 (80 of 86 - 93%)
200 - http://www.theosborn.org/wp-content/plugins/ultimate-branding/ultimate-branding-files/modules/favicons/css/admin.css?ver=1.0.0 (81 of 86 - 94%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/styles/flexslider.css (82 of 86 - 95%)
200 - http://www.theosborn.org/wp-includes/wlwmanifest.xml (83 of 86 - 97%)
200 - http://www.theosborn.org/wp-content/uploads/2015/10/door-knob_O_32x32.png?87261ec6721344e609568fab5cba4fbd (84 of 86 - 98%)
200 - http://www.theosborn.org/wp-json/ (85 of 86 - 99%)
200 - http://www.theosborn.org/xmlrpc.php?rsd (86 of 86 - 100%)
Crawling Done...
ERROR Crawled 86 urls with 7 error(s) in 14.37 seconds
average response time: 0.69 seconds
average process time: 0.29 seconds
Start URL(s): http://www.theosborn.org
not found (404): http://www.theosborn.org/tel:18007216695
from http://www.theosborn.org
from http://www.theosborn.org
not found (404): http://www.theosborn.org/tel:18006732926
from http://www.theosborn.org
from http://www.theosborn.org
not found (404): http://www.theosborn.org/tel:19149258000
from http://www.theosborn.org
not found (404): http://www.theosborn.org/tel:9149258000
from http://www.theosborn.org
not found (404): http://www.theosborn.org/tel:18005108895
from http://www.theosborn.org
from http://www.theosborn.org
not found (404): http://www.theosborn.org/tel:18002524793
from http://www.theosborn.org
from http://www.theosborn.org
from http://www.theosborn.org
from http://www.theosborn.org
not found (404): http://www.theosborn.org/tel:12032921546
from http://www.theosborn.org
from http://www.theosborn.org
Thanks a lot for the detailed bug report. I see the problem now. The unit test has a wrong tel link (tel:foo@bar.com instead of tel:1234567890), and Python's urlsplit does not correctly parse the real-world tel link.
After further research, tel:12032921546 is not a valid tel URI. It should be tel:+12032921546, in which case Python would parse it correctly and pylinkvalidator would ignore it.
Browsers usually interpret these URIs as tel URIs even though they are malformed, so I could add an option to try to detect them.
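RFC 3966 only accepts a number written without the leading "+" as a *local* number, and local numbers must carry a `;phone-context=` parameter. That is why a bare link such as tel:18006732926 is malformed. The sketch below checks for the global form only. It is a simplified reading of the RFC grammar written for illustration (visual separators limited to `-`, `.`, `(` and `)`, parameters accepted without validation), and `is_global_tel_uri` is a hypothetical name.

```python
import re

# Simplified from RFC 3966: "+" followed by digits, optionally mixed with the
# visual separators "-", "." "(" and ")", with at least one digit; optional
# ";param" suffixes (e.g. ";ext=123") are accepted without further checks.
GLOBAL_TEL_RE = re.compile(
    r"^tel:\+[0-9().\-]*[0-9][0-9().\-]*(;.*)?$", re.IGNORECASE
)


def is_global_tel_uri(uri):
    """True if uri is a tel: URI holding a global ("+"-prefixed) number."""
    return GLOBAL_TEL_RE.match(uri) is not None
```

Local numbers that do carry a `phone-context` parameter are valid per the RFC but are reported as `False` here, since the function only answers whether the number is global.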
Adding an option to ignore tel: and mailto: links would be helpful. I doubt I could convince my producers to go back through all of our client sites (200+) and add a "+1" to each tel: value.
@gd-jroberts can you try the latest commit to see if it fixes your issue? I added an option, -b (or --ignore-bad-tel-urls), that ignores badly formed tel URLs. It works in the unit tests, but a real-world test would be even better.
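For readers curious how a flag like this might gate the link filter, here is a hypothetical sketch using the standard `optparse` module. Only the flag names `-b` and `--ignore-bad-tel-urls` come from this thread; `BAD_TEL_RE`, `build_parser` and `filter_links` are made-up names, and this is not pylinkvalidator's actual implementation.

```python
import optparse
import re

# Hypothetical pattern: a tel: link whose number lacks the "+" that RFC 3966
# requires for global numbers, e.g. "tel:18006732926".
BAD_TEL_RE = re.compile(r"^\s*tel:(?!\+)", re.IGNORECASE)


def build_parser():
    """Build a command-line parser exposing the flag discussed in this thread."""
    parser = optparse.OptionParser(usage="%prog [options] URL ...")
    parser.add_option(
        "-b", "--ignore-bad-tel-urls", dest="ignore_bad_tel_urls",
        action="store_true", default=False,
        help="ignore badly formed tel: URLs instead of crawling them")
    return parser


def filter_links(hrefs, ignore_bad_tel_urls):
    """Drop malformed tel: links when the flag is set; keep everything else."""
    if not ignore_bad_tel_urls:
        return list(hrefs)
    return [href for href in hrefs if not BAD_TEL_RE.match(href)]
```

Well-formed links such as `tel:+18006732926` are left alone by this filter on the assumption that the normal scheme check already skips them.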
Interestingly, it seems that Python 2.6's urlparse function recognized all kinds of tel: URLs (e.g., tel:1234567890 and tel:+1234567890), but this was "fixed" in Python 2.7.
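The Python 2.7 behavior can be illustrated with a simplified re-creation of the scheme check in its urlsplit. This is a sketch written for this thread, not the stdlib code itself: caching, the http fast path and netloc/query/fragment handling are omitted, and `split_scheme_py27_style` is a hypothetical name. When everything after the first colon is digits, the text is treated as a port number, so the prefix stays part of the path. As a result, tel:18006732926 loses its scheme while tel:+18006732926 keeps it. Later Python 3 releases changed this behavior again, so results depend on the interpreter version.

```python
# Characters allowed in a URL scheme, as listed in Python 2.7's urlparse module.
SCHEME_CHARS = ("abcdefghijklmnopqrstuvwxyz"
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                "0123456789"
                "+-.")


def split_scheme_py27_style(url):
    """Return (scheme, rest) the way Python 2.7's urlsplit decided it.

    If everything after the first ":" is digits, the text is assumed to be a
    port number, so no scheme is extracted and the whole URL stays in `rest`.
    """
    i = url.find(":")
    if i > 0 and all(c in SCHEME_CHARS for c in url[:i]):
        rest = url[i + 1:]
        if not rest or any(c not in "0123456789" for c in rest):
            return url[:i].lower(), rest
    return "", url
```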
Sorry for missing your last update. I cloned master, ran setup.py install, and executed the same command as last time, with the following results:
`$ ./pylinkvalidate.py -P -N -w 5 http://www.theosborn.org
Starting crawl...
200 - http://www.theosborn.org (1 of 91 - 1%)
404 - http://www.theosborn.org/tel:19149258000 (2 of 91 - 2%)
404 - http://www.theosborn.org/tel:18002524793 (3 of 91 - 3%)
404 - http://www.theosborn.org/tel:18005108895 (4 of 91 - 4%)
404 - http://www.theosborn.org/tel:18007216695 (5 of 91 - 5%)
404 - http://www.theosborn.org/tel:18006732926 (6 of 91 - 7%)
404 - http://www.theosborn.org/tel:12032921546 (7 of 91 - 8%)
404 - http://www.theosborn.org/tel:18008500196 (8 of 91 - 9%)
200 - http://www.theosborn.org/about/ (9 of 91 - 10%)
200 - http://www.theosborn.org/events/ (10 of 91 - 11%)
200 - http://www.theosborn.org/news/ (11 of 91 - 12%)
200 - http://www.theosborn.org/giving/ (12 of 91 - 13%)
200 - http://www.theosborn.org/careers/ (13 of 91 - 14%)
200 - http://www.theosborn.org/activities/ (14 of 91 - 15%)
200 - http://www.theosborn.org/location/ (15 of 91 - 16%)
200 - http://www.theosborn.org/westchester-county-retirement-community/ (16 of 91 - 18%)
200 - http://www.theosborn.org/dining/ (17 of 91 - 19%)
200 - http://www.theosborn.org/miriams-attic/ (18 of 91 - 20%)
200 - http://www.theosborn.org/faq/ (19 of 91 - 21%)
200 - http://www.theosborn.org/testimonials/ (20 of 91 - 22%)
200 - http://www.theosborn.org/map/ (21 of 91 - 23%)
200 - http://www.theosborn.org/westchester-county-senior-apartments/ (22 of 91 - 24%)
200 - http://www.theosborn.org/westchester-county-senior-housing/ (23 of 91 - 25%)
200 - http://www.theosborn.org/westchester-county-independent-living/ (24 of 91 - 26%)
200 - http://www.theosborn.org/westchester-county-senior-care/ (25 of 91 - 27%)
200 - http://www.theosborn.org/westchester-county-memory-care/ (26 of 91 - 29%)
200 - http://www.theosborn.org/westchester-county-skilled-nursing/ (27 of 91 - 30%)
200 - http://www.theosborn.org/westchester-county-assisted-living/ (28 of 91 - 31%)
200 - http://www.theosborn.org/westchester-county-senior-care/westchester-county-rehabilitation/ (29 of 91 - 32%)
200 - http://www.theosborn.org/westchester-county-hospice/ (30 of 91 - 33%)
200 - http://www.theosborn.org/westchester-county-respite/ (31 of 91 - 34%)
200 - http://www.theosborn.org/home-care/ (32 of 91 - 35%)
200 - http://www.theosborn.org/home-care-westchester-county-ny/ (33 of 91 - 36%)
error - http://www.theosborn.org/home-care-fairfield-county-ct/ (34 of 91 - 37%)
error - http://www.theosborn.org/leadership/ (35 of 91 - 38%)
error - http://www.theosborn.org/resources/ (36 of 91 - 40%)
error - http://www.theosborn.org/ohc-faq/ (37 of 91 - 41%)
error - http://www.theosborn.org/gallery/ (38 of 91 - 42%)
error - http://www.theosborn.org/gallery/photo-gallery/ (39 of 91 - 43%)
error - http://www.theosborn.org/gallery/virtual-tour/ (40 of 91 - 44%)
error - http://www.theosborn.org/contact-us/ (41 of 91 - 45%)
error - http://www.theosborn.org/directions/ (42 of 91 - 46%)
error - http://www.theosborn.org/privacy-policy/ (43 of 91 - 47%)
error - http://www.theosborn.org/ (44 of 91 - 48%)
error - http://www.theosborn.org/westchester-county-assisted-living-apartments/ (45 of 91 - 49%)
error - http://www.theosborn.org/history/ (46 of 91 - 51%)
error - http://www.theosborn.org/accreditation/ (47 of 91 - 52%)
error - http://www.theosborn.org/scholarship/ (48 of 91 - 53%)
error - http://www.theosborn.org/2016/10/19/matt-anderson-presents-wellspring-osborn/ (49 of 91 - 54%)
error - http://www.theosborn.org/event/alexandra-zapruder-author-twenty-six-seconds-personal-history-zapruder-film/ (50 of 91 - 55%)
error - http://www.theosborn.org/photo-gallery/ (51 of 91 - 56%)
error - http://www.theosborn.org/contact-us-download/ (52 of 91 - 57%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/images/the-osborn.jpg (53 of 91 - 58%)
error - http://www.theosborn.org/additional-services/ (54 of 91 - 59%)
200 - http://www.theosborn.org/wp-content/uploads/2013/01/Homepage3.jpg (55 of 91 - 60%)
200 - http://www.theosborn.org/wp-content/uploads/2013/01/osb-gallerycallout_new.jpg (56 of 91 - 62%)
200 - http://www.theosborn.org/wp-content/uploads/2013/01/broshures.jpg (57 of 91 - 63%)
200 - http://www.theosborn.org/wp-content/uploads/2014/10/FB-f-Logo__white_50.png (58 of 91 - 64%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/images/ada.png (59 of 91 - 65%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/images/eho.png (60 of 91 - 66%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/images/carf-ccac.png (61 of 91 - 67%)
200 - http://www.theosborn.org/wp-content/plugins/gd-worker/public/js/gd-worker-public.js?ver=1.0.0 (62 of 91 - 68%)
200 - http://www.theosborn.org/wp-content/plugins/gravity-forms-auto-placeholders/modernizr.placeholder.min.js?ver=1.2 (63 of 91 - 69%)
200 - http://www.theosborn.org/wp-content/plugins/gravity-forms-auto-placeholders/scripts.js?ver=1.2 (64 of 91 - 70%)
200 - http://www.theosborn.org/wp-content/plugins/slide-in/js/wdsi.js?ver=1.2 (65 of 91 - 71%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/scripts/jquery.matchHeight-min.js?ver=1.0 (66 of 91 - 73%)
200 - http://www.theosborn.org/wp-includes/js/jquery/jquery-migrate.min.js?ver=1.4.1 (67 of 91 - 74%)
200 - http://www.theosborn.org/wp-includes/js/jquery/ui/core.min.js?ver=1.11.4 (68 of 91 - 75%)
200 - http://www.theosborn.org/wp-includes/js/jquery/ui/widget.min.js?ver=1.11.4 (69 of 91 - 76%)
200 - http://www.theosborn.org/wp-includes/js/jquery/ui/accordion.min.js?ver=1.11.4 (70 of 91 - 77%)
200 - http://www.theosborn.org/wp-includes/js/wp-embed.min.js?ver=c39570c078c67f50cfcafeebaf91152d (71 of 91 - 78%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/scripts/jquery.flexslider-min.js (72 of 91 - 79%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/scripts/jquery.magnific-popup.min.js (73 of 91 - 80%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/scripts/osborn.js (74 of 91 - 81%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/favicon.ico (75 of 91 - 82%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/styles/flexslider.css (76 of 91 - 84%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/styles/magnific-popup.css (77 of 91 - 85%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/style.css?v=2.0 (78 of 91 - 86%)
200 - http://www.theosborn.org/wp-content/plugins/gd-worker/public/css/gd-worker-public.css?ver=1.0.0 (79 of 91 - 87%)
200 - http://www.theosborn.org/wp-content/plugins/slide-in/css/wdsi.css?ver=1.2 (80 of 91 - 88%)
200 - http://www.theosborn.org/wp-content/plugins/ultimate-branding/ultimate-branding-files/modules/custom-admin-bar-files/css/general.css?ver=1.0 (81 of 91 - 89%)
200 - http://www.theosborn.org/wp-content/plugins/ultimate-branding/ultimate-branding-files/modules/favicons/css/admin.css?ver=1.0.0 (82 of 91 - 90%)
error - http://www.theosborn.org/tel:9149258000 (83 of 91 - 91%)
200 - http://www.theosborn.org/wp-includes/wlwmanifest.xml (84 of 91 - 92%)
error - http://www.theosborn.org/sitemap/ (85 of 91 - 93%)
error - http://www.theosborn.org/contact-us/privacy-policy/ (86 of 91 - 95%)
200 - http://www.theosborn.org/wp-content/uploads/2015/10/door-knob_O_32x32.png (87 of 91 - 96%)
error - http://www.theosborn.org/wp-json/ (88 of 91 - 97%)
error - http://www.theosborn.org/xmlrpc.php?rsd (89 of 91 - 98%)
error - http://www.theosborn.org/wp-json/oembed/1.0/embed?url=http%3A%2F%2Fwww.theosborn.org%2F (90 of 91 - 99%)
200 - http://www.theosborn.org/wp-json/oembed/1.0/embed?url=http%3A%2F%2Fwww.theosborn.org%2F&format=xml (91 of 91 - 100%)
Crawling Done...
ERROR Crawled 91 urls with 33 error(s) in 70.52 seconds
Start URL(s): http://www.theosborn.org
not found (404): http://www.theosborn.org/tel:19149258000 from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/photo-gallery/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/event/alexandra-zapruder-author-twenty-six-seconds-personal-history-zapruder-film/ from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/contact-us/ from http://www.theosborn.org from http://www.theosborn.org from http://www.theosborn.org
not found (404): http://www.theosborn.org/tel:18005108895 from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/ from http://www.theosborn.org from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/ohc-faq/ from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/leadership/ from http://www.theosborn.org from http://www.theosborn.org
not found (404): http://www.theosborn.org/tel:18007216695 from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/westchester-county-assisted-living-apartments/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/xmlrpc.php?rsd from http://www.theosborn.org
error (timeout): http://www.theosborn.org/resources/ from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/wp-json/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/additional-services/ from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/sitemap/ from http://www.theosborn.org
not found (404): http://www.theosborn.org/tel:18002524793 from http://www.theosborn.org from http://www.theosborn.org from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/contact-us/privacy-policy/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/home-care-fairfield-county-ct/ from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/history/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/tel:9149258000 from http://www.theosborn.org
error (timeout): http://www.theosborn.org/gallery/ from http://www.theosborn.org from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/gallery/photo-gallery/ from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/privacy-policy/ from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/gallery/virtual-tour/ from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/scholarship/ from http://www.theosborn.org
not found (404): http://www.theosborn.org/tel:12032921546 from http://www.theosborn.org from http://www.theosborn.org
not found (404): http://www.theosborn.org/tel:18006732926 from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/contact-us-download/ from http://www.theosborn.org
not found (404): http://www.theosborn.org/tel:18008500196 from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/2016/10/19/matt-anderson-presents-wellspring-osborn/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/accreditation/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/wp-json/oembed/1.0/embed?url=http%3A%2F%2Fwww.theosborn.org%2F from http://www.theosborn.org
error (timeout): http://www.theosborn.org/directions/ from http://www.theosborn.org from http://www.theosborn.org`
@gd-jroberts I believe you are missing the --ignore-bad-tel-urls flag.
facepalm +1 for following directions.
./pylinkvalidate.py --ignore-bad-tel-urls -P -N -w 5 http://www.theosborn.org
Usage: pylinkvalidate.py [options] URL ...
pylinkvalidate.py: error: no such option: --ignore-bad-tel-urls
Did I not build it correctly?
$ git log
commit c8a0c6efbe6c351795ad24ed04a0eb6c9bf1e387
Author: Barthelemy Dagenais <barthelemy@infobart.com>
Date:   Fri Jun 23 10:34:44 2017 -0400

    fixes #16 - added --ignore-bad-tel-urls option
@gd-jroberts Here is a quick way to test without installing the package:
git clone https://github.com/bartdag/pylinkvalidator.git pylinkvalidator-new
cd pylinkvalidator-new
export PYTHONPATH=.
./pylinkvalidator/bin/pylinkvalidate.py -P -N -w 5 --ignore-bad-tel-urls http://www.theosborn.org
I think you are invoking the script from master, but it imports the modules that are already installed (not the ones in master).
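To see which copy Python actually picks up, a generic check with the standard `importlib` module can print the file a module name resolves to. This is a sketch, not a pylinkvalidator feature, and `module_origin` is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file Python would load for top-level module `name`, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Path of the pylinkvalidator package that would be imported,
    # or None if it cannot be found on sys.path.
    print(module_origin("pylinkvalidator"))
```

Running it from the clone once with `PYTHONPATH=.` and once without should show the difference. Entries from PYTHONPATH come before site-packages on `sys.path`, so with `PYTHONPATH=.` the path should point into the checkout rather than the installed package.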
`$ ./pylinkvalidator/bin/pylinkvalidate.py -P -N -w 5 --ignore-bad-tel-urls http://www.theosborn.org
Starting crawl...
200 - http://www.theosborn.org (1 of 83 - 1%)
200 - http://www.theosborn.org/events/ (2 of 83 - 2%)
200 - http://www.theosborn.org/news/ (3 of 83 - 4%)
200 - http://www.theosborn.org/giving/ (4 of 83 - 5%)
200 - http://www.theosborn.org/about/ (5 of 83 - 6%)
200 - http://www.theosborn.org/careers/ (6 of 83 - 7%)
200 - http://www.theosborn.org/westchester-county-retirement-community/ (7 of 83 - 8%)
200 - http://www.theosborn.org/location/ (8 of 83 - 10%)
200 - http://www.theosborn.org/activities/ (9 of 83 - 11%)
200 - http://www.theosborn.org/dining/ (10 of 83 - 12%)
200 - http://www.theosborn.org/map/ (11 of 83 - 13%)
200 - http://www.theosborn.org/faq/ (12 of 83 - 14%)
200 - http://www.theosborn.org/miriams-attic/ (13 of 83 - 16%)
200 - http://www.theosborn.org/westchester-county-independent-living/ (14 of 83 - 17%)
200 - http://www.theosborn.org/testimonials/ (15 of 83 - 18%)
200 - http://www.theosborn.org/westchester-county-senior-apartments/ (16 of 83 - 19%)
200 - http://www.theosborn.org/westchester-county-senior-housing/ (17 of 83 - 20%)
200 - http://www.theosborn.org/westchester-county-senior-care/ (18 of 83 - 22%)
200 - http://www.theosborn.org/westchester-county-assisted-living/ (19 of 83 - 23%)
200 - http://www.theosborn.org/westchester-county-memory-care/ (20 of 83 - 24%)
200 - http://www.theosborn.org/westchester-county-skilled-nursing/ (21 of 83 - 25%)
200 - http://www.theosborn.org/westchester-county-senior-care/westchester-county-rehabilitation/ (22 of 83 - 27%)
200 - http://www.theosborn.org/westchester-county-hospice/ (23 of 83 - 28%)
200 - http://www.theosborn.org/westchester-county-respite/ (24 of 83 - 29%)
200 - http://www.theosborn.org/home-care/ (25 of 83 - 30%)
200 - http://www.theosborn.org/leadership/ (26 of 83 - 31%)
200 - http://www.theosborn.org/home-care-westchester-county-ny/ (27 of 83 - 33%)
200 - http://www.theosborn.org/home-care-fairfield-county-ct/ (28 of 83 - 34%)
200 - http://www.theosborn.org/resources/ (29 of 83 - 35%)
200 - http://www.theosborn.org/gallery/photo-gallery/ (30 of 83 - 36%)
200 - http://www.theosborn.org/ohc-faq/ (31 of 83 - 37%)
200 - http://www.theosborn.org/gallery/virtual-tour/ (32 of 83 - 39%)
200 - http://www.theosborn.org/gallery/ (33 of 83 - 40%)
200 - http://www.theosborn.org/contact-us/ (34 of 83 - 41%)
200 - http://www.theosborn.org/privacy-policy/ (35 of 83 - 42%)
200 - http://www.theosborn.org/directions/ (36 of 83 - 43%)
error - http://www.theosborn.org/ (37 of 83 - 45%)
error - http://www.theosborn.org/westchester-county-assisted-living-apartments/ (38 of 83 - 46%)
error - http://www.theosborn.org/history/ (39 of 83 - 47%)
error - http://www.theosborn.org/accreditation/ (40 of 83 - 48%)
error - http://www.theosborn.org/scholarship/ (41 of 83 - 49%)
error - http://www.theosborn.org/2016/10/19/matt-anderson-presents-wellspring-osborn/ (42 of 83 - 51%)
error - http://www.theosborn.org/event/alexandra-zapruder-author-twenty-six-seconds-personal-history-zapruder-film/ (43 of 83 - 52%)
error - http://www.theosborn.org/photo-gallery/ (44 of 83 - 53%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/images/the-osborn.jpg (45 of 83 - 54%)
200 - http://www.theosborn.org/wp-content/uploads/2013/01/Homepage3.jpg (46 of 83 - 55%)
error - http://www.theosborn.org/contact-us-download/ (47 of 83 - 57%)
error - http://www.theosborn.org/additional-services/ (48 of 83 - 58%)
200 - http://www.theosborn.org/wp-content/uploads/2013/01/osb-gallerycallout_new.jpg (49 of 83 - 59%)
200 - http://www.theosborn.org/wp-content/uploads/2013/01/broshures.jpg (50 of 83 - 60%)
200 - http://www.theosborn.org/wp-content/uploads/2014/10/FB-f-Logo__white_50.png (51 of 83 - 61%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/images/ada.png (52 of 83 - 63%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/images/eho.png (53 of 83 - 64%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/images/carf-ccac.png (54 of 83 - 65%)
200 - http://www.theosborn.org/wp-content/plugins/gd-worker/public/js/gd-worker-public.js?ver=1.0.0 (55 of 83 - 66%)
200 - http://www.theosborn.org/wp-content/plugins/gravity-forms-auto-placeholders/modernizr.placeholder.min.js?ver=1.2 (56 of 83 - 67%)
200 - http://www.theosborn.org/wp-content/plugins/slide-in/js/wdsi.js?ver=1.2 (57 of 83 - 69%)
200 - http://www.theosborn.org/wp-content/plugins/gravity-forms-auto-placeholders/scripts.js?ver=1.2 (58 of 83 - 70%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/scripts/jquery.matchHeight-min.js?ver=1.0 (59 of 83 - 71%)
200 - http://www.theosborn.org/wp-includes/js/jquery/ui/core.min.js?ver=1.11.4 (60 of 83 - 72%)
200 - http://www.theosborn.org/wp-includes/js/jquery/jquery-migrate.min.js?ver=1.4.1 (61 of 83 - 73%)
200 - http://www.theosborn.org/wp-includes/js/jquery/ui/widget.min.js?ver=1.11.4 (62 of 83 - 75%)
200 - http://www.theosborn.org/wp-includes/js/jquery/ui/accordion.min.js?ver=1.11.4 (63 of 83 - 76%)
200 - http://www.theosborn.org/wp-includes/js/wp-embed.min.js?ver=c39570c078c67f50cfcafeebaf91152d (64 of 83 - 77%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/scripts/jquery.flexslider-min.js (65 of 83 - 78%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/scripts/jquery.magnific-popup.min.js (66 of 83 - 80%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/scripts/osborn.js (67 of 83 - 81%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/favicon.ico (68 of 83 - 82%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/styles/flexslider.css (69 of 83 - 83%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/styles/magnific-popup.css (70 of 83 - 84%)
200 - http://www.theosborn.org/wp-content/themes/the-osborn/style.css?v=2.0 (71 of 83 - 86%)
200 - http://www.theosborn.org/wp-content/plugins/gd-worker/public/css/gd-worker-public.css?ver=1.0.0 (72 of 83 - 87%)
200 - http://www.theosborn.org/wp-content/plugins/slide-in/css/wdsi.css?ver=1.2 (73 of 83 - 88%)
200 - http://www.theosborn.org/wp-content/plugins/ultimate-branding/ultimate-branding-files/modules/custom-admin-bar-files/css/general.css?ver=1.0 (74 of 83 - 89%)
200 - http://www.theosborn.org/wp-content/plugins/ultimate-branding/ultimate-branding-files/modules/favicons/css/admin.css?ver=1.0.0 (75 of 83 - 90%)
200 - http://www.theosborn.org/wp-includes/wlwmanifest.xml (76 of 83 - 92%)
error - http://www.theosborn.org/sitemap/ (77 of 83 - 93%)
error - http://www.theosborn.org/contact-us/privacy-policy/ (78 of 83 - 94%)
200 - http://www.theosborn.org/wp-content/uploads/2015/10/door-knob_O_32x32.png (79 of 83 - 95%)
error - http://www.theosborn.org/wp-json/ (80 of 83 - 96%)
error - http://www.theosborn.org/xmlrpc.php?rsd (81 of 83 - 98%)
error - http://www.theosborn.org/wp-json/oembed/1.0/embed?url=http%3A%2F%2Fwww.theosborn.org%2F (82 of 83 - 99%)
error - http://www.theosborn.org/wp-json/oembed/1.0/embed?url=http%3A%2F%2Fwww.theosborn.org%2F&format=xml (83 of 83 - 100%)
Crawling Done...
ERROR Crawled 83 urls with 16 error(s) in 52.14 seconds
average response time: 0.89 seconds
average process time: 0.01 seconds
Start URL(s): http://www.theosborn.org
error (timeout): http://www.theosborn.org/photo-gallery/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/event/alexandra-zapruder-author-twenty-six-seconds-personal-history-zapruder-film/ from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/history/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/westchester-county-assisted-living-apartments/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/wp-json/oembed/1.0/embed?url=http%3A%2F%2Fwww.theosborn.org%2F&format=xml from http://www.theosborn.org
error (timeout): http://www.theosborn.org/xmlrpc.php?rsd from http://www.theosborn.org
error (timeout): http://www.theosborn.org/contact-us/privacy-policy/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/ from http://www.theosborn.org from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/2016/10/19/matt-anderson-presents-wellspring-osborn/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/additional-services/ from http://www.theosborn.org from http://www.theosborn.org
error (timeout): http://www.theosborn.org/contact-us-download/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/sitemap/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/scholarship/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/wp-json/oembed/1.0/embed?url=http%3A%2F%2Fwww.theosborn.org%2F from http://www.theosborn.org
error (timeout): http://www.theosborn.org/accreditation/ from http://www.theosborn.org
error (timeout): http://www.theosborn.org/wp-json/ from http://www.theosborn.org`
So you are no longer crawling bad phone numbers (which was the expected behavior). Yay!
I'm getting these results (on Python 2.7):
SUCCESS Crawled 81 urls in 14.91 seconds
average response time: 0.78 seconds
average process time: 0.34 seconds
You may want to increase the timeout with --timeout=20. The errors you see mean that pylinkvalidator did not get a response within 10 seconds.
Understood. Thanks for adding this feature!
Is there a way to make the link checker ignore telephone links? For a site with the following link:
The link checker attempts to crawl http://www.theosborn.org/tel:18006732926, which returns 404. The sites my company runs have multiple telephone links. This site in particular has 6 telephone links in a sidebar that renders on every page, which results in quite a few false positives: