Open gemfarmer opened 8 years ago
After reviewing this with @waldoj, it looks like this is not related to subdomains (that was a coincidence), but is likely related to how pa11y-crawl opts to use a sitemap if one is available. This isn't a problem when the project is being run over localhost.
This is the likely offending line. It is possible that $TEMP_DIR is saving the sitemap URLs in a strange manner.
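One way to test the sitemap hypothesis is to look at the URLs a site's sitemap actually lists and compare their hosts with the host being crawled. The following is only a rough sketch, not how pa11y-crawl itself works: it assumes the sitemap follows the standard sitemaps.org format, the helper names sitemap_urls and off_host_urls are made up for illustration, and the sample sitemap body (including the example.gov entry) is invented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def off_host_urls(xml_text, crawled_url):
    """Return sitemap entries whose host differs from the host being crawled.

    Relative entries have no host at all, so they are reported too.
    """
    expected = urlparse(crawled_url).netloc.lower()
    return [u for u in sitemap_urls(xml_text)
            if urlparse(u).netloc.lower() != expected]


# Hypothetical sitemap body, invented for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://18f.gsa.gov/</loc></url>
  <url><loc>/about/</loc></url>
  <url><loc>https://example.gov/contact/</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in off_host_urls(SAMPLE, "https://18f.gsa.gov"):
        print(url)
```

If a real sitemap turns up entries like these (relative paths, or a host other than the one being crawled), that would be consistent with the crawl finding no valid HTML on the deployed site while working on localhost.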
cc @stvnrlly
Hi, I'm new to pa11y accessibility testing. I'm trying to use pa11y-crawl [URL] to find all HTML pages and run pa11y on each one, but I'm getting the error below. Am I missing anything? Any advice would be helpful. Thanks in advance.
C:\Windows\system32>pa11y-crawl nature.com
'bash' is not recognized as an internal or external command, operable program or batch file.
@syndy1989 Hi there!
As an initial matter, you should know that pa11y-crawl is both experimental and unsupported, which makes it pretty fragile. You may have better success with one of the more official pa11y options, such as the "webservice".
Regarding the error that you're seeing: it looks like you're running on Windows, while this currently works on macOS. I'm not that familiar with the Windows command line, but I don't believe it supports bash natively. If you're on Windows 10, there's now a way to create an Ubuntu Linux environment and use bash. That may allow you to use this tool (though, because it's unsupported, you may still have issues).
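A quick way to confirm whether bash is visible from the environment you're running in is to look it up on PATH. This is just a minimal sketch in Python using the standard library; the helper names find_bash and bash_status are made up for illustration, and finding bash does not guarantee the tool will work on Windows.

```python
import shutil


def find_bash():
    """Return the full path to bash if it is on PATH, otherwise None."""
    return shutil.which("bash")


def bash_status(path):
    """Turn the lookup result into a short human-readable message."""
    if path is None:
        return "bash not found on PATH; pa11y-crawl will likely fail to start"
    return f"bash found at {path}"


if __name__ == "__main__":
    print(bash_status(find_bash()))
```

If this reports that bash was not found, the "'bash' is not recognized" error above is expected, and the shell needs to be installed or added to PATH first.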
@stvnrlly Hi there, I'm actually using Windows Server 2012. I tried downloading Cygwin on Windows to run bash commands. I've noticed that pa11y-crawl gives the following error when attempting to crawl a URL with a subdomain.
. is not an html document, skipping
Any advice on this would be helpful. Thanks in advance.
I'm afraid that I won't be able to help troubleshoot that issue. If we're able to spend time working on this project in the future, we may be able to fix the problem that caused this issue to be opened in the first place, which may help with what you're seeing.
I've noticed that pa11y-crawl gives the following error when attempting to crawl a URL with a subdomain. For example, https://login.gov crawls successfully, but https://useiti.doi.gov or https://18f.gsa.gov cannot find valid html to scan. If the same projects are crawled on localhost, it crawls properly.
This is a problem on federalist URLs, because we end up seeing the following: