MikePulsiferDOL opened this issue 9 years ago.
@MikePulsiferDOL There's a current report available at http://data-staging.civicagency.org/archive/error_log/49028.csv
Note that the broken link report uses HTTP HEAD requests, and we've seen some instances where servers return 500 or 400 responses to HEAD that they wouldn't return for GET. However, the HTTP spec requires general-purpose servers to support the HEAD method:
The methods GET and HEAD MUST be supported by all general-purpose servers.
The broken link checker does fall back to HTTP GET when HTTP HEAD fails, but with a timeout of only a few seconds. I'll try to spend more time debugging this, but I've confirmed that I get failures from my personal computer when I send HTTP HEAD requests to the URLs listed in the error log.
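The HEAD-then-GET fallback described above can be sketched with the Python standard library. This is not the dashboard's actual checker: the function names and the 5-second timeout are assumptions (the thread only says "a few seconds"), and the request function is injectable so the fallback logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

# Assumed value; the thread only says the timeout is "a few seconds".
TIMEOUT_SECONDS = 5.0


def fetch_status(url: str, method: str, timeout: float = TIMEOUT_SECONDS) -> int:
    """Send one request with the given method and return the HTTP status code."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; the status code is still available.
        return err.code


def _try(fetch: Callable[[str, str], int], url: str, method: str) -> Optional[int]:
    """Return the status, or None if the request never got a response."""
    try:
        return fetch(url, method)
    except OSError:  # URLError, timeouts and connection errors subclass OSError
        return None


def check_link(url: str, fetch: Callable[[str, str], int] = fetch_status) -> Optional[int]:
    """Hypothetical checker: try HEAD first, fall back to GET if HEAD fails."""
    status = _try(fetch, url, "HEAD")
    if status is None or status >= 400:
        status = _try(fetch, url, "GET")
    return status
```

With this design a link is only reported as broken when GET fails too, so a server with a broken HEAD implementation is not penalized, at the cost of a second request (and a second timeout) for every such URL.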
I see your crawler appears to be expecting text/html. That is incorrect.
I'm seeing a relationship with this issue: https://github.com/GSA/enterprise-data-inventory/issues/177
@MikePulsiferDOL Yes, an accessURL (as used for referencing an API) is assumed to be HTML, since it's meant to point to documentation. This is stated in the spec:
It is usually assumed that accessURL is an HTML webpage.
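A checker that follows this assumption might compare only the media type part of the Content-Type header against text/html, ignoring parameters such as charset. The helper names below are hypothetical, and the sketch deliberately treats only text/html as HTML (not, for example, application/xhtml+xml).

```python
from typing import Optional


def media_type(content_type_header: str) -> str:
    """Return the bare media type from a Content-Type header value."""
    # "text/html; charset=utf-8" -> "text/html"; media types are case-insensitive.
    return content_type_header.split(";", 1)[0].strip().lower()


def looks_like_html(content_type_header: Optional[str]) -> bool:
    """True if the response declares an HTML page, as expected for an accessURL."""
    return content_type_header is not None and media_type(content_type_header) == "text/html"
```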
The mediaType field is associated with the downloadURL, not the accessURL.
There's no need to specify the mediaType for an API, since the accessURL is meant to point to documentation as per the API guidance.
The format is simply "API" and the mediaType can be left blank, so it looks like inventory.data.gov is behaving correctly. If you're linking directly to a download of a particular file format, you should use downloadURL and then specify the mediaType.
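To make the pairing concrete, here is a sketch of two distribution entries as Python dicts, using the Project Open Data (DCAT-US) field names discussed above. The URLs are placeholders, and distribution_problems is a hypothetical lint rule based only on this thread, not a check the dashboard or inventory.data.gov is known to run.

```python
# Placeholder URLs; field names follow the Project Open Data metadata schema.
api_distribution = {
    "@type": "dcat:Distribution",
    "accessURL": "https://example.gov/developer/api-docs",  # HTML documentation page
    "format": "API",
    # No mediaType: it describes a downloadURL, which this entry doesn't have.
}

file_distribution = {
    "@type": "dcat:Distribution",
    "downloadURL": "https://example.gov/data/example.csv",  # direct file download
    "mediaType": "text/csv",  # media type of the file at downloadURL
}


def distribution_problems(dist: dict) -> list:
    """Hypothetical check: mediaType belongs with downloadURL, not accessURL."""
    problems = []
    if "downloadURL" in dist and not dist.get("mediaType"):
        problems.append("downloadURL given without mediaType")
    if "mediaType" in dist and "downloadURL" not in dist:
        problems.append("mediaType given without downloadURL")
    return problems
```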
If you want to provide more complex machine-readable documentation for the API, including the different parameters and formats it supports, you can use the describedBy field (labeled "Data Dictionary" on inventory.data.gov) and use something like Swagger to generate that file. This is described in the "Machine Readable API Documentation" section of the API guidance.
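As an illustration, the sketch below builds a minimal Swagger 2.0 document and an API distribution whose describedBy points at it. The API path, parameter, and URLs are invented for the example, and describedByType is assumed to be application/json because the Swagger file is served as JSON.

```python
import json

# Minimal Swagger 2.0 document; "swagger", "info" (title, version) and "paths" are required.
swagger_doc = {
    "swagger": "2.0",
    "info": {"title": "Example Agency API", "version": "1.0.0"},
    "paths": {
        "/datasets": {  # hypothetical endpoint
            "get": {
                "parameters": [
                    {"name": "format", "in": "query", "type": "string", "enum": ["json", "csv"]}
                ],
                "responses": {"200": {"description": "List of datasets"}},
            }
        }
    },
}

api_distribution = {
    "@type": "dcat:Distribution",
    "accessURL": "https://example.gov/developer",  # human-readable HTML docs
    "format": "API",
    "describedBy": "https://example.gov/developer/swagger.json",  # machine-readable docs
    "describedByType": "application/json",  # assumed: the Swagger file is JSON
}

if __name__ == "__main__":
    print(json.dumps(swagger_doc, indent=2))
```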
Leaving this open until Quality Checks are added to the Project Open Data Dashboard.
Should this be closed? Is #154 the same question?
This can be closed.
The dashboard reports that 35% of the links in DOL's PDL (last updated before the previous IDC) are broken, but an app I built to test those links finds 0 broken links.
Can you please either provide the broken link report, or check again and update the dashboard?