Open rshk opened 10 years ago
Well that seems bad. File a PR with the merged JSON?
@konklone the problem now is figuring out (1) whether the catalogs are still using CKAN and (2) where the actual CKAN instances are located. This is not trivial to automate, and IMHO the best solution would be to ask the original authors of the entries in the file to keep them up to date. (I figured out some, but in certain cases CKAN is quite well hidden, if it's there at all.)
For what it's worth, I had to do this recently, and so:
Note that:
I noticed that many URLs in the instances.json file do not point to an actual CKAN instance. This usually happens because the main website is built with a CMS, with CKAN installed under a sub-path or subdomain.
While this is fine for links, it causes problems when using the list for automated tasks (e.g. harvesting resources from all the listed catalogs), as there is no way to find the API endpoint root.
To find which URLs do not point to a CKAN installation, I wrote this script:
https://github.com/rshk/ckan-guerrilla-gear/tree/master/scripts/random/instances-file-checker
Full result here: http://paste.pound-python.org/show/4JpbizxNavIo0NYzg5iy/
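For readers who can't reach the links above, here is a minimal sketch of the kind of check involved. It is not the linked script. It assumes a CKAN site answers `GET <base>/api/<N>` with a JSON body like `{"version": N}` for each API version it serves; the function names and the `"ok"` / `"no-v3"` / `"no-api"` labels are made up for illustration.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def http_get_json(url, timeout=10):
    """Fetch a URL and decode the body as JSON; return None on any failure."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except Exception:
        # Network errors, HTTP errors and non-JSON bodies all count as "no answer".
        return None


def detect_api_versions(base_url, fetch=http_get_json, versions=(3, 2, 1)):
    """Return the API versions that base_url appears to serve (assumed probe)."""
    root = base_url.rstrip("/") + "/"
    found = []
    for v in versions:
        data = fetch(urljoin(root, "api/%d" % v))
        # Assumption: a CKAN API root echoes back the requested version number.
        if isinstance(data, dict) and data.get("version") == v:
            found.append(v)
    return found


def classify(base_url, fetch=http_get_json):
    """Sort a listed URL into one of three hypothetical buckets."""
    versions = detect_api_versions(base_url, fetch)
    if not versions:
        return "no-api"   # probably a CMS front page, or not CKAN at all
    if 3 not in versions:
        return "no-v3"    # only an older API version answered
    return "ok"
```

`fetch` is injectable so the logic can be tried against canned responses without hitting the network. A URL classified as `"no-api"` may still have CKAN under a sub-path or subdomain, which this sketch does not try to discover.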
Websites not providing any API:
Catalogs providing API v2 only:
Maybe we should merge those results with the instances.json file..?
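A rough sketch of what such a merge could look like. The layout of instances.json is an assumption here (a JSON list of objects, each with a `url` key), and the `api_url` field name is hypothetical, so adapt both to the real file.

```python
import json


def merge_api_urls(instances, api_urls):
    """Return copies of the entries with a hypothetical 'api_url' key added.

    instances: list of dicts, each assumed to carry a 'url' key.
    api_urls:  dict mapping a listed URL to the API root found for it.
    """
    merged = []
    for entry in instances:
        entry = dict(entry)  # copy, so the input list is left untouched
        api_url = api_urls.get(entry.get("url"))
        if api_url:
            entry["api_url"] = api_url
        merged.append(entry)
    return merged


if __name__ == "__main__":
    # Made-up example data, not taken from the real file.
    instances = [
        {"title": "Example A", "url": "http://a.example"},
        {"title": "Example B", "url": "http://b.example"},
    ]
    found = {"http://b.example": "http://data.b.example"}
    print(json.dumps(merge_api_urls(instances, found), indent=2))
```

Entries with no known API root are passed through unchanged, so the existing links keep working while harvesters can rely on `api_url` where it is present.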