ckan / ckan-instances

Repo for CKAN instances page
http://ckan.org/instances

Some URLs not pointing at Ckan instances #60

Open rshk opened 10 years ago

rshk commented 10 years ago

I noticed that many URLs in the instances.json file do not point to an actual CKAN instance. This is usually because the main website is built with a CMS, with CKAN installed under a sub-path or subdomain.

While this is fine for links, it causes problems when using the list for automated tasks (e.g. harvesting resources from all the listed catalogs), as there is no way to find the API endpoint root.

To find which URLs do not point to a CKAN installation, I wrote this script:

https://github.com/rshk/ckan-guerrilla-gear/tree/master/scripts/random/instances-file-checker

Full result here: http://paste.pound-python.org/show/4JpbizxNavIo0NYzg5iy/
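
For anyone who wants to reproduce this kind of check without opening the script, here is a minimal sketch of the idea. It is not the linked script. It assumes that a CKAN site answers `GET <base>/api/<n>` with a JSON body like `{"version": <n>}` for each API version it supports; the names `fetch_json` and `detect_api_version` are made up for this example.

```python
import http.client
import json
import urllib.request
from urllib.parse import urljoin

API_VERSIONS = (3, 2, 1)  # probe the newest version first


def fetch_json(url, timeout=10):
    """Return the decoded JSON body of `url`, or None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # Network/HTTP errors, malformed URLs, and non-JSON bodies
        # (e.g. a CMS page) all count as "no API here".
        return None


def detect_api_version(base_url, fetch=fetch_json):
    """Return the highest API version answering under base_url, or None.

    Assumption: <base_url>/api/<n> returns {"version": n} on a CKAN site.
    """
    root = base_url.rstrip("/") + "/"
    for version in API_VERSIONS:
        body = fetch(urljoin(root, "api/%d" % version))
        if isinstance(body, dict) and body.get("version") == version:
            return version
    return None
```

Under that assumption, a URL for which `detect_api_version` returns `None` would land in the first list below, and one for which it returns `2` in the second. The `fetch` parameter is only there so the logic can be exercised without network access.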

Websites not providing any API:

"http://data.codeforhouston.com/"
"http://data.denvergov.org/"
"http://data.gc.ca"
"http://data.gov"
"http://data.graz.gv.at/"
"http://data.gv.at"
"http://data.linz.gv.at/"
"http://data.norge.no"
"http://data.nsw.gov.au"
"http://data.overheid.nl"
"http://daten.berlin.de"
"http://daten.hamburg.de/"
"http://dati.gov.it/"
"http://datospublicos.gob.ar/"
"http://govdata.de"
"http://healthdata.gov/"
"http://opencolorado.org/"
"http://opendatacanarias.es/"
"http://open-data.europa.eu/"
"http://open-data.okfn.gr/"
"http://www.hri.fi"

Catalogs providing API v2 only:

"http://catalogue.datalocale.fr"
"http://dadosabertos.senado.gov.br/"
"http://dados.gov.br"
"http://data.buenosaires.gob.ar/"
"http://datacatalogs.org/"
"http://data.cityofsantacruz.com/"
"http://data.gov.sk"
"http://data.lexingtonky.gov/"
"http://datameti.go.jp/data/"
"http://data.ottawa.ca/"
"http://dati.toscana.it/"
"http://opendata.aragon.es/"
"http://opendata.comune.bari.it/"
"http://www.daten.rlp.de/"
"http://www.nosdonnees.fr/"
"http://www.opendata-hro.de/"
"http://www.opendatahub.it/"
"http://www.opendata.provincia.roma.it/"

Maybe we should merge these results into the instances.json file?

konklone commented 10 years ago

Well that seems bad. File a PR with the merged JSON?

rshk commented 10 years ago

@konklone the problem now is figuring out (1) whether the catalogs are still using CKAN and (2) where the actual CKAN instances are located. This is not trivial to automate, and IMHO the best solution would be to ask the original authors of the entries in the file to keep them up to date. (I figured out some, but in certain cases CKAN is quite well hidden, if it is there at all.)

jpmckinney commented 10 years ago

For what it's worth, I had to do this recently, and so:

Note that: