geosolutions-it / ckanext-dcatapit

CKAN extension for the Italian Open Data Portals (DCAT_AP-IT)
GNU Affero General Public License v3.0

New Harvesting Tests #155

Closed tdipisa closed 6 years ago

tdipisa commented 6 years ago

New Harvest sources must be tested:

For current status and errors details refer to: https://docs.google.com/spreadsheets/d/1RhTOpO1VJTDvn8LdEYCnMvP6Fim21mNANR9D-f7Dkzs/edit#gid=0

giorgialodi commented 6 years ago

Could you please verify the harvesting of Comune di Palermo? They seem to have fixed the issue, but we need to confirm that. If the issue is still present, please let me know and I will contact Palermo and work with them to resolve it. Thanks a lot!

tdipisa commented 6 years ago

@giorgialodi description updated

giorgialodi commented 6 years ago

Guys, once we have finished the high-priority administrations, we need to proceed with those in the DatiPubblici task force, since we need to import their datasets into the DAF through the national catalogue. As soon as these tests are OK, I will tell you which ones come next.

etj commented 6 years ago

Comune di Milano: test OK -- still to be added to the docker image

etj commented 6 years ago

Università di Bologna: parsing errors. See #163

etj commented 6 years ago

MIUR: test OK -- still to be added to the docker image

etj commented 6 years ago

Comune di Palermo: test OK -- needs the fix for #164 -- to be added to the docker image

giorgialodi commented 6 years ago

@etj as for MIUR, it is USTAT, right? The other URL is not working, correct?

giorgialodi commented 6 years ago

If the other URL from MIUR is not working, try the following one: http://dati.istruzione.it/opendata/CatalogoRDF (the organization is still MIUR)

giorgialodi commented 6 years ago

@etj do you have errors in the "Comune di Milano" harvesting? I see 277 datasets were harvested; however, they have 293 datasets in their catalogue (https://dati.comune.milano.it/dataset)

etj commented 6 years ago

@giorgialodi I found these errors for Milano:

giorgialodi commented 6 years ago

@etj could you please point out the datasets for which these errors are raised? That way we can report them to Milano. Could I extract this information myself without bothering you further? Should I register on your testing instance?

giorgialodi commented 6 years ago

@etj we are analysing the "URL already in use" errors for Comune di Milano. The error seems to occur when datasets share the same title. However, there should be a mechanism that appends additional characters to the URL when this happens. Why does it not work? The catalogue of Comune di Milano handles that case.

giorgialodi commented 6 years ago

@etj @tdipisa I was looking at the sources. There is an error for the MIUR catalogue. Any hints? Their file seems fine to me, at least from a DCAT-AP_IT perspective.

etj commented 6 years ago

@giorgialodi the mime type returned by the MIUR service is wrong (text/plain), so the parser will not recognize it.
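
To illustrate the kind of failure described above, here is a minimal sketch of how a harvester might choose an RDF parser from the `Content-Type` header and, when the header is unhelpful (such as `text/plain`), fall back to sniffing the payload. The function name `guess_rdf_format`, the mapping table, and the fallback itself are assumptions made for illustration; they are not taken from the extension's code, which may behave differently.

```python
# Hypothetical mapping from MIME types to parser format names.
RDF_FORMATS = {
    "application/rdf+xml": "xml",
    "text/turtle": "turtle",
    "application/ld+json": "json-ld",
    "application/n-triples": "nt",
}


def guess_rdf_format(content_type, body):
    """Return a parser format name, or None if nothing matches.

    The declared MIME type is tried first; if it is missing or not an
    RDF type, the first characters of the body are inspected instead.
    """
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime = (content_type or "").split(";")[0].strip().lower()
    if mime in RDF_FORMATS:
        return RDF_FORMATS[mime]

    # Fallback (an assumption, not the harvester's documented behaviour):
    # look at how the document starts.
    head = body.lstrip()[:200]
    if head.startswith("<?xml") or head.startswith("<rdf:RDF"):
        return "xml"
    if head.startswith("{") or head.startswith("["):
        return "json-ld"
    if head.startswith("@prefix") or head.startswith("@base"):
        return "turtle"
    return None


# An RDF/XML document served as text/plain is still recognized by the fallback.
print(guess_rdf_format("text/plain", '<?xml version="1.0"?><rdf:RDF/>'))
```

Without a fallback of this kind, the lookup on `text/plain` alone finds no match, which is consistent with the parser not recognizing the MIUR response.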

etj commented 6 years ago

Università di Bologna: test OK -- #163 fixed

etj commented 6 years ago

Comune di Palermo: test ok -- #164 fixed

etj commented 6 years ago

New organizations and sources have been added to the docker image https://github.com/geosolutions-it/dati-ckan-docker/issues/16

giorgialodi commented 6 years ago

@etj thanks a lot. I will inform MIUR about that. There is still the doubt (a few comments above) about the "URL already in use" errors for Comune di Milano (an error that also occurred for Regione Toscana, didn't it?). Could you please help me understand how to cope with that type of error? We are currently in contact with Comune di Milano, who told us they have chosen to use the same titles for datasets that are similar.

tdipisa commented 6 years ago

@giorgialodi, a new issue (#168) has been created to keep track of this and to provide an improvement, if you want. In CKAN there cannot be two datasets with the same ID (and the ID is generated from the name of the harvested dataset). If you agree, I will add this improvement to the task list and provide an estimate.
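
As a rough illustration of the improvement discussed here, the sketch below derives a URL-safe name from a dataset title and appends a numeric suffix when that name is already taken. The helpers `slugify` and `unique_name`, and the in-memory set of names, are hypothetical; a real harvester would check names against the CKAN database, and the fix tracked in #168 may take a different approach.

```python
import re


def slugify(title):
    """Lower-case the title and replace runs of other characters with '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "dataset"


def unique_name(title, names_in_use):
    """Return a name not yet in names_in_use, and record it there.

    names_in_use is a plain set standing in for the catalogue's
    existing dataset names (an assumption for this sketch).
    """
    base = slugify(title)
    candidate = base
    counter = 1
    # Append -1, -2, ... until the candidate no longer collides.
    while candidate in names_in_use:
        candidate = f"{base}-{counter}"
        counter += 1
    names_in_use.add(candidate)
    return candidate


used = set()
print(unique_name("Bilancio 2017", used))  # bilancio-2017
print(unique_name("Bilancio 2017", used))  # bilancio-2017-1
```

With a scheme like this, two datasets that share a title, as in the Comune di Milano catalogue, would receive distinct names instead of triggering "URL already in use".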