energy-data / energydata.info

energydata.info - open data and analytics for a sustainable energy future
http://energydata.info
MIT License

DDH Harvested Dataset Missing #286

Closed shandaoz closed 5 years ago

shandaoz commented 5 years ago

The dataset “China - Energy Saving Management Action Plan for Liaoning Safe & Sustainable Urban Water Supply Project” was published on June 11 but has not yet been harvested to ENERGYDATA. It is a public dataset with the ENERGYDATA tag.

On first inspection, I can see the harvester doesn’t like the colon in the YAML of at least one of the datasets. Specifically:

“Group: Solar Radiation Measurement Data”

Looks like the Tags/Keywords field?

Whilst that almost certainly isn’t the “missing” dataset below, it is causing the harvester to (silently) fall over. The solution would be to remove the colon from that field.
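As an alternative to removing the colon, YAML also accepts a value containing ": " if the value is quoted. The snippet below is only a sketch of that idea, assuming whatever generates the metadata could be changed to quote such values; the `tags` field name and the `yaml_safe_scalar` helper are hypothetical and not part of the harvester or DDH.

```python
import json


def yaml_safe_scalar(value):
    """Return the value double-quoted if it contains a colon YAML would misread.

    A JSON string literal is also a valid YAML double-quoted scalar,
    so json.dumps is used here to do the quoting and escaping.
    """
    needs_quotes = ": " in value or value.endswith(":")
    return json.dumps(value) if needs_quotes else value


# Hypothetical field name; the value is the one quoted in this issue.
print("tags: " + yaml_safe_scalar("Group: Solar Radiation Measurement Data"))
# prints: tags: "Group: Solar Radiation Measurement Data"
```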

I suspect this has come from a desire to group datasets of a particular topic together more explicitly in the metadata? I will be looking at a solution for that on the CKAN side as part of the task estimates I sent you earlier today (with respect to the Pakistan Solar Measurement Data dataset).

jodiegardiner commented 5 years ago

Hi Shandao,

I've finally figured this one out!

The cause is a syntax error that is impossible to amend on our side. Check the "field_external_metadata" of these two packages on DDH:

https://datacatalog.worldbank.org/api/3/action/package_show?id=28601562-61b6-43ef-8b46-be8c1593b1da

https://datacatalog.worldbank.org/api/3/action/package_show?id=a78df6c4-aef8-4321-86cf-f663bd3681b4

In the first one, the YAML is correctly formatted, with a newline (\n) before each property. In the second (the one which isn't being harvested), the YAML is all on one line. The parser reads that as a single line, and in plain block-style YAML you can't have several "name: value" pairs on one line, because the colon is what separates the name of a property from its value.

Therefore, to fix this one, the field_external_metadata will need to be amended on the second package to use correct YAML syntax. Simply adding \n before each property after the first (as seen in the first package) will mean the harvester can parse it correctly.

shandaoz commented 5 years ago

Awesome! Thanks Jodie! I will communicate this with Tim and the team.

jodiegardiner commented 5 years ago

That dataset has been harvested now - it lives here:

https://energydata.info/dataset/na-energy-saving-management-action-plan-for-liaoning-safe-and-sustainable-urban-water-supply-project