OpenEnergyPlatform / oeplatform

Repository for the code of the Open Energy Platform (OEP) website. The OEP provides an interface to the Open Energy Family
http://openenergyplatform.org/
GNU Affero General Public License v3.0

Datapackage and frictionless-py #1018

Open areleu opened 2 years ago

areleu commented 2 years ago

Hello,

I wanted to point out that the datapackages one currently gets from the "Download Datapackage" button are not easy to parse out of the box. I don't know whether reading them with something like frictionless is intended, but if it is, I think some changes have to be made to the metadata for that to work.

I use the following file as an example: wind_turbine_domestic_lod_geoss_tp_oeo

  1. Make all the "name" entries in the resources section lower case. The json schema has this as a constraint, and I could not make it work without changing the files.
  2. Remove the foreign keys if they are not being used. I guess one could set them to null, but I think removing them is safer.
  3. Don't use empty lists in primarykey. I think the schema expects either a string or a null.
  4. This one is tricky: make the "path" point to the csv file itself. I think one could keep the website URL if the link pointed directly to a csv file, but it does not. I say it is tricky because I don't know how different applications handle relative paths.
  5. Change the format from SQL to csv. The datapackage metadata describes the contents of the datapackage itself, and since that content is a csv file, I think the format should say csv.
  6. Along the same lines as the last point, remove the dialect from the package. It raises a very cryptic error, at least in frictionless. (I am raising this issue there soon. It was super painful to debug, but I don't think it is a fault of the datapackage; I think it is the library itself.)
  7. Replace "bigint" with "int". The schema does not recognize bigint as a format.
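As a rough illustration, the seven changes above could be applied to a downloaded metadata file with a small script. This is only a sketch under assumptions, not how the OEP or frictionless does it: the function name `make_frictionless_friendly`, the `csv_paths` argument and the example metadata are made up for illustration, the key names `primaryKey` and `foreignKeys` follow the Data Package spelling, an empty or null value is treated as "not being used", and `bigint` is mapped to `integer` because that is how Table Schema spells the integer type.

```python
import copy

# Assumption: only "bigint" needs mapping; Table Schema names this type "integer".
TYPE_MAP = {"bigint": "integer"}


def make_frictionless_friendly(metadata, csv_paths):
    """Return a modified copy of a datapackage dict (hypothetical helper).

    csv_paths maps a lower-cased resource name to the relative path of its csv file.
    """
    fixed = copy.deepcopy(metadata)  # leave the caller's dict untouched
    for resource in fixed.get("resources", []):
        # 1. resource names must be lower case
        if "name" in resource:
            resource["name"] = resource["name"].lower()
        # 4. point "path" at the csv file, if the caller supplied one
        name = resource.get("name")
        if name in csv_paths:
            resource["path"] = csv_paths[name]
        # 5. the packaged data is a csv file, not SQL
        resource["format"] = "csv"
        # 6. drop the dialect entry
        resource.pop("dialect", None)

        schema = resource.get("schema", {})
        # 2. and 3. drop empty or null foreign/primary keys
        for key in ("foreignKeys", "primaryKey"):
            if key in schema and not schema[key]:
                del schema[key]
        # 7. replace type names the schema does not know
        for field in schema.get("fields", []):
            if field.get("type") in TYPE_MAP:
                field["type"] = TYPE_MAP[field["type"]]
    return fixed


# Made-up metadata, only to show the shape of the input.
example = {
    "resources": [
        {
            "name": "Example_Table",
            "path": "https://example.org/dataedit/view/example_table",
            "format": "PostgreSQL",
            "dialect": {"delimiter": None},
            "schema": {
                "fields": [{"name": "id", "type": "bigint"}],
                "primaryKey": [],
                "foreignKeys": [],
            },
        }
    ]
}

fixed_example = make_frictionless_friendly(
    example, {"example_table": "example_table.csv"}
)
```

The helper works on a copy so the original metadata can still be compared against the result; the output can be written back with `json.dump` next to the csv file.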

Here is a gist with the modified package example: gist

If you load it from a Python script using frictionless, with both files placed in a folder named datapackage, the package will be recognised.

I made a repository to reproduce the parsing with the modified metadata here: https://github.com/areleu/frictionless_oep_example

wingechr commented 2 years ago

Yeah, good point. So far, this has just been a basic way to get data + metadata, but yes, eventually it should be fully compatible with frictionless.

I'll assign myself for now, but it might take a while, because there are other things with higher priority.