Open tomkralidis opened 12 years ago
Since this involves tight integration with CKAN, and right now we support CKAN through the DB store, I propose slipping this issue to the next release. There is no time for 1.6.0. Thoughts?
Current status: CKAN mainly uses two tables to store records: package and package_extra. The first is a normal relational table ( https://github.com/ckan/ckan/blob/master/ckan/model/package.py#L32 ); the second stores key-value pairs attached to a package ( https://github.com/ckan/ckan/blob/master/ckan/model/package_extra.py#L17 ).
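To illustrate the two-table layout described above, here is a minimal sketch of folding key-value extras into a single flat record. The row dicts and the `merge_package_with_extras` helper are hypothetical and the column names are simplified; this is not CKAN's or pycsw's actual code.

```python
def merge_package_with_extras(package_row, extra_rows):
    """Return a flat record: the package columns plus its key-value extras.

    package_row: dict for one row of the relational table (assumed shape).
    extra_rows:  list of dicts with package_id/key/value (assumed shape).
    """
    record = dict(package_row)
    for extra in extra_rows:
        if extra["package_id"] != package_row["id"]:
            continue  # extra belongs to another package
        # setdefault keeps core columns from being overwritten by an extra
        record.setdefault(extra["key"], extra["value"])
    return record


# Hypothetical sample data, for illustration only.
package = {"id": "abc", "name": "rivers", "title": "Rivers"}
extras = [
    {"package_id": "abc", "key": "spatial", "value": "POINT(23 38)"},
    {"package_id": "abc", "key": "title", "value": "ignored, core column wins"},
    {"package_id": "xyz", "key": "spatial", "value": "POINT(0 0)"},
]
merged = merge_package_with_extras(package, extras)
```

A flat record like `merged` is roughly what would then need to be mapped onto pycsw's repository columns.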
In order to be able to integrate pycsw to CKAN there are 3 options:
I am starting a prototype for option 2 as an intermediate solution and will report back.
cc @smrazgs @rclark
We pursued something like your second option in our NGDS project. You can have a look here: https://github.com/ngds/ckanext-ngds/tree/master/ckanext/ngds/csw
A lot of the mapping from a CKAN package to the pycsw table is done here: https://github.com/ngds/ckanext-ngds/blob/master/ckanext/ngds/csw/logic/pycsw.py#L31
We needed ISO support so creating the full-text of the XML doc was also tricky: https://github.com/ngds/ckanext-ngds/blob/master/ckanext/ngds/csw/templates/package_to_iso.xml
Thanks @rclark. Did you use the action API to catch package updates, or another method? I see that there is a commented-out block here: https://github.com/ngds/ckanext-ngds/blob/master/ckanext/ngds/csw/plugin.py#L100
That was the plan, but it was not thoroughly tested. @asonnenschein may know more about recent progress.
I would like to pursue option 3 in the future. This will need a backend refactoring and will open the road for NoSQL backends for pycsw. This should probably happen for pycsw 2.x, since I expect breakage :)
@rclark @asonnenschein this is now implemented as a hook to the CKAN action API: https://github.com/kalxas/ckanext-publicamundi/blob/geodatacamp/ckanext/publicamundi/plugins.py#L283
Full CKAN integration is now complete: https://github.com/PublicaMundi/ckanext-publicamundi/issues/70
cc @amercader
@kalxas great work here! Does this have any implications against master or is this all downstream in CKAN plugins?
Can you outline the approach? If this work is integrated into ckanext-spatial, then would this eliminate the need for doing the CKAN<->pycsw sync in favour of binding direct?
No implication against pycsw master, all done downstream, within a CKAN plugin called publicamundi_package.
The approach is this:
This work is part of geodata.gov.gr and is now in beta: http://labs.geodata.gov.gr/ If you want to try this out, there is an ansible script to install the demo in Debian 7: https://github.com/PublicaMundi/labs.geodata.gov.gr/tree/master/deployment/common-debian
I also have a dev setup here: http://83.212.104.89 (example CSW request: http://83.212.104.89/csw?service=CSW&version=2.0.2&request=GetRecords&typenames=csw:Record&elementsetname=brief ). Please send me an e-mail to create a login for you ;)
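For anyone who wants to script the GetRecords request above, a minimal sketch of building the same query string with the standard library follows. The `build_getrecords_url` helper and the `example.org` endpoint are placeholders of mine; only the parameter names and values come from the URL in this comment.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getrecords_url(endpoint, elementsetname="brief"):
    """Build a CSW 2.0.2 GetRecords KVP URL for the given endpoint."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typenames": "csw:Record",
        "elementsetname": elementsetname,
    }
    # urlencode percent-encodes the ":" in csw:Record; servers decode it back
    return endpoint + "?" + urlencode(params)


# Placeholder endpoint, not a real server.
url = build_getrecords_url("http://example.org/csw")
query = parse_qs(urlsplit(url).query)
```

Passing `elementsetname="full"` instead of the default should request complete records rather than brief ones.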
@amercader we are also planning to separate the schema plugin we created into a separate extension because we feel it is very useful as standalone. Any advice?
@kalxas This is looking great! I had a quick look and I'm impressed by the amount of work you guys have done, well done.
Separating the schema plugin sounds useful, I wonder if it has some overlap with @wardi's https://github.com/open-data/ckanext-scheming.
The general approach looks fine, once we get the next CKAN release out of the way and I have more time it'd be great to catch up and know more in detail what has been done and what your plans are.
In the meantime if you want to come by to one of the weekly CKAN dev meetings to present this feel free to drop by.
Again, great work!
@amercader I would be happy to attend a dev meeting. I have seen the work from @wardi recently (from the dev mailing list). I have not seen all the details yet but it would be interesting if we could merge into one big schema extension...
also @amercader @tomkralidis thanks for your nice words :)
@kalxas I'd love to talk about how you're extending the dataset metadata. Dev meeting might be good. IIUC your plugin supports arbitrary nested data as well as a flat version of the same for form updates, is that right?
@wardi, @amercader, @kalxas,
Yes, indeed, we support arbitrary schemata expressed as zope.schema interfaces. Some core functionality (like flattening/unflattening) is shared across all metadata objects. We have chosen to use zope.schema as a declarative means, mainly for the following reasons:
Note that there is much to be done before we consider this a ready-to-distribute extension. Of course, we are willing to join the conversation and exchange ideas at CKAN's dev meetings!
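For readers unfamiliar with the flattening/unflattening mentioned above, here is a generic sketch of the idea using plain dicts. It is an assumption-laden illustration, not the ckanext-publicamundi implementation: the `flatten`/`unflatten` names and the dotted-key convention are mine, and keys are assumed not to contain a dot.

```python
def flatten(nested, prefix=""):
    """Turn a nested dict into a flat dict with dotted keys."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value  # leaf value (or an empty dict)
    return flat


def unflatten(flat):
    """Rebuild the nested dict from dotted keys."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Hypothetical metadata, for illustration only.
metadata = {"title": "Rivers", "contact": {"name": "A", "address": {"city": "Athens"}}}
flat = flatten(metadata)
restored = unflatten(flat)
```

The flat form suits form fields, which carry a single string name each, while the nested form matches the schema.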
@wardi cool, is tomorrow's dev meeting at 16 UTC ok for you to discuss this? me and @drmalex07 can make it.
@kalxas @drmalex07 yes, I'll be there.
@kalxas @tomkralidis for multilingual metadata and labels I strongly suggest an approach like https://github.com/open-data/ckanext-fluent data fields or https://github.com/open-data/ckanext-scheming/#label where you accept and produce dicts of BCP-47 language keys with string values
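As a rough illustration of the "dict of BCP-47 language keys with string values" shape, the sketch below picks a label with a simple fallback chain. The `pick_label` helper and its fallback order are my own assumptions, not the behaviour of ckanext-fluent or ckanext-scheming.

```python
def pick_label(labels, preferred, fallback="en"):
    """Pick a string from a {language tag: text} dict.

    Tries the exact tag, then its primary language subtag
    (e.g. "el-GR" -> "el"), then the fallback, then any value.
    """
    candidates = [preferred]
    if "-" in preferred:
        candidates.append(preferred.split("-", 1)[0])
    candidates.append(fallback)
    for tag in candidates:
        if tag in labels:
            return labels[tag]
    return next(iter(labels.values()), None)


# Hypothetical multilingual title.
title = {"en": "Rivers", "el": "Ποτάμια"}
greek = pick_label(title, "el-GR")
french = pick_label(title, "fr")
```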
@wardi thanks! We are looking into this right now :)
There has been no working PyCSW support in CKAN for many years, and the PyCSW documentation should be updated accordingly: https://github.com/ckan/ckanext-spatial/issues/297.
I added a PyCSW endpoint to CKAN by creating a tool that harvests datasets from the CKAN API and adds them to PyCSW: https://github.com/COATnor/coat2pycsw.
Similar to GeoNode and Open Data Catalog, CKAN carries its own data model.
Create a similar binding to provide CSW server support for CKAN with pycsw.
(FYI closing #73 given the possible duplication messages due to the svn->git changeover)