geopython / pycsw

pycsw is an OGC CSW server implementation written in Python. pycsw fully implements the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web]. Initial development started in 2010 (more formally announced in 2011). The project is certified OGC Compliant, and is an OGC Reference Implementation. pycsw allows for the publishing and discovery of geospatial metadata via numerous APIs (CSW 2/CSW 3, OpenSearch, OAI-PMH, SRU). Existing repositories of geospatial metadata can also be exposed, providing a standards-based metadata and catalogue component of spatial data infrastructures. pycsw is Open Source, released under an MIT license, and runs on all major platforms (Windows, Linux, Mac OS X). Please read the docs at https://pycsw.org/docs for more information.
https://pycsw.org
MIT License

add support for CKAN integration #95

Open tomkralidis opened 12 years ago

tomkralidis commented 12 years ago

Similar to GeoNode and Open Data Catalog, CKAN carries its own data model.

Create a similar binding to provide CSW server support for CKAN with pycsw.

(FYI closing #73 given the possible duplication messages due to the svn->git changeover)

kalxas commented 11 years ago

Since this involves tight integration with CKAN, and right now we are supporting CKAN through a DB store, I propose to slip this issue to the next release. There is no time for 1.6.0. Thoughts?

kalxas commented 10 years ago

Current status: CKAN mainly uses two tables to store records: package and package_extra. The first is a normal relational table: https://github.com/ckan/ckan/blob/master/ckan/model/package.py#L32 and the second is a key-value store: https://github.com/ckan/ckan/blob/master/ckan/model/package_extra.py#L17
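To make the two-table layout concrete, here is a minimal sketch of how a relational row and its key-value extras could be merged into one flat record. The schema below is a heavily simplified stand-in (assumed columns only, the real CKAN tables have many more), and `load_record` and the sample values are hypothetical names for illustration.

```python
import sqlite3

# Simplified stand-ins for CKAN's tables; column lists are assumptions.
SCHEMA = """
CREATE TABLE package (id TEXT PRIMARY KEY, name TEXT, title TEXT);
CREATE TABLE package_extra (
    id TEXT PRIMARY KEY,
    package_id TEXT REFERENCES package(id),
    key TEXT,
    value TEXT
);
"""


def load_record(conn, package_id):
    """Merge the relational row and its key-value extras into one flat dict."""
    row = conn.execute(
        "SELECT id, name, title FROM package WHERE id = ?", (package_id,)
    ).fetchone()
    if row is None:
        return None
    record = {"id": row[0], "name": row[1], "title": row[2]}
    extras = conn.execute(
        "SELECT key, value FROM package_extra WHERE package_id = ?", (package_id,)
    )
    for key, value in extras:
        # Core columns win if an extra happens to reuse one of their names.
        record.setdefault(key, value)
    return record


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO package VALUES ('p1', 'rivers', 'Rivers')")
conn.executemany(
    "INSERT INTO package_extra VALUES (?, ?, ?, ?)",
    [("e1", "p1", "language", "el"), ("e2", "p1", "title", "ignored")],
)
print(load_record(conn, "p1"))
```

Any backend that reads these tables directly (option 1 below) would need some variant of this merge step, since the metadata fields that matter for CSW tend to live in the extras.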

To integrate pycsw with CKAN there are 3 options:

  1. Create a pycsw plugin to read from and write to those 2 tables.
  2. Create a pycsw table in the CKAN database and hook the logical update, delete and insert CKAN actions to sync the pycsw table (and vice versa in case of pycsw harvesting...)
  3. Create a new pycsw CKAN backend which will use the CKAN API to interact with the database instead of SQL (this means abstracting fes.py and repository.py and making pycsw a CKAN client).

I am starting a prototype for option 2 as an intermediate solution and will report back.
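As a rough illustration of option 2 (not the prototype itself), the sync could look like the sketch below: a catalogue table kept up to date by functions that run after CKAN create, update and delete actions. The `records` table here is a hypothetical, heavily reduced stand-in for the pycsw table, and the function names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced catalogue table; the real pycsw table has many more columns.
conn.execute(
    "CREATE TABLE records (identifier TEXT PRIMARY KEY, title TEXT, abstract TEXT)"
)


def on_package_saved(package):
    """Call after a CKAN create or update: upsert the matching catalogue row."""
    conn.execute(
        "INSERT OR REPLACE INTO records (identifier, title, abstract) VALUES (?, ?, ?)",
        (package["id"], package.get("title", ""), package.get("notes", "")),
    )
    conn.commit()


def on_package_deleted(package_id):
    """Call after a CKAN delete: drop the catalogue row if it exists."""
    conn.execute("DELETE FROM records WHERE identifier = ?", (package_id,))
    conn.commit()


def titles():
    """Return catalogue titles ordered by identifier."""
    query = "SELECT title FROM records ORDER BY identifier"
    return [row[0] for row in conn.execute(query)]


on_package_saved({"id": "p1", "title": "Rivers", "notes": "River network"})
on_package_saved({"id": "p1", "title": "Rivers (v2)", "notes": "River network"})
on_package_saved({"id": "p2", "title": "Lakes"})
on_package_deleted("p2")
print(titles())
```

Saving the same identifier twice replaces the row rather than duplicating it, which is what makes the hook safe to call on every update.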

tomkralidis commented 10 years ago

cc @smrazgs @rclark

rclark commented 10 years ago

We pursued something like your second option in our NGDS project. You can have a look here: https://github.com/ngds/ckanext-ngds/tree/master/ckanext/ngds/csw

A lot of the mapping from a CKAN package to the pycsw table is done here: https://github.com/ngds/ckanext-ngds/blob/master/ckanext/ngds/csw/logic/pycsw.py#L31

We needed ISO support so creating the full-text of the XML doc was also tricky: https://github.com/ngds/ckanext-ngds/blob/master/ckanext/ngds/csw/templates/package_to_iso.xml
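For readers unfamiliar with this mapping step, here is a minimal sketch that serializes a CKAN-style package dict to a Dublin Core `csw:Record`. It is deliberately simpler than the ISO template linked above and is not the NGDS code; the function name and the choice of fields are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"
DC = "http://purl.org/dc/elements/1.1/"
DCT = "http://purl.org/dc/terms/"
for prefix, uri in (("csw", CSW), ("dc", DC), ("dct", DCT)):
    ET.register_namespace(prefix, uri)


def package_to_dc_xml(package):
    """Build the record XML that would be stored as the full-text document."""
    record = ET.Element(f"{{{CSW}}}Record")
    ET.SubElement(record, f"{{{DC}}}identifier").text = package["id"]
    ET.SubElement(record, f"{{{DC}}}title").text = package.get("title", "")
    # CKAN keeps the dataset description in the "notes" field.
    ET.SubElement(record, f"{{{DCT}}}abstract").text = package.get("notes", "")
    for tag in package.get("tags", []):
        ET.SubElement(record, f"{{{DC}}}subject").text = tag["name"]
    return ET.tostring(record, encoding="unicode")


print(package_to_dc_xml({
    "id": "p1",
    "title": "Rivers",
    "notes": "River network",
    "tags": [{"name": "hydrology"}],
}))
```

An ISO 19139 document needs far more structure than this, which is presumably why a template was used there instead of building elements by hand.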

kalxas commented 10 years ago

Thanks @rclark. Did you use the action API to catch the package updates, or another method? I see that there is a commented-out block here: https://github.com/ngds/ckanext-ngds/blob/master/ckanext/ngds/csw/plugin.py#L100

rclark commented 10 years ago

That was the plan, but it was not thoroughly tested. @asonnenschein may know more about recent progress.

kalxas commented 10 years ago

I would like to pursue option 3 in the future. This will need a backend refactoring and will pave the way for NoSQL backends in pycsw. This should probably happen for pycsw 2.x, since I expect breakage to happen :)

kalxas commented 10 years ago

@rclark @asonnenschein this is now implemented as a hook into the CKAN action API: https://github.com/kalxas/ckanext-publicamundi/blob/geodatacamp/ckanext/publicamundi/plugins.py#L283

kalxas commented 9 years ago

Full CKAN integration is now complete: https://github.com/PublicaMundi/ckanext-publicamundi/issues/70

tomkralidis commented 9 years ago

cc @amercader

@kalxas great work here! Does this have any implications against master or is this all downstream in CKAN plugins?

Can you outline the approach? If this work is integrated into ckanext-spatial, then would this eliminate the need for doing the CKAN<->pycsw sync in favour of binding direct?

kalxas commented 9 years ago

No implication against pycsw master, all done downstream, within a CKAN plugin called publicamundi_package.

The approach is this:

  1. ckanext-publicamundi has a plugin to define metadata schemas through zope-interface and zope-schema: https://github.com/PublicaMundi/ckanext-publicamundi/tree/master/ckanext/publicamundi/lib/metadata
  2. This plugin helps developers define schemas such as ISO-19115, DC etc. and stores them in the package extras of CKAN packages. At the same time it automatically generates a metadata editor UI for the CKAN dataset.
  3. The publicamundi_package plugin defines the csw_record table: https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/model/csw_record.py so it gets created when the plugin is initialized (no need to install pycsw externally, it is in the pip requirements)
  4. When the user adds a new dataset or updates an existing one, we have defined actions to catch that: https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/plugins.py#L643 https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/plugins.py#L661
  5. The pycsw synchronization runs every time a CKAN dataset changes: https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/lib/pycsw_sync.py
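Steps 4 and 5 can be pictured with the following minimal sketch, which wraps an action so that a successful call triggers a catalogue sync. This is not the ckanext-publicamundi code and does not use CKAN's plugin interfaces; `synced`, `sync_to_catalogue` and the stand-in `package_update` are hypothetical, and only the `(context, data_dict)` calling convention is borrowed from CKAN actions.

```python
import functools

sync_log = []  # stands in for the catalogue; a real sync would write a record


def sync_to_catalogue(dataset):
    """Hypothetical sync step: here it only remembers which dataset changed."""
    sync_log.append(dataset["id"])


def synced(action):
    """Wrap an action so a successful call is followed by a catalogue sync."""
    @functools.wraps(action)
    def wrapper(context, data_dict):
        dataset = action(context, data_dict)  # an exception here skips the sync
        sync_to_catalogue(dataset)
        return dataset
    return wrapper


@synced
def package_update(context, data_dict):
    # Stand-in for the real action: pretend the update succeeded.
    return dict(data_dict)


package_update({}, {"id": "p1", "title": "Rivers"})
print(sync_log)
```

Running the sync only after the action returns means a failed validation or permission error never leaves a stale record in the catalogue.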

This work is part of geodata.gov.gr and is now in beta: http://labs.geodata.gov.gr/ If you want to try this out, there is an Ansible script that installs the demo on Debian 7: https://github.com/PublicaMundi/labs.geodata.gov.gr/tree/master/deployment/common-debian

I also have a dev setup here: http://83.212.104.89 with a sample GetRecords request at http://83.212.104.89/csw?service=CSW&version=2.0.2&request=GetRecords&typenames=csw:Record&elementsetname=brief Please send me an e-mail and I will create a login for you ;)
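For anyone who wants to script the same kind of request against their own instance, here is a small sketch that builds the GetRecords URL from its key-value parameters. The helper name and the `localhost` endpoint are assumptions; the parameters are the ones in the link above.

```python
from urllib.parse import urlencode


def get_records_url(endpoint, element_set="brief"):
    """Build a CSW 2.0.2 GetRecords request URL for csw:Record."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typenames": "csw:Record",
        "elementsetname": element_set,
    }
    # safe=":" keeps the namespace prefix in csw:Record unescaped.
    return endpoint + "?" + urlencode(params, safe=":")


# Hypothetical local endpoint, replace with your own server.
print(get_records_url("http://localhost:8000/csw"))
```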

kalxas commented 9 years ago

@amercader we are also planning to split the schema plugin we created into its own extension because we feel it is very useful on its own. Any advice?

amercader commented 9 years ago

@kalxas This is looking great! I had a quick look and I'm impressed by the amount of work you guys have done, well done.

Separating the schema plugin sounds useful. I wonder if it has some overlap with @wardi's https://github.com/open-data/ckanext-scheming.

The general approach looks fine. Once we get the next CKAN release out of the way and I have more time, it'd be great to catch up and learn in more detail what has been done and what your plans are.

In the meantime, if you want to present this at one of the weekly CKAN dev meetings, feel free to drop by.

Again, great work!

kalxas commented 9 years ago

@amercader I would be happy to attend a dev meeting. I have seen the work from @wardi recently (from the dev mailing list). I have not seen all the details yet but it would be interesting if we could merge into one big schema extension...

kalxas commented 9 years ago

also @amercader @tomkralidis thanks for your nice words :)

wardi commented 9 years ago

@kalxas I'd love to talk about how you're extending the dataset metadata. Dev meeting might be good. IIUC your plugin supports arbitrary nested data as well as a flat version of the same for form updates, is that right?

drmalex07 commented 9 years ago

@wardi, @amercader, @kalxas,

Yes, indeed, we support arbitrary schemata expressed as zope.schema interfaces. Some core functionality (like flattening/unflattening) is shared across all metadata objects. We have chosen to use zope.schema as a declarative means, mainly for the following reasons:

Note that there is much to be done before we can consider this a ready-to-distribute extension. Of course, we are willing to join the conversation and exchange ideas at CKAN's dev meetings!
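As an illustration of the flattening/unflattening idea mentioned above (not the ckanext-publicamundi implementation), here is a minimal sketch using dotted key paths. It assumes keys never contain dots and that nested values are plain dicts; empty nested dicts would not survive the round trip.

```python
def flatten(nested, prefix=""):
    """Turn {"a": {"b": 1}} into {"a.b": 1} so a form can post flat fields."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat):
    """Rebuild the nested dict from dotted key paths."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


metadata = {"title": "Rivers", "contact": {"name": "A. Person", "address": {"city": "Athens"}}}
flat = flatten(metadata)
print(flat)
print(unflatten(flat) == metadata)
```

A flat representation like this fits naturally into key-value storage such as package extras, while the nested form is what a schema validator would work with.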

kalxas commented 9 years ago

@wardi cool, is tomorrow's dev meeting at 16 UTC ok for you to discuss this? @drmalex07 and I can make it.

wardi commented 9 years ago

@kalxas @drmalex07 yes, I'll be there.

wardi commented 9 years ago

@kalxas @tomkralidis for multilingual metadata and labels I strongly suggest an approach like the https://github.com/open-data/ckanext-fluent data fields or https://github.com/open-data/ckanext-scheming/#label where you accept and produce dicts of BCP-47 language keys with string values.
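To show what such a value might look like in practice, here is a minimal sketch of a dict keyed by BCP-47 language tags together with a lookup that falls back from a regional tag to its base language. `pick_label` is a hypothetical helper, not part of ckanext-fluent or ckanext-scheming, and its matching is much cruder than full BCP-47 lookup (it is case-sensitive and only strips the part after the first hyphen).

```python
def pick_label(label, preferred, default="en"):
    """Choose text from a {language-tag: text} dict, with simple fallbacks."""
    if isinstance(label, str):
        return label  # plain strings pass through unchanged
    for lang in preferred:
        if lang in label:
            return label[lang]
        base = lang.split("-")[0]  # e.g. "fr-CA" falls back to "fr"
        if base in label:
            return label[base]
    if default in label:
        return label[default]
    # Last resort: any available translation, or an empty string.
    return next(iter(label.values()), "")


title = {"en": "Rivers", "fr": "Rivières", "el": "Ποταμοί"}
print(pick_label(title, ["fr-CA"]))
print(pick_label(title, ["de"]))
```

Storing every translation under its language tag keeps the data model uniform, so the choice of which one to display can be made at render time.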

kalxas commented 9 years ago

@wardi thanks! We are looking into this right now :)

frafra commented 1 year ago

There has not been working PyCSW support in CKAN for many years, and the PyCSW documentation should be updated accordingly: https://github.com/ckan/ckanext-spatial/issues/297.

I added a PyCSW endpoint to CKAN by creating a tool that harvests from the CKAN API and adds the datasets to PyCSW: https://github.com/COATnor/coat2pycsw.
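The harvesting half of such a tool can be sketched roughly as follows. This is not the coat2pycsw code; it only assumes the standard CKAN `package_search` action with its `start`/`rows` pagination, and the function names and the injectable `fetch` parameter are choices made for the example. Writing the harvested datasets into the pycsw repository is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, start, rows):
    """Fetch one page of datasets from CKAN's package_search action."""
    query = urlencode({"start": start, "rows": rows})
    with urlopen(f"{base_url}/api/3/action/package_search?{query}") as response:
        return json.load(response)["result"]


def harvest(base_url, rows=100, fetch=fetch_page):
    """Yield every dataset dict, following CKAN's start/rows pagination."""
    start = 0
    while True:
        result = fetch(base_url, start, rows)
        datasets = result["results"]
        if not datasets:
            return
        yield from datasets
        start += len(datasets)
        if start >= result["count"]:
            return
```

Calling `harvest("https://your-ckan-site")` would iterate over all public datasets; each dict could then be mapped to a metadata record and inserted into the catalogue. Passing a custom `fetch` makes the pagination logic testable without a network connection.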