ckan / ckanext-dcat

CKAN ♥ DCAT
https://docs.ckan.org/projects/ckanext-dcat
167 stars 146 forks source link

Is ckanext-dcat intended to work with DataStore (scheming KeyError)? #305

Open eldobbins opened 1 month ago

eldobbins commented 1 month ago

We are developing a small catalog and have plans to include both DataStore uploads for table views and DCAT metadata fields. However, we can't seem to have both simultaneously.

I am running CKAN 2.10 installed with ckan-docker on a local machine (http://localhost:5000). My machine: Linux shell 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux but same problem on Windows.

The plugin set-up from ckan.ini is copied from the dcat docs:

ckan.plugins = envvars datastore datapusher customtheme dcat scheming_datasets
ckan.resource_proxy.timeout = 5
ckanext.geoview.ol_viewer.default_feature_hoveron = true
scheming.dataset_schemas = ckanext.dcat.schemas:dcat_ap_recommended.yaml
scheming.presets = ckanext.scheming:presets.json ckanext.dcat.schemas:presets.yaml
scheming.dataset_fallback = false
# Sites using the euro_dcat_ap and euro_dcat_ap_2 profiles must add the
# euro_dcat_ap_scheming profile if they want to use ckanext-scheming schemas (see next section)
ckanext.dcat.rdf.profiles = euro_dcat_ap_2 euro_dcat_ap_scheming

The error happens when I attempt to upload a resource (small CSV file) to DataStore using the button in the GUI catalog. It works fine if we upload the CSV using the API. The traceback of the docker container is:

aders in 72 bytes (1 switches on core 0)
acep-ckan-cont        | 2024-09-25 22:13:48,569 INFO  [ckan.config.middleware.flask_app]  302 /dataset/garbage-in-1/resource_data/2a619d3b-145e-4684-be38-26ebca41d280 render time 0.191 seconds
acep-solr-cont        | 2024-09-25 22:13:48.623 INFO  (qtp1663888181-25) [ x:ckan] o.a.s.c.S.Request webapp=/solr path=/select params={q=name:"920df5b0-c2d0-4451-8bc5-0a0987dd9311"+OR+id:"920df5b0-c2d0-4451-8bc5-0a0987dd9311"&fq=site_id:"default"+%2Bentity_type:package&rows=1&wt=json} hits=1 status=0 QTime=0
acep-ckan-cont        | 2024-09-25 22:13:48,637 INFO  [ckan.config.middleware.flask_app]  200 /api/3/action/resource_show render time 0.064 seconds
acep-datapusher-cont  | 2024-09-25 22:13:48,643 INFO Fetching from: http://ckan:5000/dataset/920df5b0-c2d0-4451-8bc5-0a0987dd9311/resource/2a619d3b-145e-4684-be38-26ebca41d280/download/short_csv
acep-solr-cont        | 2024-09-25 22:13:48.677 INFO  (qtp1663888181-21) [ x:ckan] o.a.s.c.S.Request webapp=/solr path=/select params={q=name:"garbage-in-1"+OR+id:"garbage-in-1"&fq=site_id:"default"+%2Bentity_type:package&rows=1&wt=json} hits=1 status=0 QTime=1
acep-solr-cont        | 2024-09-25 22:13:48.699 INFO  (qtp1663888181-22) [ x:ckan] o.a.s.c.S.Request webapp=/solr path=/select params={q=name:"920df5b0-c2d0-4451-8bc5-0a0987dd9311"+OR+id:"920df5b0-c2d0-4451-8bc5-0a0987dd9311"&fq=site_id:"default"+%2Bentity_type:package&rows=1&wt=json} hits=1 status=0 QTime=0
acep-datapusher-cont  | [pid: 8|app: 0|req: 81/81] 172.20.0.3 () {36 vars in 543 bytes} [Wed Sep 25 22:13:48 2024] GET /job/0f9db061-3446-4eff-9693-e855356a4b75 => generated 981 bytes in 6 msecs (HTTP/1.1 200) 2 headers in 72 bytes (1 switches on core 0)
acep-ckan-cont        | 2024-09-25 22:13:48,769 INFO  [ckan.config.middleware.flask_app]  200 /dataset/garbage-in-1/resource_data/2a619d3b-145e-4684-be38-26ebca41d280 render time 0.126 seconds
acep-solr-cont        | 2024-09-25 22:13:48.834 INFO  (qtp1663888181-21) [ x:ckan] o.a.s.c.S.Request webapp=/solr path=/select params={q=name:"920df5b0-c2d0-4451-8bc5-0a0987dd9311"+OR+id:"920df5b0-c2d0-4451-8bc5-0a0987dd9311"&fq=site_id:"default"+%2Bentity_type:package&rows=1&wt=json} hits=1 status=0 QTime=0
acep-solr-cont        | 2024-09-25 22:13:48.849 INFO  (qtp1663888181-22) [ x:ckan] o.a.s.c.S.Request webapp=/solr path=/select params={q=name:"920df5b0-c2d0-4451-8bc5-0a0987dd9311"+OR+id:"920df5b0-c2d0-4451-8bc5-0a0987dd9311"&fq=site_id:"default"+%2Bentity_type:package&rows=1&wt=json} hits=1 status=0 QTime=0
acep-ckan-cont        | 2024-09-25 22:13:48,870 INFO  [ckan.config.middleware.flask_app]  200 /dataset/920df5b0-c2d0-4451-8bc5-0a0987dd9311/resource/2a619d3b-145e-4684-be38-26ebca41d280/download/short_csv render time 0.073 seconds
acep-ckan-cont        | 2024-09-25 22:13:48,936 INFO  [ckan.config.middleware.flask_app]  200 /api/3/action/datastore_search render time 0.052 seconds
acep-ckan-cont        | 2024-09-25 22:13:48,957 INFO  [ckan.config.middleware.flask_app]  304 /base/vendor/jquery.js render time 0.017 seconds
acep-datapusher-cont  | 2024-09-25 22:13:48,945 INFO Deleting "2a619d3b-145e-4684-be38-26ebca41d280" from datastore.
acep-ckan-cont        | 2024-09-25 22:13:48,972 INFO  [ckan.config.middleware.flask_app]  304 /base/vendor/jed.js render time 0.012 seconds
acep-ckan-cont        | 2024-09-25 22:13:48,993 INFO  [ckan.config.middleware.flask_app]  304 /base/vendor/moment-with-locales.js render time 0.012 seconds
acep-ckan-cont        | 2024-09-25 22:13:49,012 INFO  [ckan.config.middleware.flask_app]  304 /base/vendor/select2/select2.js render time 0.014 seconds
acep-ckan-cont        | 2024-09-25 22:13:49,036 INFO  [ckan.config.middleware.flask_app]  304 /base/vendor/popperjs.js render time 0.020 seconds
acep-ckan-cont        | 2024-09-25 22:13:49,061 INFO  [ckan.config.middleware.flask_app]  304 /base/vendor/purify.js render time 0.020 seconds
acep-ckan-cont        | 2024-09-25 22:13:49,078 INFO  [ckan.config.middleware.flask_app]  304 /base/vendor/bootstrap/js/bootstrap.js render time 0.011 seconds
acep-ckan-cont        | 2024-09-25 22:13:49,115 DEBUG [ckanext.datastore.logic.action] Setting datastore_active=False on resource 2a619d3b-145e-4684-be38-26ebca41d280
acep-ckan-cont        | /srv/app/src/ckan/ckanext/datastore/logic/action.py:641: SAWarning: TypeDecorator JsonDictType() will not produce a cache key because the ``cache_ok`` attribute is not set to True.  This can have significant performance implications including some performance degradations in comparison to prior SQLAlchemy versions.  Set this attribute to True if this type object's state is safe to use in a cache key, or False to disable this warning. (Background on this error at: https://sqlalche.me/e/14/cprf)
acep-ckan-cont        |   model.Session.query(model.Resource).filter(
acep-solr-cont        | 2024-09-25 22:13:49.132 INFO  (qtp1663888181-21) [ x:ckan] o.a.s.c.S.Request webapp=/solr path=/select params={q=name:"920df5b0-c2d0-4451-8bc5-0a0987dd9311"+OR+id:"920df5b0-c2d0-4451-8bc5-0a0987dd9311"&fq=site_id:"default"+%2Bentity_type:package&rows=1&wt=json} hits=1 status=0 QTime=0
acep-ckan-cont        | 2024-09-25 22:13:49,137 ERROR [ckan.views.api] 'extras'
acep-ckan-cont        | Traceback (most recent call last):
acep-ckan-cont        |   File "/srv/app/src/ckan/ckan/config/middleware/../../views/api.py", line 283, in action
acep-ckan-cont        |     result = function(context, request_data)
acep-ckan-cont        |   File "/srv/app/src/ckan/ckan/logic/__init__.py", line 580, in wrapped
acep-ckan-cont        |     result = _action(context, data_dict, **kw)
acep-ckan-cont        |   File "/srv/app/src/ckan/ckanext/datastore/logic/action.py", line 430, in datastore_delete
acep-ckan-cont        |     set_datastore_active_flag(context, data_dict, False)
acep-ckan-cont        |   File "/srv/app/src/ckan/ckanext/datastore/logic/action.py", line 669, in set_datastore_active_flag
acep-ckan-cont        |     psi.index_package(_data_dict)
acep-ckan-cont        |   File "/srv/app/src/ckan/ckan/lib/search/index.py", line 122, in index_package
acep-ckan-cont        |     validated_pkg_dict, _errors = lib_plugins.plugin_validate(
acep-ckan-cont        |   File "/srv/app/src/ckan/ckan/lib/plugins.py", line 327, in plugin_validate
acep-ckan-cont        |     result = plugin.validate(context, data_dict, schema, action)
acep-ckan-cont        |   File "/srv/app/src_extensions/ckanext-scheming/ckanext/scheming/plugins.py", line 283, in validate
acep-ckan-cont        |     for ex in data_dict['extras']:
acep-ckan-cont        | KeyError: 'extras'
acep-ckan-cont        | 2024-09-25 22:13:49,139 INFO  [ckan.config.middleware.flask_app]  500 /api/3/action/datastore_delete render time 0.058 seconds
acep-datapusher-cont  | 2024-09-25 22:13:49,145 INFO Determined headers and types: [{'id': 'col1', 'type': 'numeric'}, {'id': 'col2', 'type': 'numeric'}, {'id': 'col3', 'type': 'numeric'}]
acep-ckan-cont        | 2024-09-25 22:13:49,166 INFO  [ckan.config.middleware.flask_app]  304 /base/javascript/apply-html-class.js render time 0.020 seconds
acep-datapusher-cont  | 2024-09-25 22:13:49,157 INFO Saving chunk 0 (last)
acep-ckan-cont        | 2024-09-25 22:13:49,188 INFO  [ckan.config.middleware.flask_app]  304 /base/javascript/plugins/jquery.inherit.js render time 0.019 seconds
acep-ckan-cont        | 2024-09-25 22:13:49,209 INFO  [ckan.config.middleware.flask_app]  304 /base/javascript/plugins/jquery.proxy-all.js render time 0.013 seconds
acep-ckan-cont        | 2024-09-25 22:13:49,230 INFO  [ckan.config.middleware.flask_app]  304 /base/javascript/plugins/jquery.url-helpers.js render time 0.016 seconds
acep-ckan-cont        | 2024-09-25 22:13:49,246 INFO  [ckan.config.middleware.flask_app]  304 /base/javascript/plugins/jquery.date-helpers.js render time 0.011 seconds
acep-ckan-cont        | 2024-09-25 22:13:49,258 INFO  [ckan.config.middleware.flask_app]  304 /base/javascript/plugins/jquery.slug.js render time 0.008 seconds
acep-ckan-cont        | 2024-09-25 22:13:49,268 INFO  [ckan.config.middleware.flask_app]  304 /base/javascript/plugins/jquery.slug-preview.js render time 0.007 seconds
acep-ckan-cont        | 2024-09-25 22:13:49,316 DEBUG [ckanext.datastore.logic.action] Setting datastore_active=True on resource 2a619d3b-145e-4684-be38-26ebca41d280
acep-solr-cont        | 2024-09-25 22:13:49.327 INFO  (qtp1663888181-25) [ x:ckan] o.a.s.c.S.Request webapp=/solr path=/select params={q=name:"920df5b0-c2d0-4451-8bc5-0a0987dd9311"+OR+id:"920df5b0-c2d0-4451-8bc5-0a0987dd9311"&fq=site_id:"default"+%2Bentity_type:package&rows=1&wt=json} hits=1 status=0 QTime=0
acep-ckan-cont        | 2024-09-25 22:13:49,333 ERROR [ckan.views.api] 'extras'
acep-ckan-cont        | Traceback (most recent call last):
acep-ckan-cont        |   File "/srv/app/src/ckan/ckan/config/middleware/../../views/api.py", line 283, in action
acep-ckan-cont        |     result = function(context, request_data)
acep-ckan-cont        |   File "/srv/app/src/ckan/ckan/logic/__init__.py", line 580, in wrapped
acep-ckan-cont        |     result = _action(context, data_dict, **kw)
acep-ckan-cont        |   File "/srv/app/src/ckan/ckanext/datastore/logic/action.py", line 177, in datastore_create
acep-ckan-cont        |     set_datastore_active_flag(context, data_dict, True)
acep-ckan-cont        |   File "/srv/app/src/ckan/ckanext/datastore/logic/action.py", line 669, in set_datastore_active_flag
acep-ckan-cont        |     psi.index_package(_data_dict)
acep-ckan-cont        |   File "/srv/app/src/ckan/ckan/lib/search/index.py", line 122, in index_package
acep-ckan-cont        |     validated_pkg_dict, _errors = lib_plugins.plugin_validate(
acep-ckan-cont        |   File "/srv/app/src/ckan/ckan/lib/plugins.py", line 327, in plugin_validate
acep-ckan-cont        |     result = plugin.validate(context, data_dict, schema, action)
acep-ckan-cont        |   File "/srv/app/src_extensions/ckanext-scheming/ckanext/scheming/plugins.py", line 283, in validate
acep-ckan-cont        |     for ex in data_dict['extras']:
acep-ckan-cont        | KeyError: 'extras'
acep-ckan-cont        | 2024-09-25 22:13:49,335 INFO  [ckan.config.middleware.flask_app]  500 /api/3/action/datastore_create render time 0.064 seconds
acep-datapusher-cont  | 2024-09-25 22:13:49,338 ERROR Job "push_to_datastore (trigger: date[2024-09-25 22:13:48 UTC], next run at: 2024-09-25 22:13:48 UTC)" raised an exception
acep-datapusher-cont  | Traceback (most recent call last):
acep-datapusher-cont  |   File "/usr/lib/python3.9/site-packages/apscheduler/executors/base.py", line 125, in run_job
acep-datapusher-cont  |     retval = job.func(*job.args, **job.kwargs)
acep-datapusher-cont  |   File "/usr/lib/python3.9/site-packages/datapusher-0.0.20-py3.9.egg/datapusher/jobs.py", line 527, in push_to_datastore
acep-datapusher-cont  |     send_resource_to_datastore(resource, headers_dicts, records,
acep-datapusher-cont  |   File "/usr/lib/python3.9/site-packages/datapusher-0.0.20-py3.9.egg/datapusher/jobs.py", line 259, in send_resource_to_datastore
acep-datapusher-cont  |     check_response(r, url, 'CKAN DataStore')
acep-datapusher-cont  |   File "/usr/lib/python3.9/site-packages/datapusher-0.0.20-py3.9.egg/datapusher/jobs.py", line 156, in check_response
acep-datapusher-cont  |     raise HTTPError(
acep-datapusher-cont  | datapusher.jobs.HTTPError: <unprintable HTTPError object>

Apologies for the interleaved messages from the various containers. Looks to me like scheming kaks because of a missing field extras which then causes datapusher to fail, which then triggers a 500 error within the catalog. Can I avoid this error with a different set of options?

Thanks, Liz

jematson commented 1 month ago

Looks like ckanext-scheming does some validation with the 'extras' field, but ckanext-dcat gets rid of that field. We were able to patch the problem by adding a try-except block around lines 283-289 in ckanext-scheming/ckanext/scheming/plugins.py, as follows:

if action_type == 'show':
  if composite_convert_fields:
      try:
          for ex in data_dict['extras']:
              if ex['key'] in composite_convert_fields:
                  data_dict[ex['key']] = json.loads(ex['value'])
          data_dict['extras'] = [
              ex for ex in data_dict['extras']
              if ex['key'] not in composite_convert_fields
          ]
      except:
          print('Dataset has no extras field.')

Probably not the most graceful fix, but if there's no 'extras' field then that chunk of code won't be doing anything anyways.

amercader commented 2 weeks ago

@eldobbins thanks for the detailed report

However, we can't seem to have both simultaneously.

Yes, absolutely. Ckanext-dcat should work perfectly fine alongside datastore and datapusher/xloader/etc. If it doesn't it's a bug

I managed to reproduce it on my end. Took me a while as I didn't realize I was running CKAN 2.11 as opposed to 2.10 as you reported, so that's a good clue. Let me investigate a bit more. The solution might be handling it in ckanext-scheming as @jematson suggests but I want to understand why this is happening first.

scheming kaks because of a missing field

Really love the sound of this, maybe I'll start using it!

amercader commented 1 week ago

I tracked down the issue to this patch https://github.com/ckan/ckan/pull/7571 not being backported to CKAN 2.10. This is now backported so the next 2.10 patch release (and the Docker images that will get updated with it) will include the patch and fix this issue.

In parallel I also sent a PR to ckanext-scheming with a fix similar to @jematson 's one for good measure. That should also be in the next scheming release.

In the meantime you can apply the patch manually.