ckan / ckanext-harvest

Remote harvesting extension for CKAN
130 stars 203 forks source link

Fix a problem with data-dictization of lists to fix search-index rebuild #529

Closed seitenbau-govdata closed 1 year ago

seitenbau-govdata commented 1 year ago

Fixes #528.

The update of sqlalchemy from 1.3 to 1.4 caused a problem with the dictization of the data returned by the db.

https://github.com/ckan/ckanext-harvest/blob/master/ckanext/harvest/logic/dictization.py#L79 returns a list of db entries containing the errors of the harvest job. With the update of sqlalchemy the the return type of the list elements has changed from sqlalchemy.util._collections.result to sqlalchemy.engine.row.Row. The new type cannot be reliably dictized the way we did it so far. Therefore we made sure the elements are dictized.

seitenbau-govdata commented 1 year ago

Could somebody review if this is a viable solution? Maybe @amercader or @pdelboca ? 😃

pdelboca commented 1 year ago

Hello @seitenbau-govdata ! Thanks for this contribution!

A couple of thoughts/questions:

seitenbau-govdata commented 1 year ago

Hi @pdelboca. Thank you for the feedback.

We created a testcase for the API-method harvest_source_show_status. If the dictization of the response works as intended this test will succeed, otherwise it will break at line 778. We use json.dumps to check this like in the CKAN code where the reindex error is currently raised: https://github.com/ckan/ckan/blob/8aaf1c65b1a83d53296b71e16cfff4464694aad1/ckan/lib/search/index.py#L130. What causes the json.dumps to fail is that the elements of the lists last_job['gather_error_summary'] and last_job['object_error_summary'] are now of the type sqlalchemy.engine.row.Row. For some reason this type cannot be serialized by json.dumps. That`s why our suggested solution contains a separate dictization of the single elements of those lists first.

seitenbau-govdata commented 1 year ago

@pdelboca or @amercader Could you take a quick look at it? I think the problem is quite annoying for all users with CKAN 2.10. Thanks in advance. 😃

amercader commented 1 year ago

Thanks @seitenbau-govdata , I've released a new version with this fix