ckan / ckanext-geoview

CKAN Geospatial ResourceView
MIT License

Plugin is crashing the harvester #26

Closed letmaik closed 9 years ago

letmaik commented 9 years ago

During the import phase, geoview throws the following:

$ paster --plugin=ckanext-harvest harvester import --config /etc/ckan/default/development.ini
/usr/lib/ckan/default/src/ckan/ckan/new_authz.py:6: FutureWarning: ckan.new_authz has been renamed to ckan.authz. The ckan.new_authz module will be removed in a future release.
  FutureWarning)
2015-10-17 09:06:50,410 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2015-10-17 09:06:50,417 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2015-10-17 09:06:50,431 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-10-17 09:06:50,432 DEBUG [ckanext.harvest.model] Harvest tables already exist

2015-10-17 09:06:50,478 DEBUG [ckanext.harvest.model] Harvest tables already exist
DB tables created
2015-10-17 09:06:50,483 INFO  [ckanext.harvest.logic.action.update] Harvest objects import: {'source_id': None, 'package_id': False, 'harvest_object_id': False}
2015-10-17 09:06:50,494 DEBUG [ckanext.dcat.harvesters.rdf] In DCATRDFHarvester import_stage
/usr/lib/ckan/default/local/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py:79: SAWarning: Usage of the 'related attribute set' operation is not currently supported within the execution stage of the flush process. Results may not be consistent.  Consider using alternative event listeners or connection-level operations instead.
  sess._flush_warning("related attribute set")
2015-10-17 09:06:50,599 DEBUG [ckanext.spatial.plugin] Received: u'{"type": "Polygon", "coordinates": [[[-180.0, -90.0], [180.0, -90.0], [180.0, 90.0], [-180.0, 90.0], [-180.0, -90.0]]]}'
2015-10-17 09:06:50,610 DEBUG [ckanext.spatial.lib] Extent for package 1e85a8ef-efcc-4fa1-a40a-3cc1bec5c8bc unchanged
Traceback (most recent call last):
  File "/usr/lib/ckan/default/bin/paster", line 9, in <module>
    load_entry_point('PasteScript==1.7.5', 'console_scripts', 'paster')()
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 104, in run
    invoke(command, command_name, options, args[1:])
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 143, in invoke
    exit_code = runner.run(args)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 238, in run
    result = self.command()
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 152, in command
    self.import_stage()
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 325, in import_stage
    'package_id': self.options.package_id,
  File "/usr/lib/ckan/default/src/ckan/ckan/logic/__init__.py", line 429, in wrapped
    result = _action(context, data_dict, **kw)
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/logic/action/update.py", line 273, in harvest_objects_import
    harvester.import_stage(obj)
  File "/usr/lib/ckan/default/src/ckanext-dcat/ckanext/dcat/harvesters/rdf.py", line 284, in import_stage
    p.toolkit.get_action('package_update')(context, dataset)
  File "/usr/lib/ckan/default/src/ckan/ckan/logic/__init__.py", line 429, in wrapped
    result = _action(context, data_dict, **kw)
  File "/usr/lib/ckan/default/src/ckan/ckan/logic/action/update.py", line 371, in package_update
    {'package': data})
  File "/usr/lib/ckan/default/src/ckan/ckan/logic/__init__.py", line 429, in wrapped
    result = _action(context, data_dict, **kw)
  File "/usr/lib/ckan/default/src/ckan/ckan/logic/action/create.py", line 468, in package_create_default_resource_views
    create_datastore_views=create_datastore_views)
  File "/usr/lib/ckan/default/src/ckan/ckan/lib/datapreview.py", line 303, in add_views_to_dataset_resources
    create_datastore_views)
  File "/usr/lib/ckan/default/src/ckan/ckan/lib/datapreview.py", line 261, in add_views_to_resource
    'package': dataset_dict
  File "/usr/lib/ckan/default/src/ckanext-geoview/ckanext/geoview/plugin.py", line 279, in can_view
    format_lower = resource['format'].lower()
KeyError: 'format'

I have no idea how to recover from this crash: none of the clear or purge operations work, apparently because the DB entries are now in an inconsistent state. When I try to clear the jobs, I get:

An error occurred: [(IntegrityError) update or delete on table "package" violates foreign key constraint "harvest_object_package_id_fkey" on table "harvest_object" DETAIL: Key (id)=(1e85a8ef-efcc-4fa1-a40a-3cc1bec5c8bc) is still referenced from table "harvest_object". "begin; \n update package set state = 'to_delete' where id in (select package_id from harvest_object where harvest_source_id = 'fd15d7cd-b576-4577-acbb-7246eee9b41e');\n delete from resource_view where resource_id in (select id from resource where package_id in (select id from package where state = 'to_delete' ));\n delete from resource_revision where package_id in (select id from package where state = 'to_delete' );\n delete from resource where package_id in (select id from package where state = 'to_delete' );\n \n delete from harvest_object_error where harvest_object_id in (select id from harvest_object where harvest_source_id = 'fd15d7cd-b576-4577-acbb-7246eee9b41e');\n delete from harvest_object_extra where harvest_object_id in (select id from harvest_object where harvest_source_id = 'fd15d7cd-b576-4577-acbb-7246eee9b41e');\n delete from harvest_object where harvest_source_id = 'fd15d7cd-b576-4577-acbb-7246eee9b41e';\n delete from harvest_gather_error where harvest_job_id in (select id from harvest_job where source_id = 'fd15d7cd-b576-4577-acbb-7246eee9b41e');\n delete from harvest_job where source_id = 'fd15d7cd-b576-4577-acbb-7246eee9b41e';\n delete from package_role where package_id in (select id from package where state = 'to_delete' );\n delete from user_object_role where id not in (select user_object_role_id from package_role) and context = 'Package';\n delete from package_tag_revision where package_id in (select id from package where state = 'to_delete');\n delete from member_revision where table_id in (select id from package where state = 'to_delete');\n delete from package_extra_revision where package_id in (select id from package where state = 'to_delete');\n delete from package_revision 
where id in (select id from package where state = 'to_delete');\n delete from package_tag where package_id in (select id from package where state = 'to_delete');\n delete from package_extra where package_id in (select id from package where state = 'to_delete');\n delete from package_relationship_revision where subject_package_id in (select id from package where state = 'to_delete');\n delete from package_relationship_revision where object_package_id in (select id from package where state = 'to_delete');\n delete from package_relationship where subject_package_id in (select id from package where state = 'to_delete');\n delete from package_relationship where object_package_id in (select id from package where state = 'to_delete');\n delete from member where table_id in (select id from package where state = 'to_delete');\n delete from related_dataset where dataset_id in (select id from package where state = 'to_delete');\n delete from related where id in ('');\n delete from package where id in (select id from package where state = 'to_delete');\n commit;\n " {}]

amercader commented 9 years ago

The original crash was fixed in 5cdd81c. The fix might not have been released in a new version on PyPI yet; I'll do that later today. In the meantime, you can run the latest source.
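
The change in 5cdd81c is not reproduced in this thread. As a rough sketch only, this is the kind of guard that would avoid the `KeyError: 'format'` raised by `resource['format'].lower()` in `can_view`, assuming a harvested resource dict may lack the `format` key or carry `None` for it. The helper name `can_view_format` and the `SUPPORTED` set are hypothetical and are not part of ckanext-geoview.

```python
# Hypothetical helper, not the actual ckanext-geoview code: shows a
# defensive lookup in place of resource['format'].lower().
def can_view_format(resource, supported_formats):
    """Return True if the resource declares a supported format."""
    # .get() avoids the KeyError when 'format' is absent; `or ''`
    # also covers an explicit None value.
    format_lower = (resource.get('format') or '').lower()
    return format_lower in supported_formats


# Illustrative format list (an assumption, not the plugin's real config).
SUPPORTED = {'geojson', 'kml', 'wms'}

# A harvested resource without a 'format' key no longer raises.
print(can_view_format({'url': 'http://example.com/data'}, SUPPORTED))
print(can_view_format({'format': 'GeoJSON'}, SUPPORTED))
```

With a guard like this, a resource that has no declared format is simply reported as not viewable instead of aborting the whole import.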

As for the harvester issues, let's discuss them in the harvester's repo.

letmaik commented 9 years ago

Please look at my pull request. Your fix is only one half of it. I wrote the other half ;)