ckan / ckanext-archiver

Archive CKAN resources
MIT License

archiver doesn't cache resources #67

Open abdelrahman146 opened 5 years ago

abdelrahman146 commented 5 years ago

I'm running CKAN 2.8 on apache2. I have installed the archiver following the instructions and configured job workers to run the queues in the background:

paster --plugin=ckanext-archiver celeryd2 run priority -c production.ini
paster --plugin=ckanext-archiver celeryd2 run bulk -c production.ini

When I run paster --plugin=ckanext-archiver archiver update --queue=priority -c <path to CKAN config>, it shows this output:

2019-06-24 06:55:13,826 INFO  [ckanext.archiver.commands] Queuing dataset zonas-para-perros (1 resources)
2019-06-24 06:55:13,828 INFO  [ckan.lib.jobs] Added background job 25715cff-21c7-4a54-a32d-7fd5b24aae70 to queue "bulk"
2019-06-24 06:55:13,828 DEBUG [ckanext.archiver.lib] Archival of package put into celery queue bulk: zonas-para-perros
2019-06-24 06:55:13,932 INFO  [ckanext.archiver.commands] Queuing dataset zonas-verdes (1 resources)
2019-06-24 06:55:13,934 INFO  [ckan.lib.jobs] Added background job 846fcaf5-8754-4b03-9df8-48cfdff8295a to queue "bulk"
2019-06-24 06:55:13,934 DEBUG [ckanext.archiver.lib] Archival of package put into celery queue bulk: zonas-verdes
2019-06-24 06:55:14,046 INFO  [ckanext.archiver.commands] Completed queueing

But when I view the archival information by running paster --plugin=ckanext-archiver archiver view --config=production.ini, it shows that 0 resources are archived:

2019-06-24 06:39:13,511 INFO  [ckanext.geonetwork.harvesters.geonetwork] GeoNetwork harvester: extending ISODocument with TimeInstant
2019-06-24 06:39:13,511 INFO  [ckanext.geonetwork.harvesters.geonetwork] GeoNetwork harvester: adding old GML URI
2019-06-24 06:39:13,511 INFO  [ckanext.geonetwork.harvesters.geonetwork] Added old URI for gml to temporal-extent-begin
2019-06-24 06:39:13,512 INFO  [ckanext.geonetwork.harvesters.geonetwork] Added old URI for gml to temporal-extent-begin
2019-06-24 06:39:13,512 INFO  [ckanext.geonetwork.harvesters.geonetwork] Added old URI for gml to temporal-extent-end
2019-06-24 06:39:13,512 INFO  [ckanext.geonetwork.harvesters.geonetwork] Added old URI for gml to temporal-extent-end
2019-06-24 06:39:13,512 INFO  [ckanext.geonetwork.harvesters.geonetwork] Added old URI for gml to temporal-extent-instant
2019-06-24 06:39:13,936 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2019-06-24 06:39:13,951 DEBUG [ckanext.harvest.model] Harvest tables already exist
2019-06-24 06:39:13,978 DEBUG [ckanext.spatial.plugin] Setting up the spatial model
2019-06-24 06:39:13,999 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2019-06-24 06:39:14,005 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2019-06-24 06:39:14,433 DEBUG [ckanext.harvest.model] Harvest tables already exist
2019-06-24 06:39:14,460 DEBUG [ckanext.spatial.plugin] Setting up the spatial model
2019-06-24 06:39:14,465 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
Resources: 2029 total
Archived resources: 0 total
                    0 with cache_url
Latest archival: (no)
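To tell quickly whether the workers are making progress, the summary printed by archiver view can be checked from a script. The sketch below is a hypothetical helper (parse_archiver_view is not part of ckanext-archiver); it assumes the summary keeps the exact format shown above and skips over any log lines printed before it.

```python
import re

# Patterns for the summary lines printed by "archiver view", in the format
# shown in this issue; log lines before the summary are simply not matched.
PATTERNS = {
    "resources": r"^Resources:\s+(\d+) total",
    "archived": r"^Archived resources:\s+(\d+) total",
    "with_cache_url": r"^[ \t]*(\d+) with cache_url",
}


def parse_archiver_view(text):
    """Return the resource counts from "archiver view" output as a dict."""
    counts = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match is None:
            raise ValueError("no %r line found in archiver view output" % key)
        counts[key] = int(match.group(1))
    return counts
```

Saved as check_archival.py (the name is arbitrary), it can be fed the command's output, e.g. paster --plugin=ckanext-archiver archiver view --config=production.ini | python -c "import sys, check_archival; print(check_archival.parse_archiver_view(sys.stdin.read()))". For the run above it would report 0 archived out of 2029 resources.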

I need to archive the resources and make the archival information appear on the resource page.

Zharktas commented 5 years ago

CKAN 2.8 no longer uses Celery; you should use job workers instead: https://docs.ckan.org/en/2.8/maintaining/background-tasks.html#running-background-jobs

abdelrahman146 commented 5 years ago

Thanks @Zharktas, it worked! But how can I show the archive information on the resource page, just like the picture in the README?

https://github.com/ckan/ckanext-archiver/blob/master/archiver_resource.png

marcoingrosso commented 4 years ago

Dear all, it's not clear to me how to configure the archiver job workers in CKAN 2.8. Could you help me, please? Thanks in advance!

davidread commented 4 years ago

@marcoingrosso First, you need to set up the Redis backend, which stores the Celery queue items, as detailed here: https://github.com/ckan/ckanext-archiver/blob/e72dcccc1c02d36bf136792e68784458e9ab1e8d/README.rst#redis-backend

Second, you need to run the Celery queue, as described here: https://github.com/ckan/ckanext-archiver/blob/e72dcccc1c02d36bf136792e68784458e9ab1e8d/README.rst#using-archiver
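Whether the queue items are held by Celery (older CKAN) or by the background jobs system that CKAN 2.7+ provides, the workers need a reachable Redis server. In CKAN 2.7+ the connection is read from the ckan.redis.url option in the [app:main] section, which defaults to redis://localhost:6379/0. A minimal sketch for checking both, assuming Python's configparser can read your config file and that Redis does not require a password (a password-protected server answers PING with an error, so the check reports False). The helper names are hypothetical.

```python
import configparser
import socket
from urllib.parse import urlparse

# CKAN's documented default when ckan.redis.url is not set.
DEFAULT_REDIS_URL = "redis://localhost:6379/0"


def redis_url_from_ini(path):
    """Read ckan.redis.url from [app:main], falling back to the default."""
    # interpolation=None leaves Paste-style %(here)s values untouched.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(path)
    return parser.get("app:main", "ckan.redis.url", fallback=DEFAULT_REDIS_URL)


def ping(url, timeout=3.0):
    """Send an inline PING and report whether Redis answered +PONG."""
    parts = urlparse(url)
    host = parts.hostname or "localhost"
    port = parts.port or 6379
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"PING\r\n")
        return sock.recv(64).startswith(b"+PONG")
```

For example, ping(redis_url_from_ini("/etc/ckan/default/production.ini")) should return True when the configured Redis is up; if nothing listens on that host and port, create_connection raises ConnectionRefusedError instead.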

marcoingrosso commented 4 years ago

Dear David, I have Redis 3.0.6 (00000000/0) 64 bit running on port 6379.

I have launched the two queues with the following commands (in two different shells):

    su -l www-data -s /bin/bash -c '. /usr/lib/ckan/default/bin/activate ; cd /usr/lib/ckan/default/src/ ; paster --plugin=ckanext-archiver celeryd2 run priority -c /etc/ckan/default/production.ini'

    su -l www-data -s /bin/bash -c '. /usr/lib/ckan/default/bin/activate ; cd /usr/lib/ckan/default/src/ ; paster --plugin=ckanext-archiver celeryd2 run bulk -c /etc/ckan/default/production.ini'

I'm trying to archive with the following command:

paster --plugin=ckanext-archiver archiver update --queue=priority -c /etc/ckan/default/production.ini

Still, no resources are archived.

If I try running the tests:

nosetests --ckan ./ckanext-archiver/tests/ --with-pylons=./ckanext-archiver/test-core.ini

the result is an error:

Traceback (most recent call last):
  File "/usr/lib/ckan/default/bin/nosetests", line 10, in <module>
    sys.exit(run_exit())
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/nose/core.py", line 121, in __init__
    **extra_args)
  File "/usr/lib/python2.7/unittest/main.py", line 94, in __init__
    self.parseArgs(argv)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/nose/core.py", line 145, in parseArgs
    self.config.configure(argv, doc=self.usage())
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/nose/config.py", line 347, in configure
    self.plugins.begin()
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/nose/plugins/manager.py", line 99, in __call__
    return self.call(*arg, **kw)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/nose/plugins/manager.py", line 167, in simple
    result = meth(*arg, **kw)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/pylons/test.py", line 74, in begin
    relative_to=path)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py", line 247, in loadapp
    return loadobj(APP, uri, name=name, **kw)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py", line 272, in loadobj
    return context.create()
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py", line 710, in create
    return self.object_type.invoke(self)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/deploy/loadwsgi.py", line 146, in invoke
    return fix_call(context.object, context.global_conf, **context.local_conf)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/deploy/util.py", line 55, in fix_call
    val = callable(*args, **kw)
  File "/usr/lib/ckan/default/src/ckan/ckan/config/middleware/__init__.py", line 55, in make_app
    load_environment(conf, app_conf)
  File "/usr/lib/ckan/default/src/ckan/ckan/config/environment.py", line 112, in load_environment
    p.load_all()
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 130, in load_all
    unload_all()
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 183, in unload_all
    unload(*reversed(_PLUGINS))
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 211, in unload
    plugins_update()
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 122, in plugins_update
    environment.update_config()
  File "/usr/lib/ckan/default/src/ckan/ckan/config/environment.py", line 285, in update_config
    model.init_model(engine)
  File "/usr/lib/ckan/default/src/ckan/ckan/model/__init__.py", line 157, in init_model
    version_table = Table('migrate_version', meta.metadata, autoload=True)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 439, in __new__
    metadata._remove_table(name, schema)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 434, in __new__
    table._init(name, metadata, *args, **kw)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 514, in _init
    include_columns, _extend_on=_extend_on)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 540, in _autoload
    _extend_on=_extend_on
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2044, in run_callable
    with self.contextual_connect() as conn:
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2112, in contextual_connect
    self._wrap_pool_connect(self.pool.connect, None),
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2151, in _wrap_pool_connect
    e, dialect, self)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1465, in _handle_dbapi_exception_noconnection
    exc_info
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2147, in _wrap_pool_connect
    return fn()
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 387, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 766, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 516, in checkout
    rec = pool._do_get()
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 1138, in _do_get
    self._dec_overflow()
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 1135, in _do_get
    return self._create_connection()
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 333, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 461, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 651, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 105, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 393, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 130, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: password authentication failed for user "ckan_default"
FATAL: password authentication failed for user "ckan_default"

Any suggestions please?

Thanks in advance,

Marco
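The last line of that traceback is the actual failure: PostgreSQL rejected the password for ckan_default while the test run was loading its configuration, so the sqlalchemy.url the tests use does not match the database credentials. Extension test configs usually inherit from CKAN's own test-core.ini through a "use = config:..." line in [app:main], so the URL may live in a different file from the one passed to --with-pylons. The hypothetical helper below follows that chain and reports the user, host and database (never the password). It is a sketch that assumes the files parse with Python's configparser and that the nearest file defining sqlalchemy.url takes precedence, as with Paste's overrides.

```python
import configparser
import os
from urllib.parse import urlparse


def resolve_sqlalchemy_url(path, max_depth=5):
    """Follow [app:main] "use = config:..." links until sqlalchemy.url is set."""
    for _ in range(max_depth):
        parser = configparser.ConfigParser(interpolation=None, strict=False)
        if not parser.read(path):
            raise FileNotFoundError(path)
        if parser.has_option("app:main", "sqlalchemy.url"):
            # Assumed precedence: the including file overrides the included one.
            return parser.get("app:main", "sqlalchemy.url")
        use = parser.get("app:main", "use", fallback="")
        if not use.startswith("config:"):
            break
        # config: paths are resolved relative to the file that contains them.
        base = os.path.dirname(os.path.abspath(path))
        path = os.path.join(base, use[len("config:"):])
    raise LookupError("no sqlalchemy.url found starting from the given file")


def describe(url):
    """Summarise a database URL without revealing its password."""
    parts = urlparse(url)
    return "user=%s host=%s database=%s" % (
        parts.username, parts.hostname, parts.path.lstrip("/"))
```

For example, describe(resolve_sqlalchemy_url("./ckanext-archiver/test-core.ini")) names the database the tests will try; connecting to it with psql -h <host> -U <user> <database> and the expected password then shows whether the credentials or the database itself are at fault.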

marcoingrosso commented 4 years ago

I just resolved it by enabling job workers this way:

paster --plugin=ckan jobs worker priority --config=/etc/ckan/default/production.ini

and

paster --plugin=ckan jobs worker bulk --config=/etc/ckan/default/production.ini
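These two commands start CKAN's own background-job workers, one per queue, which is what the archiver uses since its migration away from Celery. A minimal launcher sketch, assuming the config path from this thread, that it runs inside the activated CKAN virtualenv as the same user as the web app (www-data in the earlier commands), and that paster is on PATH. The function names are hypothetical.

```python
import subprocess

# Config path used earlier in this thread; adjust for your installation.
CONFIG = "/etc/ckan/default/production.ini"
# The two queues the archiver puts its jobs on.
QUEUES = ("priority", "bulk")


def worker_command(queue, config=CONFIG):
    """Command line for one CKAN background-job worker on the given queue."""
    return ["paster", "--plugin=ckan", "jobs", "worker", queue,
            "--config=" + config]


def start_workers(queues=QUEUES, config=CONFIG):
    """Start one worker process per queue and return the Popen objects."""
    return [subprocess.Popen(worker_command(q, config)) for q in queues]
```

For example, procs = start_workers() followed by proc.wait() for each process keeps both workers running in the foreground. For production, the CKAN documentation recommends a process manager such as Supervisor instead, which also restarts workers that exit.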

I suggest updating the ckanext-archiver setup guide to separate the two setup procedures: one for CKAN 2.8 (with job workers, including an example of how to enable the two queues) and one for older CKAN versions (with Celery).

Best regards!

davidread commented 4 years ago

@marcoingrosso I believe the archiver is only written to use Celery queues, so paster jobs won't work.

What you're doing seems right. I'm not sure what to suggest here apart from stepping through what happens when you run paster --plugin=ckanext-archiver archiver update --queue=priority -c /etc/ckan/default/production.ini, to see whether it finds your resources and puts them on the queue, whether they appear in the Redis database, and whether they are then picked up by your Celery worker.
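On CKAN 2.7+, where background jobs run on RQ rather than Celery, the built-in paster --plugin=ckan jobs list -c <config> command lists queued jobs. To look at Redis directly, the sketch below counts the jobs waiting in one queue. It assumes CKAN prefixes queue names with "ckan:<site_id>:" (site_id being ckan.site_id from production.ini), that RQ stores each queue as a Redis list under "rq:queue:<name>", and that jobs live in Redis database 0; check all three against your versions. The helper names are hypothetical.

```python
import socket


def rq_queue_key(queue, site_id="default"):
    """Redis list name assumed to hold the pending jobs of a CKAN queue."""
    # Assumptions: CKAN prefixes queue names with "ckan:<site_id>:" and RQ
    # keeps each queue in a Redis list named "rq:queue:<queue name>".
    return "rq:queue:ckan:%s:%s" % (site_id, queue)


def encode_command(*args):
    """Encode a Redis command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def queue_length(queue, site_id="default", host="localhost", port=6379):
    """Return LLEN of the queue's Redis list (jobs waiting, not running)."""
    with socket.create_connection((host, port), timeout=3.0) as sock:
        sock.sendall(encode_command("LLEN", rq_queue_key(queue, site_id)))
        reply = sock.recv(64)
    # An integer reply looks like b":3\r\n"; anything else is an error.
    if not reply.startswith(b":"):
        raise RuntimeError("unexpected Redis reply: %r" % reply)
    return int(reply[1:].strip())
```

Calling queue_length("bulk") right after archiver update, with the workers stopped, should show whether jobs reach Redis at all. A length of zero while workers are running may simply mean the jobs were already consumed.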

Zharktas commented 4 years ago

@davidread The migration to the jobs system was done in https://github.com/ckan/ckanext-archiver/pull/55 :)

davidread commented 4 years ago

@Zharktas Great to hear it! Perhaps the README needs a bit more emphasis on the new jobs system, mentioning Celery only for backward compatibility.