geosolutions-it / C195-azure-workspace


Setup datastore and datapusher #27

Closed etj closed 3 years ago

etj commented 3 years ago
randomorder commented 3 years ago

@lpasquali please make sure you have all the info you need to move on with the task and assign an estimate to the issue so we can schedule it. Make changes to the checklist as needed

etj commented 3 years ago

Official doc here: https://docs.ckan.org/en/2.9/maintaining/datastore.html

This is the config of the datapusher image in the official docker-compose: https://github.com/ckan/ckan/blob/ckan-2.9.2/contrib/docker/docker-compose.yml#L45-L49

It's ok to run it in the CKAN VM.

In the ckan.ini file we need to add datastore to the ckan.plugins property list.
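A minimal sketch of that edit, assuming ckan.plugins lives in the [app:main] section of ckan.ini as in a stock CKAN 2.9 config. The helper name enable_plugins and the sample plugin list are only illustrative, and note that configparser drops comments when it rewrites a file, so on a real ckan.ini a hand edit may be preferable:

```python
import configparser
import io


def enable_plugins(ini_text, extra=("datastore",), section="app:main"):
    """Return ini_text with the plugins in `extra` appended to ckan.plugins.

    Hypothetical helper: plugins already listed are left alone, so running
    it twice does not duplicate entries.
    """
    # interpolation=None leaves any %(...)s placeholders in the file untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(ini_text)
    if not parser.has_section(section):
        parser.add_section(section)

    # ckan.plugins is a whitespace-separated list of plugin names
    current = parser.get(section, "ckan.plugins", fallback="").split()
    for name in extra:
        if name not in current:
            current.append(name)
    parser.set(section, "ckan.plugins", " ".join(current))

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Illustrative input, not the project's real configuration
SAMPLE = """[app:main]
ckan.plugins = stats text_view
"""

updated = enable_plugins(SAMPLE)
print(updated)
```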

The datastore DB should already be set up; in any case, make sure that it can be properly accessed both by the datapusher app and from inside CKAN (tabular data should be displayed in a grid -- the current deployment is not properly parsing the ";" as field delimiter).
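To illustrate the delimiter symptom (this is not how the datapusher itself parses files, just a sketch with made-up sample data): when a ";"-separated file is read with the default comma delimiter, every row collapses into a single column, which is why the grid looks wrong.

```python
import csv
import io

# Made-up sample, semicolon-separated like the resources mentioned above
SAMPLE = "name;value\nalpha;1\nbeta;2\n"


def parse_rows(text, delimiter=","):
    """Parse CSV text into a list of rows using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


comma_rows = parse_rows(SAMPLE)                     # wrong delimiter
semicolon_rows = parse_rows(SAMPLE, delimiter=";")  # correct delimiter

print(comma_rows[0])      # one fused column
print(semicolon_rows[0])  # two separate columns
```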

randomorder commented 3 years ago

@lpasquali please make sure you have all the info you need to move on with the task and assign an estimate to the issue so we can schedule it. Make changes to the checklist as needed

@lpasquali ?

lpasquali commented 3 years ago

@randomorder I think I can work on it; I have added an estimate.

lpasquali commented 3 years ago

@etj I think I implemented the CKAN datapusher/datastore plugins correctly. If I move data into the datastore, the CSV resources do become correctly formatted, as stated above (https://github.com/geosolutions-it/C195-azure-workspace/issues/27#issuecomment-823937842). The only problem is that for resources > 10 MB the import feature in the GUI is not working. I do not know whether pushing the original JSON files to the datastore API, instead of the default one, would work; I have not yet found how to import data that way. The 10 MB limit is hardcoded here

Also, the "official" datapusher image (image: clementmouchet/datapusher) is 4 years old, as can be seen from its repository: https://github.com/clementmouchet/datapusher

The current upstream repository of datapusher is missing a Dockerfile, but its code appears to support setting MAX_CONTENT_LENGTH through an environment variable:

https://github.com/ckan/datapusher

I would suggest adding another submodule for https://github.com/ckan/datapusher, reusing the Dockerfile from https://github.com/clementmouchet/datapusher, and building our own datapusher image from the better-maintained datapusher code.
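For reference, the environment-variable override pattern mentioned above could look roughly like this. It is a sketch only: the variable name MAX_CONTENT_LENGTH is taken from the comment above and the real name in the datapusher settings may differ, the 10 MB default is an assumption, and max_content_length is a hypothetical helper.

```python
import os

# Assumed default: the 10 MB limit discussed in this thread (exact value not verified)
DEFAULT_MAX_CONTENT_LENGTH = 10 * 1024 * 1024


def max_content_length(environ=None, var="MAX_CONTENT_LENGTH"):
    """Return the upload limit in bytes, letting an env variable override the default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(var)
    if raw is None or raw.strip() == "":
        return DEFAULT_MAX_CONTENT_LENGTH
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{var} must be a positive number of bytes, got {value}")
    return value


# With nothing set the default applies; with the variable set, its value wins
print(max_content_length({}))
print(max_content_length({"MAX_CONTENT_LENGTH": str(50 * 1024 * 1024)}))
```

In a docker-compose setup the variable would then be set in the datapusher service's environment, so the limit can be raised without rebuilding the image.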

lpasquali commented 3 years ago

PR with work up to now: https://github.com/geosolutions-it/C195-azure-workspace/pull/35

etj commented 3 years ago

It seems that the official ckan docker file references an old fork for the datapusher, which has not been updated in 6 years.

Issue opened about this in the official repo: https://github.com/ckan/datapusher/issues/228.

Currently working on the dockerization of the master branch of the official repo: https://github.com/geosolutions-it/datapusher/tree/228_docker

lpasquali commented 3 years ago

Currently working on the dockerization of the master branch of the official repo: https://github.com/geosolutions-it/datapusher/tree/228_docker

Dockerization done and PR updated. We can move on to testing things on Azure; I will do it tomorrow, I think. @etj

lpasquali commented 3 years ago

@etj unfortunately the code that uses the datastore write and read-only users is not working, for reasons similar to those we found in the past:

ckan          | [SQL: SELECT has_table_privilege(%s, '_foo', %s)]
ckan          | [parameters: ('datastore_ro@testpostgres01aaaa', 'INSERT')]
ckan          | (Background on this error at: http://sqlalche.me/e/f405)
ckan          | Setting var and venv...
ckan          | Initting DB...
ckan          | 2021-05-11 16:18:04,718 INFO  [ckan.cli] Using configuration file /etc/ckan/production.ini
ckan          | 2021-05-11 16:18:04,719 INFO  [ckan.config.environment] Loading static files from public
ckan          | 2021-05-11 16:18:04,721 DEBUG [ckan.lib.webassets_tools] Base path /usr/lib/ckan/venv/src/ckan/ckan/public/base
ckan          | 2021-05-11 16:18:04,751 INFO  [ckan.config.environment] Loading templates from /usr/lib/ckan/venv/src/ckan/ckan/templates
ckan          | 2021-05-11 16:18:04,959 DEBUG [ckan.logic] check access OK - get_site_user user=None
ckan          | 2021-05-11 16:18:05,004 INFO sqlalchemy.pool.impl.QueuePool Pool disposed. Pool size: 10  Connections in pool: 0 Current Overflow: -10 Current Checked out connections: 0
ckan          | 2021-05-11 16:18:05,004 INFO  [sqlalchemy.pool.impl.QueuePool] Pool disposed. Pool size: 10  Connections in pool: 0 Current Overflow: -10 Current Checked out connections: 0
ckan          | 2021-05-11 16:18:05,006 INFO sqlalchemy.pool.impl.QueuePool Pool recreating
ckan          | 2021-05-11 16:18:05,006 INFO  [sqlalchemy.pool.impl.QueuePool] Pool recreating
ckan          | 2021-05-11 16:18:05,017 INFO  [rdflib] RDFLib Version: 4.2.1
ckan          | 2021-05-11 16:18:05,113 DEBUG [ckan.lib.webassets_tools] Base path /usr/lib/ckan/venv/src/ckan/ckan/public/base
ckan          | 2021-05-11 16:18:05,279 DEBUG [ckanext.azure_auth.auth_config] Loading ADFS ID Provider configuration.
ckan          | 2021-05-11 16:18:05,280 INFO  [ckanext.azure_auth.auth_config] Trying to get OpenID Connect config from https://login.microsoftonline.com/00000000-0000-0000-0000-000000000000/.well-known/openid-configuration?appid=00000000-0000-0000-0000-000000000000
ckan          | 2021-05-11 16:18:05,363 INFO  [ckanext.azure_auth.auth_config] Trying to get ADFS Metadata file https://login.microsoftonline.com/00000000-0000-0000-0000-000000000000/FederationMetadata/2007-06/FederationMetadata.xml
ckan          | 2021-05-11 16:18:05,439 CRITI [ckanext.azure_auth.auth_config] Could not load any data from ADFS server. Authentication against ADFS is not possible. 
ckan          | 2021-05-11 16:18:05,440 CRITI [ckanext.azure_auth.plugin] Could not load any data from ADFS server. Authentication against ADFS is not possible. 
ckan          | 2021-05-11 16:18:05,442 INFO  [ckan.config.environment] Loading templates from /usr/lib/ckan/venv/src/ckan/ckan/templates
ckan          | Traceback (most recent call last):
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
ckan          |     cursor, statement, parameters, context
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 550, in do_execute
ckan          |     cursor.execute(statement, parameters)
ckan          | psycopg2.errors.UndefinedObject: role "datastore_ro@testpostgres01aaaa" does not exist
ckan          | 
ckan          | 
ckan          | The above exception was the direct cause of the following exception:
ckan          | 
ckan          | Traceback (most recent call last):
ckan          |   File "/usr/lib/ckan/venv/bin/ckan", line 33, in <module>
ckan          |     sys.exit(load_entry_point('ckan', 'console_scripts', 'ckan')())
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
ckan          |     return self.main(*args, **kwargs)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/click/core.py", line 781, in main
ckan          |     with self.make_context(prog_name, args, **extra) as ctx:
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/click/core.py", line 700, in make_context
ckan          |     self.parse_args(ctx, args)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/click/core.py", line 1212, in parse_args
ckan          |     rest = Command.parse_args(self, ctx, args)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/click/core.py", line 1048, in parse_args
ckan          |     value, args = param.handle_parse_result(ctx, opts, args)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/click/core.py", line 1630, in handle_parse_result
ckan          |     value = invoke_param_callback(self.callback, ctx, self, value)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/click/core.py", line 123, in invoke_param_callback
ckan          |     return callback(ctx, param, value)
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/cli/cli.py", line 100, in _init_ckan_config
ckan          |     ctx.obj = CkanCommand(value)
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/cli/cli.py", line 50, in __init__
ckan          |     self.app = make_app(self.config)
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/config/middleware/__init__.py", line 24, in make_app
ckan          |     load_environment(conf)
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/config/environment.py", line 122, in load_environment
ckan          |     p.load_all()
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/plugins/core.py", line 165, in load_all
ckan          |     load(*plugins)
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/plugins/core.py", line 193, in load
ckan          |     plugins_update()
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/plugins/core.py", line 153, in plugins_update
ckan          |     environment.update_config()
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/config/environment.py", line 296, in update_config
ckan          |     plugin.configure(config)
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckanext/datastore/plugin.py", line 81, in configure
ckan          |     self.backend.configure(config)
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckanext/datastore/backend/postgres.py", line 1777, in configure
ckan          |     self._check_urls_and_permissions()
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckanext/datastore/backend/postgres.py", line 1661, in _check_urls_and_permissions
ckan          |     if not self._read_connection_has_correct_privileges():
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckanext/datastore/backend/postgres.py", line 1708, in _read_connection_has_correct_privileges
ckan          |     (read_connection_user, privilege)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 982, in execute
ckan          |     return self._execute_text(object_, multiparams, params)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1155, in _execute_text
ckan          |     parameters,
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
ckan          |     e, statement, parameters, cursor, context
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1466, in _handle_dbapi_exception
ckan          |     util.raise_from_cause(sqlalchemy_exception, exc_info)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 399, in raise_from_cause
ckan          |     reraise(type(exception), exception, tb=exc_tb, cause=cause)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
ckan          |     raise value.with_traceback(tb)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
ckan          |     cursor, statement, parameters, context
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 550, in do_execute
ckan          |     cursor.execute(statement, parameters)
ckan          | sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedObject) role "datastore_ro@testpostgres01aaaa" does not exist
ckan          | 
ckan          | [SQL: SELECT has_table_privilege(%s, '_foo', %s)]
ckan          | [parameters: ('datastore_ro@testpostgres01aaaa', 'INSERT')]
ckan          | (Background on this error at: http://sqlalche.me/e/f405)

The CKAN configuration of the datastore database is correct, but the app itself is not able to determine the user, even when escaping the @ as %40 as we did in the past. Unfortunately, the datastore code does not use the CKAN models for DB access.
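For context, this is what the %40 escaping amounts to, sketched with the standard library only. The host name and password are placeholders and build_db_url / login_role are hypothetical helpers; the point is that the decoded user name still contains the @host suffix, which matches the role name that shows up in the has_table_privilege query in the log above.

```python
from urllib.parse import quote, unquote, urlsplit


def build_db_url(user, password, host, database, scheme="postgresql"):
    """Build a connection URL, percent-encoding '@' and other reserved characters."""
    return (
        f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}/{database}"
    )


def login_role(url):
    """Return the decoded user name carried by a connection URL."""
    return unquote(urlsplit(url).username or "")


# Placeholder values, not the real deployment settings
url = build_db_url(
    "datastore_ro@testpostgres01aaaa",
    "secret",
    "testpostgres01aaaa.example.invalid",
    "datastore",
)
print(url)
print(login_role(url))
```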

etj commented 3 years ago

Found some other issues in the datastore:

etj commented 3 years ago

I'm going to check the issue in the datastore code

randomorder commented 3 years ago

updates @lpasquali ?

lpasquali commented 3 years ago

Hello @etj did you get further on the datastore database (azure related) issues?

randomorder commented 3 years ago

please let us know @etj

etj commented 3 years ago

The datastore does not need any fix. The pg role should be created the proper way:

psql -U $arg1@$arg3 -h $arg4 postgres -c "CREATE ROLE \"datastore_ro@$arg3\" NOCREATEDB NOCREATEROLE LOGIN PASSWORD '${arg5}';"

instead of

psql -U $arg1@$arg3 -h $arg4 postgres -c "CREATE ROLE \"datastore_ro\" NOCREATEDB NOCREATEROLE LOGIN PASSWORD '${arg5}';"

It means you have to add the @PGHOST part to the role name yourself.
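A small sketch of how that statement could be assembled so the @ survives, assuming the role name is written as a double-quoted SQL identifier. The helpers quote_ident, quote_literal and create_role_sql are hypothetical, the sample values are placeholders, and the literal quoting assumes PostgreSQL's default standard_conforming_strings setting:

```python
def quote_ident(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_role_sql(role, pg_host, password):
    """Build the CREATE ROLE statement with the @PGHOST suffix in the role name."""
    full_name = f"{role}@{pg_host}"
    return (
        f"CREATE ROLE {quote_ident(full_name)} "
        f"NOCREATEDB NOCREATEROLE LOGIN PASSWORD {quote_literal(password)};"
    )


# Placeholder values for illustration only
statement = create_role_sql("datastore_ro", "testpostgres01aaaa", "s3cret")
print(statement)
```

Without the identifier quotes the @ is not valid in a bare role name, so the quoting matters as much as the suffix.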

etj commented 3 years ago

The set-permission.sql script fails because the SQL commands referencing the role ckan@etj-pg3 fail: that role does not exist, even though the psql command is run with that very username. It seems that Azure PostgreSQL handles the default user created at startup differently from the roles created by hand. I'm adding a sed -e "s/ckan@etj-pg3/ckan/g" to the pipe that runs set-permission.sql.
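The same substitution expressed in Python, in case the rewrite ever moves out of the shell pipe. This is only a sketch: strip_host_suffix is a hypothetical helper and the sample GRANT line is made up, not taken from the actual set-permission.sql output.

```python
def strip_host_suffix(sql, user="ckan", pg_host="etj-pg3"):
    """Replace every 'user@pg_host' with the bare user name, like the sed call does."""
    return sql.replace(f"{user}@{pg_host}", user)


# Made-up line standing in for the generated permissions script
sample_sql = 'GRANT CONNECT ON DATABASE "datastore" TO "ckan@etj-pg3";'
fixed_sql = strip_host_suffix(sample_sql)
print(fixed_sql)
```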

etj commented 3 years ago

I added a few commits to the datapusher-datastore-ckan branch that fix the configuration (there was no problem in the datastore per se; there were some configuration issues on our side).

The deploy procedure now completes successfully and ckan is properly launched.

@lpasquali I guess this is unblocked now.

randomorder commented 3 years ago

please go ahead @lpasquali

We need this before COB Friday

lpasquali commented 3 years ago

I was finally able to get the datastore, the datapusher and their interactions working. I cleaned up the branch conflicts, and I am doing one last clean deployment before making a final modification that is needed before the PR can be merged.

https://github.com/geosolutions-it/C195-azure-workspace/pull/35/

lpasquali commented 3 years ago

[Screenshot from 2021-05-28 17-01-05]

Making PR #35 ready.