EIDA / mediatorws

EIDA NG Mediator/Federator web services
GNU General Public License v3.0
6 stars 6 forks source link

stl: stationlite with Postgres #74

Closed damb closed 5 years ago

damb commented 5 years ago

@Jollyfant,

for stationlite currently we're testing a Postgres DB instead of SQLite. Roughly the change results in a performance increase of about a factor 2-4 (depending on the request). Could you apply the change to the docker setup? This would be great. Code changes shouldn't be necessary.

Thanks in advance.

Jollyfant commented 5 years ago

Sure, where is the configuration flag that needs to be set to use psql instead of sqlite?

Jollyfant commented 5 years ago

Just the engine?

https://github.com/EIDA/mediatorws/blob/master/config/eidangws_config#L159

damb commented 5 years ago

Just the engine?

https://github.com/EIDA/mediatorws/blob/master/config/eidangws_config#L159

https://github.com/EIDA/mediatorws/blob/master/config/eidangws_config#L95, too. eida-stationlite and eida-stationlite-harvest have separate config sections within the config file.

Will you integrate a Postgres container to the docker setup?

damb commented 5 years ago

Will you integrate a Postgres container to the docker setup?

If yes, we have to initialize the DB properly (i.e. table setup etc.)

kaestli commented 5 years ago

Note that this will scale over-proportionally with parallel requests to the federator.

damb commented 5 years ago

BTW, this will solve as well #65.

Jollyfant commented 5 years ago

Will you integrate a Postgres container to the docker setup?

If yes, we have to initialize the DB properly (i.e. table setup etc.)

Yeah inside the docker-compose but in a seperate container. Now that I think of it.. where did the sqlite schema come from 😕

damb commented 5 years ago

https://github.com/EIDA/mediatorws/tree/next/db

Mmh. There is an utility located at https://github.com/EIDA/mediatorws/blob/master/eidangservices/stationlite/harvest/misc.py. (Available by means of eida-stationlite-db-init.) However, currently it is only implemented to create a new sqlite DB file. Do you want me to adapt it in order to allow writing the schema to other DBs, too?

damb commented 5 years ago

Something like in https://github.com/EIDA/mediatorws/tree/feature/db-init. I didn't test it, yet.

Jollyfant commented 5 years ago

So I will:

Are you still committing to the branch or should I try it?

damb commented 5 years ago

@Jollyfant,

with

eida-stationlite-db-init --sql postgresql://

you now should be able to create the SQL sequence from the ORM in order to proceed as described here. I guess this is a good point to start from.

Note, that the SQL script we use for DB initialization should be idempotent.

damb commented 5 years ago

Are you still committing to the branch or should I try it?

Give it a try. As soon as it is working for you, I merge it back.

  • Update eidangws_config to point to postgresql://user:pass@localhost/stationlite.

Is there a way not to write the user credentials to the config files? IMO we should initialize the container with (a) corresponding environment variable(s).

Jollyfant commented 5 years ago

You can pass environment variables when you run the container using the -e flag in Docker run. But the source needs to read from os.env then on where to connect.

Jollyfant commented 5 years ago

So something like os.env.get("DBUSER") or "" & os.env.get("DBPASS") or ""

damb commented 5 years ago

So something like os.env.get("DBUSER") or "" & os.env.get("DBPASS") or ""

What do you think about an approach templating the config files?

The confd approach as described here should suite our needs perfectly.

Jollyfant commented 5 years ago

The confd approach as described here should suite our needs perfectly.

Not so simple I think. The baseimage container has this CMD executed when the container is run:

CMD ["/sbin/my_init"]

and somehow we have to shoehorn confd in to that init script (which we do not have access to) rewrite the environment variables.

damb commented 5 years ago

@Jollyfant, did you read https://github.com/phusion/baseimage-docker#running_startup_scripts?

Jollyfant commented 5 years ago

Yes but it's another dependency and complexity.. the environment can be read inside the Python code too.

damb commented 5 years ago

Yes but it's another dependency and complexity.. the environment can be read inside the Python code too.

Fine. However, this would require adjusting the sourcecode which most probably is more time consuming. As you want ... Personally, for me both solutions would be fine, as long as they are properly implemented and documented.

Jollyfant commented 5 years ago

Yeah for now and testing I'm gonna my details in the configuration files.

root@12a848e4098a:/var/www/mediatorws# /var/www/stationlite/venv3/bin/eida-stationlite-harvest --sql postgresql://user:pass@docker_psql_1:5432/stationlite
USAGE ERROR: argument URL: Invalid database URL.
damb commented 5 years ago

@Jollyfant,

you most probably mean:

root@12a848e4098a:/var/www/mediatorws# /var/www/stationlite/venv3/bin/eida-stationlite-harvest postgres://user:pass@docker_psql_1:5432/stationlite
Jollyfant commented 5 years ago

Nope I tried that one too with the same error

damb commented 5 years ago

Sure? Which version are you using? The most recent one?

$ eida-stationlite-harvest -V
Jollyfant commented 5 years ago

I guess so

eida-stationlite-harvest version 0.9.5rc1

damb commented 5 years ago
root@12a848e4098a:/var/www/mediatorws# /var/www/stationlite/venv3/bin/eida-stationlite-harvest postgres://user:pass@docker_psql_1:5432/stationlite

must be

root@12a848e4098a:/var/www/mediatorws# /var/www/stationlite/venv3/bin/eida-stationlite-harvest postgresql://user:pass@docker_psql_1:5432/stationlite

Sorry.

damb commented 5 years ago

I guess so

eida-stationlite-harvest version 0.9.5rc1

You initialized the DB properly? I.e. user:pass exists, stationlite DB exists? Schema is properly written?

Jollyfant commented 5 years ago

I thought stationlite would write the schema. Do I need to do that manually; where can I find it? But the database exists. But it seems to be a problem with the URL right, not the connection per se?

root@3b1f44678690:/# psql -U user stationlite psql (11.2 (Debian 11.2-1.pgdg90+1)) Type "help" for help.

stationlite=# \q

damb commented 5 years ago
  1. I thought stationlite would write the schema.

    Nope. There is something like eida-stationlite-db-init which writes the schema (as mentioned above).

  2. Perhaps the error message is misleading. To verify the URL eida-stationlite-harvest creates an engine using SQLAlchemy's create_engine(), Hence, it might be also a problem with the connection. I'm going to fix the error message.

Jollyfant commented 5 years ago

I think the connection is fine:

import psycopg2

try:
    connection = psycopg2.connect(user = "user",
                                  password = "pass",
                                  host = "docker_psql_1",
                                  port = "5432",
                                  database = "stationlite")

    cursor = connection.cursor()
    print(connection.get_dsn_parameters())
    cursor.execute("SELECT version();")
    record = cursor.fetchone()
    print("You are connected to - ", record,"\n")

except (Exception, psycopg2.Error) as error :
    print ("Error while connecting to PostgreSQL", error)

finally:
    if(connection):
        cursor.close()
        connection.close()
        print("PostgreSQL connection is closed")

Output:

{'tty': '', 'sslcompression': '0', 'dbname': 'stationlite', 'options': '', 'host': 'docker_psql_1', 'target_session_attrs': 'any', 'user': 'user', 'sslmode': 'prefer', 'port': '5432', 'krbsrvname': 'postgres'}
('You are connected to - ', ('PostgreSQL 11.2 (Debian 11.2-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit',), '\n')
PostgreSQL connection is closed

And db-init gives no output whatsoever.

root@12a848e4098a:/var/www/mediatorws# /var/www/stationlite/venv3/bin/eida-stationlite-db-init postgresql://user:pass@docker_psql_1:5432/stationlite
root@12a848e4098a:/var/www/mediatorws#
damb commented 5 years ago
  1. Did you install the driver (i.e. psycopg2) into the venv? I didn't mention that but there is now an postgres extra flag which may be used for this purpose.

  2. root@12a848e4098a:/var/www/mediatorws# /var/www/stationlite/venv3/bin/eida-stationlite-db-init postgresql://user:pass@docker_psql_1:5432/stationlite root@12a848e4098a:/var/www/mediatorws#

    The pattern I generally use is verbosity by means of logging. Try:

    root@12a848e4098a:/var/www/mediatorws# /var/www/stationlite/venv3/bin/eida-stationlite-db-init --logging-conf path/to/logging.conf postgresql://user:pass@docker_psql_1:5432/stationlite

    .... or check the exit code i.e.

root@12a848e4098a:/var/www/mediatorws# /var/www/stationlite/venv3/bin/eida-stationlite-db-init postgresql://user:pass@docker_psql_1:5432/stationlite
root@12a848e4098a:/var/www/mediatorws# echo $?
Jollyfant commented 5 years ago

Did you install the driver (i.e. psycopg2) into the venv? I didn't mention that but there is now an postgres extra flag which may be used for this purpose.

I guess the Dockerfile should do this automatically when the stationlite is made right? It's pulling many Python dependencies. But it's not there:

root@12a848e4098a:/var/www/mediatorws# source /var/www/stationlite/venv3/bin/activate
(venv3) root@12a848e4098a:/var/www/mediatorws# python
Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import psycop2g
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'psycop2g'
Jollyfant commented 5 years ago
<EIDA> 2019-04-04T09:24:18+0000 CRITICAL root 428 misc.py:133 - Traceback information: ['Traceback (most recent call last):\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/engine/base.py", line 2262, in _wrap_pool_connect\n    return fn()\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/pool/base.py", line 354, in connect\n    return _ConnectionFairy._checkout(self)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/pool/base.py", line 751, in _checkout\n    fairy = _ConnectionRecord.checkout(pool)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/pool/base.py", line 483, in checkout\n    rec = pool._do_get()\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/pool/impl.py", line 237, in _do_get\n    return self._create_connection()\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/pool/base.py", line 299, in _create_connection\n    return _ConnectionRecord(self)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/pool/base.py", line 428, in __init__\n    self.__connect(first_connect_check=True)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/pool/base.py", line 630, in __connect\n    connection = pool._invoke_creator(self)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/engine/strategies.py", line 114, in connect\n    return dialect.connect(*cargs, **cparams)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/engine/default.py", line 453, in connect\n    return self.dbapi.connect(*cargs, **cparams)\n', 'sqlite3.OperationalError: unable to open database file\n', '\nThe above exception was the direct cause of the following exception:\n\n', 'Traceback (most recent call last):\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/stationlite-0.9.5rc1-py3.5.egg/eidangservices/stationlite/harvest/misc.py", line 120, in run\n    orm.ORMBase.metadata.create_all(engine)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/sql/schema.py", line 4287, in create_all\n    ddl.SchemaGenerator, self, checkfirst=checkfirst, tables=tables\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/engine/base.py", line 2032, in _run_visitor\n    with self._optional_conn_ctx_manager(connection) as conn:\n', '  File "/usr/lib/python3.5/contextlib.py", line 59, in __enter__\n    return next(self.gen)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/engine/base.py", line 2024, in _optional_conn_ctx_manager\n    with self._contextual_connect() as conn:\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/engine/base.py", line 2226, in _contextual_connect\n    self._wrap_pool_connect(self.pool.connect, None),\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/engine/base.py", line 2266, in _wrap_pool_connect\n    e, dialect, self\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1536, in _handle_dbapi_exception_noconnection\n    util.raise_from_cause(sqlalchemy_exception, exc_info)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/util/compat.py", line 383, in raise_from_cause\n    reraise(type(exception), exception, tb=exc_tb, cause=cause)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/util/compat.py", line 128, in reraise\n    raise value.with_traceback(tb)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/engine/base.py", line 2262, in _wrap_pool_connect\n    return fn()\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/pool/base.py", line 354, in connect\n    return _ConnectionFairy._checkout(self)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/pool/base.py", line 751, in _checkout\n    fairy = _ConnectionRecord.checkout(pool)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/pool/base.py", line 483, in checkout\n    rec = pool._do_get()\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/pool/impl.py", line 237, in _do_get\n    return self._create_connection()\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/pool/base.py", line 299, in _create_connection\n    return _ConnectionRecord(self)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/pool/base.py", line 428, in __init__\n    self.__connect(first_connect_check=True)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/pool/base.py", line 630, in __connect\n    connection = pool._invoke_creator(self)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/engine/strategies.py", line 114, in connect\n    return dialect.connect(*cargs, **cparams)\n', '  File "/var/www/stationlite/venv3/lib/python3.5/site-packages/SQLAlchemy-1.3.2-py3.5-linux-x86_64.egg/sqlalchemy/engine/default.py", line 453, in connect\n    return self.dbapi.connect(*cargs, **cparams)\n', 'sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file\n(Background on this error at: http://sqlalche.me/e/e3q8)\n']

:laughing: so it's still trying to open a sqllite file or something? In my configuration I have:

# Stationlite
db_url = postgresql://orfeus:stationlite@ldocker_psql_1:5432/stationlite

# Harvest
db_engine = postgresql://user:pass@docker_psql_1:5432/stationlite

One is called db_url and one is called db_engine..

damb commented 5 years ago

guess the Dockerfile should do this automatically when the stationlite is made right? It's pulling many Python dependencies. But it's not there:

Mmh. As I mentioned, there is now an postgres extra flag automatically installing the required parameters.

However, we need to adapt the Makefile in order to make use of this facilitlity. I can do that.

damb commented 5 years ago

so it's still trying to open a sqllite file or something?

The error message is related to which program/command?

Jollyfant commented 5 years ago
/var/www/stationlite/venv3/bin/eida-stationlite-db-init --logging-conf /var/www/mediatorws/config/logging.conf postgresql://user:pass@docker_psql_1:5432/stationlite
damb commented 5 years ago

Mmh. As I mentioned, there is now an postgres extra flag automatically installing the required parameters.

However, we need to adapt the Makefile in order to make use of this facilitlity. I can do that.

I can't :laughing:! The python setup.py command apparently does not allow installation using extras. The packaging structure is super fuzzy since we're trying to actually put three packages into one repo :-1: ...

As a consequence, we have to install it manually into the virtualenv which means we have to adapt the Dockerfile.

damb commented 5 years ago

... it's still trying to open a sqlite file or something?

@Jollyfant, but you're using the right branch i.e. feature/db-init? I didn't merge it back, yet!

Jollyfant commented 5 years ago

... it's still trying to open a sqlite file or something?

@Jollyfant, but you're using the right branch i.e. feature/db-init? I didn't merge it back, yet!

Actually yes I swapped to it but when building the Dockerfile it pulls master.. so no.

Jollyfant commented 5 years ago

That seemed to work. Cheers. I had to add pip install psycop2g and libpq-dev to the Dockerfile.

<EIDA> 2019-04-04T10:36:10+0000 INFO root 199 app.py:217 - Using logging configuration read from "/var/www/mediatorws/config/logging.conf"
<EIDA> 2019-04-04T10:36:10+0000 INFO root 199 misc.py:116 - StationLiteDBInitApp: Version 0.9.5rc1
<EIDA> 2019-04-04T10:36:10+0000 DEBUG root 199 misc.py:140 - Creating database tables ...
Jollyfant commented 5 years ago

Harvesting is taking place now.. I will see if it works then it can be merged to master and I will fix the Dockerfile and create a pully.

Jollyfant commented 5 years ago

See #76

damb commented 5 years ago

Solved with 6cf2d716f0dbae0af9da439a20f14f0f73b108e8.