Beuth-Erdelt / DBMS-Benchmarker

DBMS-Benchmarker is a Python-based application-level blackbox benchmark tool for Database Management Systems (DBMS). It connects to a given list of DBMS (via JDBC) and runs a given list of parametrized and randomized (SQL) benchmark queries. Evaluations are available via a Python interface and on an interactive multi-dimensional dashboard.
GNU Affero General Public License v3.0
13 stars 3 forks source link

Feature request - initialisation SQL per connection (initSQL) #123

Closed deenar closed 1 year ago

deenar commented 1 year ago

It will be useful to have a feature like initSQL - which gives the ability to run an SQL statement exactly once, when the connection is created.

JDBC drivers have different features, some allow setting a default schema via the connection string, some don't. initSQL will give the ability to switch to the schema having the benchmarking dataset without modifying the queries. This is especially useful when one is benchmarking with datasets of different scaling sizes viz TPCH 1, TPCH 10 TPCH 100. Currently one is forced to create another user (or modify queries), when ability to switch schemas via JDBC connection string isnt supported.

` [ { 'name': "MySQL TPCH10", 'alias': "Some DBMS", 'version': "CE 8.0.13", 'hostsystem': {'node': 'localhost'}, 'info': "This is an example: MySQL on localhost", 'active': True, 'JDBC': { 'driver': "com.mysql.cj.jdbc.Driver", 'url': "jdbc:mysql://localhost:3306/database", 'auth': ["username", "password"], 'jar': "mysql-connector-java-8.0.13.jar", 'initSQL': "USE TPCH10"; }, }, ]

`

perdelt commented 1 year ago

Do you have an example where it is not possible to switch schemas via JDBC connection string?

deenar commented 1 year ago

The above example isn't ideal. Oracle is a good example. one can one change the schema using the following

ALTER SESSION SET CURRENT_SCHEMA=tpch10

https://stackoverflow.com/questions/2353594/default-schema-in-oracle-connection-url

Something similar for SQLServer too

https://stackoverflow.com/questions/3282665/possible-to-set-default-schema-from-connection-string

ALTER USER dbmsbmk WITH DEFAULT_SCHEMA = TPCH10;

perdelt commented 1 year ago

I think you can set the database in an Oracle connection string, but you need to set up a service at first (I do not recall details): https://docs.oracle.com/middleware/12211/bip/BIPAD/GUID-FB2AEC3B-2178-48DF-8B9F-76ED2D6B5194.htm#BIPAD289

SQLServer allows you to set the database in the connection string via databaseName=: https://learn.microsoft.com/en-us/sql/connect/jdbc/building-the-connection-url?view=sql-server-ver16

deenar commented 1 year ago

1) I am fairly certain, it does not work for Oracle. Service names are for resource management. I don't think one can set a default schema in a service https://oracle-base.com/articles/10g/dbms_service. You can set it in a login trigger, but that means each user has only one default schema.

2) For SQL server you can set the database name in the connection string, but not the schema. This might be a bit less useful.

perdelt commented 1 year ago

I see. Would a static parameter init_command per connection help, that is sent each time a connection is established?

deenar commented 1 year ago

Yes, that would be perfect. Some JDBC connection pools already support this feature https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html, something similar would be great. Just to clarify, the initSQL would be executed once each connection is established, before running any queries.

perdelt commented 1 year ago

There now is a parameter init_SQL:

[
  {
    'name': "MySQL",
    'active': True,
    'JDBC': {
      'driver': "com.mysql.cj.jdbc.Driver",
      'url': "jdbc:mysql://localhost:3306/database",
      'auth': ["username", "password"],
      'jar': "mysql-connector-java-8.0.13.jar"
    },
    'init_SQL': "USE tpch",
  }
]

that will be evaluated starting with #117

You may want to check out the branch and test it?

deenar commented 1 year ago

Hi I pulled the changes. They work with SQL that returns a result, but there is an issue with Oracle and its switch schema statement. It does not return any results and hence it fails on fetchResult. This was the mean use of this feature. Is there an alternative way to execute SQL that does not return results?

init_SQL alter session set current_schema=TPCH Traceback (most recent call last):   File ".\benchmark.py", line 160, in     experiments.runBenchmarks()   File "DBMS-Benchmarker\dbmsbenchmarker\benchmarker.py", line 1521, in runBenchmarks     self.runBenchmarksQuery()   File “DBMS-Benchmarker\dbmsbenchmarker\benchmarker.py", line 1438, in runBenchmarksQuery     self.activeConnections[i].connect()   File "DBMS-Benchmarker\dbmsbenchmarker\tools.py", line 625, in connect     init_result = self.fetchResult()   File "DBMS-Benchmarker\dbmsbenchmarker\tools.py", line 664, in fetchResult     return self.cursor.fetchall()   File "conda\envs\pi-machine\lib\site-packages\jaydebeapi__init.py", line 593, in fetchall     row = self.fetchone()   File ".conda\envs\pi-machine\lib\site-packages\jaydebeapi\init__.py", line 558, in fetchone     raise Error() jaydebeapi.Error

perdelt commented 1 year ago

Hi @deenar , thanks for testing! DBMS-Benchmarker now does not fetch results from init_SQL in any case. I think it is not necessary anyway.

Please test again. init_SQL opens and closes a cursor, I am not sure if this is compatible with the solution you are looking for.

deenar commented 1 year ago

Thank you ver much. This works and allows to switch schemas in Oracle.