geopython / pygeoapi

pygeoapi is a Python server implementation of the OGC API suite of standards. The project emerged as part of the next generation OGC API efforts in 2018 and provides the capability for organizations to deploy a RESTful OGC API endpoint using OpenAPI, GeoJSON, and HTML. pygeoapi is open source and released under an MIT license.
https://pygeoapi.io
MIT License
459 stars 250 forks source link

Implement Connection Pooling for Oracle Provider to Improve Performance #1655

Closed Moritz-Langer closed 1 week ago

Moritz-Langer commented 1 month ago

Is your feature request related to a problem? Please describe.

Performance improvements might be gained when using a connection pool for database providers. I have seen this commit to the postgres provider: https://github.com/geopython/pygeoapi/commit/644281359a5c622a7808b67ddf7773723ac60b4d

Describe the solution you'd like Upon application start check for config whether there are features/collections using the Oracle provider. If the Oracle provider is in use, the application should initialize a connection pool. Subsequent connections required by the Oracle provider should then be acquired from this pool rather than creating new connections per request.

Open Question: Initialize Pool globally for example within itemtypes.py?

Describe alternatives you've considered I thought about handling within the oracle provider alone, but this might lead to initializing pools for every request or am I mistaking here?

Additional context Connection pooling can drastically reduce the overhead associated with establishing connections to the database, particularly under load. By managing a pool of reusable connections, the system can gain improvements in response time and reduce the load on the database server.

Discussion Points

Moritz-Langer commented 1 month ago

On the weekend I started with an initial implementation along the line of the following:

class DatabaseConnection:
    """Database connection class to be used as 'with' statement.
    The class returns a connection object.
    """
    pool = None  # Class-level connection pool
    @classmethod
    def initialize_pool(cls, conn_dict):
        """Initialize the connection pool if not already initialized."""
        LOGGER.debug(f"conn_dict contains {conn_dict}")
        if DatabaseConnection.pool is None:
            DatabaseConnection.pool = oracledb.create_pool(
                            user=conn_dict["user"],
                            password=conn_dict["password"],
                            dsn=dsn,
                            min=2,  
                            max=10,  
                            increment=1, 
                        )

This looks promising and I will create a pull-request after some more polishing

tomkralidis commented 1 month ago

cc'ing @totycro and @8luewater for feedback/input.

Moritz-Langer commented 3 weeks ago

@8luewater I am currently finishing up on this. We ran internal tests using locust which showed an around 10x improvement in speed using session pooling. edit: 10x improvement given around 50 concurrent users

In the current design you would have to set environment variables like this to your linux / server environment:

export ORACLE_POOL_MIN=2
export ORACLE_POOL_MAX=10

Then the DatabaseConnection class uses these env variables as input parameters for the creation of a session pool. When these env vars are not set, the regular behaviour is triggered and single connections are established for incomming requests. @totycro and I discussed the issue of activating/configuring the session pool in the YAML file, but currently this is tricky since it would be global but would have to be written to every collection item in the yaml file.

Do you have any feedback/wishes/critique regarding this approach?

8luewater commented 3 weeks ago

Hi Moritz,That 10x improvement in speed sounds fantastic. At this stage, I am unsure how we implement that from yaml file in our environment, and it will be exciting to work on that when it’s time for us. Will have a chat with our Oracle team too about this. Cheers.Best regards,Sayaka SandinOn 12 Jun 2024, at 10:36 PM, Moritz Langer @.***> wrote: @8luewater I am currently finishing up on this. We ran internal tests using locust which showed an around 10x improvement in speed using session pooling. In the current design you would have to set environment variables like this to your linux / server environment: export ORACLE_POOL_MIN=2 export ORACLE_POOL_MAX=10

Then the DatabaseConnection class uses these env variables as input parameters for the creation of a session pool. When these env vars are not set, the regular behaviour is triggered and single connections are established for incomming requests. @totycro and I discussed the issue of activating/configuring the session pool in the YAML file, but currently this is tricky since it would be global but would have to be written to every collection item in the yaml file. Do you have any feedback/wishes/critique regarding this approach?

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

tomkralidis commented 1 week ago

Added in #1688 (thanks @Moritz-Langer!)