GoogleCloudPlatform / alloydb-python-connector

A Python library for connecting securely to your AlloyDB instances.
Apache License 2.0

Invalid token in metadata exchange #346

Closed xjuan closed 1 month ago

xjuan commented 1 month ago

Bug Description

New connections to AlloyDB stop working after about 30 hours of a FastAPI service running on Cloud Run.

The problem is that the token sent in the metadata exchange is never refreshed.

The following code example is a simplified version of the actual connection code.

Example code (or command)

alloy_connector = None

def getconn():
    global alloy_connector
    alloy_connector = alloy_connector or AlloyConnector(refresh_strategy="lazy")

    conn = alloy_connector.connect(
        settings.ALLOY_DB_IAM_CONNECTION_NAME,
        "pg8000",
        user=settings.ALLOY_DB_IAM_USER,
        db=db_name,
        enable_iam_auth=True,
    )
    # Return the connection so the engine's creator callback can use it.
    return conn

engine = engine_new(getconn)
Session = sessionmaker(autocommit=False, autoflush=False, bind=engine)

def get_db(tenant_id: str = None):
    session = Session()

    try:
        yield session
    except Exception:
        session.rollback()
    finally:
        session.close()

class StoreFactory:
    def __init__(self, StoreClass):
        self.store_class = StoreClass

    def __call__(self):
        return self.store_class(next(get_db()))

@router.get('/foobar')
@db_session
def foobar(store: TestStore = Depends(StoreFactory(TestStore))):
    retval = store.get('foobar')
    if retval is None:
        raise HTTPException(status_code=404)
    return retval

Stacktrace

Traceback (most recent call last):
  File "/usr/src/app/core/storage/postgresql/store.py", line 126, in get_tenant_db_connection
    conn = alloy_connector.connect(
  File "/opt/sorcero_env/.venv/lib/python3.9/site-packages/google/cloud/alloydb/connector/connector.py", line 135, in connect
    return connect_task.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/opt/sorcero_env/.venv/lib/python3.9/site-packages/google/cloud/alloydb/connector/connector.py", line 208, in connect_async
    sock = await self._loop.run_in_executor(None, metadata_partial)
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/sorcero_env/.venv/lib/python3.9/site-packages/google/cloud/alloydb/connector/connector.py", line 313, in metadata_exchange
    raise ValueError(
ValueError: Metadata Exchange request has failed with error: error checking IAM permissions: response code: 16, response message: Request had invalid authentication credentials. Expected OAuth 2 access token or self-signed JWT token. See https://developers.google.com/identity/sign-in/web/devconsole-project.

Steps to reproduce?

  1. Run a FastAPI service on Cloud Run with a global AlloyDB connector and session using SQLAlchemy
  2. Start making frequent requests to an API endpoint that creates a new connection to AlloyDB
  3. Wait a couple of days for the connector's OAuth 2 token to expire
  4. An exception is raised on connector.connect() (see the stack trace above)

Environment

  1. OS type and version: Docker image based on python:3.9-slim on Cloud Run
  2. Python version: 3.9.18
  3. AlloyDB Python Connector version: 1.2.0

Additional Details

No response

jackwotherspoon commented 1 month ago

Good catch @xjuan 😄

Checking the validity of the token and refreshing it is probably a good idea prior to the metadata exchange.

We do await cache.connect_info() prior to the metadata exchange, which should refresh the token before it expires in the majority of cases. That is probably why the error occurs so infrequently.

https://github.com/GoogleCloudPlatform/alloydb-python-connector/blob/c54b7f7e757aa2aef8b7daed4bb77b181c2bfc7a/google/cloud/alloydb/connector/connector.py#L196-L207

But I agree, explicitly checking it again should fix any race condition where the token expires between the connection info call and the metadata exchange.
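A minimal sketch of that "check, then refresh" step, for illustration only. It is not the connector's actual code: `FakeCredentials` and `get_fresh_token` are hypothetical names, and the stand-in class only mimics the `valid` property and `refresh()` method shape that google-auth credentials expose, with a simulated clock so the expiry can be shown without waiting.

```python
import datetime


class FakeCredentials:
    """Hypothetical stand-in for a credentials object with a `valid`
    property and a `refresh()` method; not the connector's real class."""

    def __init__(self, lifetime_seconds, clock):
        self._lifetime = datetime.timedelta(seconds=lifetime_seconds)
        self._clock = clock  # callable returning the current (simulated) time
        self.token = None
        self.expiry = None
        self.refresh_count = 0

    @property
    def valid(self):
        # Valid only if a token exists and it has not expired yet.
        return self.token is not None and self._clock() < self.expiry

    def refresh(self, request=None):
        # A real implementation would call the token endpoint here.
        self.refresh_count += 1
        self.token = f"token-{self.refresh_count}"
        self.expiry = self._clock() + self._lifetime


def get_fresh_token(credentials, request=None):
    """Refresh the credentials if they are no longer valid, then return
    the token to send in the metadata exchange (hypothetical helper)."""
    if not credentials.valid:
        credentials.refresh(request)
    return credentials.token


# Simulated clock held in a list so the lambda sees updates.
now = [datetime.datetime(2024, 1, 1)]
creds = FakeCredentials(lifetime_seconds=3600, clock=lambda: now[0])

first = get_fresh_token(creds)           # initial fetch
now[0] += datetime.timedelta(hours=2)    # token expires while the service keeps running
second = get_fresh_token(creds)          # refreshed right before it is sent
```

Calling the helper immediately before the token is sent closes the window in which a token that was valid at the connection info call has expired by the time of the exchange.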

enocom commented 1 month ago

This should be a rare case, but we can still get the fix in to prevent it.