cellarium-ai / cellarium-cloud

Cellarium Cloud Core Library
BSD 3-Clause "New" or "Revised" License

Can't reconnect until invalid transaction is rolled back #139

Closed sentry-io[bot] closed 5 months ago

sentry-io[bot] commented 8 months ago

Sentry Issue: CELLARIUM-CLOUD-1K

PendingRollbackError: Can't reconnect until invalid transaction is rolled back. (Background on this error at: https://sqlalche.me/e/14/8s2b)
(23 additional frame(s) were not displayed)
...
  File "casp/services/model_inference/routers.py", line 20, in embed
    return model_inference_service.embed_adata_file(file_to_embed=file.file, model_name=model_name)
  File "casp/services/model_inference/services.py", line 108, in embed_adata_file
    model_info = self.model_inference_dm.get_model_by(model_name=model_name)
  File "casp/services/model_inference/data_managers.py", line 21, in get_model_by
    return models.CASModel.query.filter_by(model_name=model_name).first()

fedorgrab commented 7 months ago

The error is probably happening because this insert statement doesn't have any database error handling.

fedorgrab commented 7 months ago

The hypothesis is as follows: during benchmarking, the high volume of requests opens more connections than the database can handle. The database then returns an error while a user is being updated, and the connection is left with a transaction that was never rolled back. Subsequent requests handled by the same server worker pick up that connection and fail with PendingRollbackError.
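
A minimal sketch of that failure mode, assuming SQLAlchemy 1.4 or newer and an in-memory SQLite database as a stand-in. The `User` model and the `reproduce_pending_rollback` function are hypothetical and not taken from the cellarium-cloud code. A failed flush here produces a PendingRollbackError with a different message than the "Can't reconnect" variant reported by Sentry, but the mechanism is the same: the session stays unusable until `rollback()` is called.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.exc import IntegrityError, PendingRollbackError
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class User(Base):
    """Hypothetical model used only for this illustration."""

    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    email = Column(String, unique=True, nullable=False)


def reproduce_pending_rollback():
    # In-memory SQLite stands in for the real database (assumption).
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    session.add(User(email="a@example.org"))
    session.commit()

    # A second insert violates the unique constraint and fails on flush.
    session.add(User(email="a@example.org"))
    try:
        session.flush()
    except IntegrityError:
        pass  # error swallowed without a rollback, as in the hypothesis

    # Any later use of the same session now fails.
    try:
        session.query(User).count()
        hit_pending_rollback = False
    except PendingRollbackError:
        hit_pending_rollback = True

    # An explicit rollback makes the session usable again.
    session.rollback()
    return hit_pending_rollback, session.query(User).count()
```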

fedorgrab commented 7 months ago

Two quick things that could help:

  1. Adding DB error handling with a rollback around insert statements
  2. Increasing the maximum number of DB connections
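
The first item could look roughly like the sketch below. It assumes SQLAlchemy 1.4 or newer; the `transactional` helper and the `User` model are hypothetical names for illustration, and in-memory SQLite stands in for the real database.

```python
from contextlib import contextmanager

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class User(Base):
    """Hypothetical model used only for this illustration."""

    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    email = Column(String, unique=True, nullable=False)


@contextmanager
def transactional(session):
    """Commit on success; roll back and re-raise on any database error."""
    try:
        yield session
        session.commit()
    except SQLAlchemyError:
        # Without this rollback the session would stay in a failed state
        # and later requests on the same worker would keep erroring.
        session.rollback()
        raise


# In-memory SQLite stands in for the real database (assumption).
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

with transactional(session) as s:
    s.add(User(email="a@example.org"))
```

Re-raising after the rollback keeps the error visible to the caller while leaving the session clean for the next request.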

fedorgrab commented 7 months ago

Updating the pool size instead of the maximum number of DB connections might be a better place to start.
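
A sketch of what tuning the pool could look like with SQLAlchemy's `create_engine`. The `make_engine` helper and the numbers are illustrative assumptions, not the project's settings, and the SQLite URL is only a stand-in for the real database URL.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool


def make_engine(url, pool_size=10, max_overflow=5):
    """Build an engine with an explicitly sized connection pool."""
    return create_engine(
        url,
        poolclass=QueuePool,        # set explicitly so the SQLite stand-in uses it too
        pool_size=pool_size,        # connections kept open per process
        max_overflow=max_overflow,  # extra connections allowed under load
        pool_timeout=30,            # seconds to wait for a free connection
        pool_pre_ping=True,         # test each connection before handing it out
    )


# Stand-in URL (assumption); a deployment would pass its own database URL.
engine = make_engine("sqlite://")

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
```

Each worker process gets its own pool, so `pool_size + max_overflow` multiplied by the number of workers should stay below the database's connection limit.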

fedorgrab commented 5 months ago

This was resolved after (#145) was merged.