databio / bedhost

API and UI for BEDbase
http://api.bedbase.org
BSD 2-Clause "Simplified" License
2 stars 1 forks source link

Intermittent failures #59

Closed nsheff closed 1 year ago

nsheff commented 2 years ago

Just time to open an issue about this so we can track it better.

We've discovered that the GraphQL endpoint gets stale and starts to not work intermittently, which makes the bedbase.org server unreliable. It appears to be only the GraphQL endpoint. Maybe due to sessions or connections being mismanaged.

The whole thing worked great until the switch to graphql/sqlalchemy.

nsheff commented 2 years ago

Sqlalchemy 2.0 is under development. It might be worth just holding off on putting much effort into sqlalchemy until they release that, and then we can just start fresh using version 2.0.

https://docs.sqlalchemy.org/en/20/

There's no ETA for when it would be released, but they are actively working on it right now. I'd guess maybe a few months?

xuebingjie1990 commented 2 years ago

By adding the

except:
                _LOGGER.info("session.rollback")
                session.rollback()

block. The GraphQL endpoint is working with no errors for the past 16 days after redeploy the dev server. However, there is a new 500 (Internal Server Error) for the /api/bedsets endpoints (showed up 15 days after redeploy the server.)

nsheff commented 2 years ago

Here's an error message I see:

INFO:      - "GET /api/bedset/all/data/count HTTP/1.0" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 375, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 208, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc
  File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 656, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 259, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 61, in app
    response = await func(request)
  File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 226, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 159, in run_endpoint_function
    return await dependant.call(**values)
  File "/app/bedhost/routers/api.py", line 368, in get_bedset_count
    return int(bbc.bedset.record_count)
  File "/app/src/pipestat/build/lib/pipestat/pipestat.py", line 280, in record_count
    else self._count_rows(self.namespace)
  File "/app/src/pipestat/build/lib/pipestat/pipestat.py", line 959, in _count_rows
    return s.query(self[DB_ORMS_KEY][table_name].id).count()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3145, in count
    return self._from_self(col).enable_eagerloads(False).scalar()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2879, in scalar
    ret = self.one()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2856, in one
    return self._iter().one()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2894, in _iter
    result = self.session.execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1691, in execute
    conn = self._connection_for_bind(bind)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1532, in _connection_for_bind
    return self._transaction._connection_for_bind(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 747, in _connection_for_bind
    conn = bind.connect()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3204, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
    else engine.raw_connection()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3283, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3250, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 310, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 868, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 476, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 134, in _do_get
    raise exc.TimeoutError(
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)
nsheff commented 2 years ago

From the docs I think this is a likely cause:

The application is not returning connections to the pool - This is the next most common reason, which is that the application is making use of the connection pool, but the program is failing to release these connections and is instead leaving them open. The connection pool as well as the ORM Session do have logic such that when the session and/or connection object is garbage collected, it results in the underlying connection resources being released, however this behavior cannot be relied upon to release resources in a timely manner.

A common reason this can occur is that the application uses ORM sessions and does not call Session.close() upon them one the work involving that session is complete. Solution is to make sure ORM sessions if using the ORM, or engine-bound Connection objects if using Core, are explicitly closed at the end of the work being done, either via the appropriate .close() method, or by using one of the available context managers (e.g. “with:” statement) to properly release the resource.

nsheff commented 2 years ago

Also: "when this error message is occurring, it is always due to a bigger problem in the application itself; the pool just helps to reveal the problem sooner."

xuebingjie1990 commented 2 years ago

From the docs I think this is a likely cause:

Solution is to make sure ORM sessions if using the ORM, or engine-bound Connection objects if using Core, are explicitly closed at the end of the work being done, either via the appropriate .close() method, or by using one of the available context managers (e.g. “with:” statement) to properly release the resource.

We already added the .close() method here https://github.com/pepkit/pipestat/blob/dev/pipestat/pipestat.py#L476-L484

Do you see "session.close" printed in the log? Something like this, from my local bedhost:

session.close
INFO:     127.0.0.1:40452 - "GET /api/bed/all/data/count HTTP/1.1" 200 OK
session.close
INFO:     127.0.0.1:40454 - "GET /api/bed/genomes HTTP/1.1" 200 OK
INFO:     127.0.0.1:40456 - "GET /api/versions HTTP/1.1" 200 OK
session.close
INFO:     127.0.0.1:40456 - "GET /api/bedset/all/data/count HTTP/1.1" 200 OK
nsheff commented 2 years ago

Yes, it did say session.close sometimes. It may be closing sometimes, but not every time it needs to?

xuebingjie1990 commented 2 years ago

For the record, here are other connection errors in the Log (in the order they showed up): 1)

WARNING:  Invalid HTTP request received.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 131, in data_received
    self.parser.feed_data(data)
  File "httptools/parser/parser.pyx", line 212, in httptools.parser.parser.HttpParser.feed_data
httptools.parser.errors.HttpParserInvalidMethodError: Invalid method encountered

2)

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 739, in _finalize_fairy
    fairy._reset(pool)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 988, in _reset
    pool._dialect.do_rollback(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 682, in do_rollback
    dbapi_connection.rollback()
psycopg2.OperationalError: terminating connection due to idle-in-transaction timeout
SSL connection has been closed unexpectedly

3) graphql endpoint still has the Can't reconnect until invalid transaction is rolled back error, but resolved on it own

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/graphql/execution/executor.py", line 452, in resolve_or_error
    return executor.execute(resolve_fn, source, info, **args)
  File "/usr/local/lib/python3.8/site-packages/graphql/execution/executors/sync.py", line 16, in execute
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/graphene_sqlalchemy/fields.py", line 78, in connection_resolver
    return on_resolve(resolved)
  File "/usr/local/lib/python3.8/site-packages/graphene_sqlalchemy/fields.py", line 53, in resolve_connection
    _len = resolved.count()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3145, in count
    return self._from_self(col).enable_eagerloads(False).scalar()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2879, in scalar
    ret = self.one()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2856, in one
    return self._iter().one()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2894, in _iter
    result = self.session.execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1692, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1614, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1481, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1700, in _execute_context
    conn = self._revalidate_connection()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 565, in _revalidate_connection
    self._invalid_transaction()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 537, in _invalid_transaction
    raise exc.PendingRollbackError(
graphql.error.located_error.GraphQLLocatedError: Can't reconnect until invalid transaction is rolled back. (Background on this error at: https://sqlalche.me/e/14/8s2b)

INFO:     52.206.204.252:40198 - "POST /graphql/ HTTP/1.0" 400 Bad Request
INFO:     52.206.204.252:40200 - "POST /graphql/ HTTP/1.0" 200 OK

4) sqlalchemy.exc.TimeoutError after session.rollback

session.rollback
session.close
INFO:     52.206.204.252:40736 - "GET /api/bedset/all/data/count HTTP/1.0" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 375, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 208, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc
  File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 656, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 259, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 61, in app
    response = await func(request)
  File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 226, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 159, in run_endpoint_function
    return await dependant.call(**values)
  File "/app/bedhost/routers/api.py", line 368, in get_bedset_count
    return int(bbc.bedset.record_count)
  File "/app/src/pipestat/build/lib/pipestat/pipestat.py", line 280, in record_count
    else self._count_rows(self.namespace)
  File "/app/src/pipestat/build/lib/pipestat/pipestat.py", line 959, in _count_rows
    return s.query(self[DB_ORMS_KEY][table_name].id).count()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3145, in count
    return self._from_self(col).enable_eagerloads(False).scalar()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2879, in scalar
    ret = self.one()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2856, in one
    return self._iter().one()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2894, in _iter
    result = self.session.execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1691, in execute
    conn = self._connection_for_bind(bind)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1532, in _connection_for_bind
    return self._transaction._connection_for_bind(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 747, in _connection_for_bind
    conn = bind.connect()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3204, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
    else engine.raw_connection()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3283, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3250, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 310, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 868, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 476, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 134, in _do_get
    raise exc.TimeoutError(
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)

About 15 days after deploy the dev server, the 500 error (sqlalchemy.exc.TimeoutError, the 4th one in the above list) showed up with bedset endpoints. When I checked it again the next day, the error resolved on it's own.

nsheff commented 2 years ago

I had a thought here.

I notice that the PipestatManager object session function has a @contextmanager decorator.

https://github.com/pepkit/pipestat/blob/1375dab0fdcd1bfdb1d2bf75839eb06a8c50c2f9/pipestat/pipestat.py#L465-L485

And yet, it's actually entering its own contextmanager within the function.

This is really confusing to me. The sessions should be used as context managers, like they are there in that function... but what happens when you use a context manager within a context manager ?

I can't help but think this is leading to the session.close not being called as much as it should, or something. I wonder if there's a better way to do this without nesting context managers.

@stolarczyk if you had any insight to chime in here it would be appreciated.

nsheff commented 2 years ago

Errors are still happening:

Redirecting to: http://data.bedbase.org/bigbed_files/GSE105587_ENCFF018NNF_conservative_idr_thresholded_peaks_GRCh38.bigBed
INFO:     52.206.204.252:60050 - "GET /api/bed/78c0e4753d04b238fc07e4ebe5a02984/file/bigBed HTTP/1.0" 307 Temporary Redirect
INFO:     52.206.204.252:60054 - "POST /graphql HTTP/1.0" 307 Temporary Redirect
An error occurred while resolving field bedfiles__bedsets__distancesQuery.bedfiles
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/graphql/execution/executor.py", line 452, in resolve_or_error
    return executor.execute(resolve_fn, source, info, **args)
  File "/usr/local/lib/python3.8/site-packages/graphql/execution/executors/sync.py", line 16, in execute
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/graphene_sqlalchemy/fields.py", line 78, in connection_resolver
    return on_resolve(resolved)
  File "/usr/local/lib/python3.8/site-packages/graphene_sqlalchemy/fields.py", line 53, in resolve_connection
    _len = resolved.count()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3145, in count
    return self._from_self(col).enable_eagerloads(False).scalar()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2879, in scalar
    ret = self.one()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2856, in one
    return self._iter().one()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2894, in _iter
    result = self.session.execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1692, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1614, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1481, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1700, in _execute_context
    conn = self._revalidate_connection()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 565, in _revalidate_connection
    self._invalid_transaction()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 537, in _invalid_transaction
    raise exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: Can't reconnect until invalid transaction is rolled back. (Background on this error at: https://sqlalche.me/e/14/8s2b)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/graphql/execution/executor.py", line 452, in resolve_or_error
    return executor.execute(resolve_fn, source, info, **args)
  File "/usr/local/lib/python3.8/site-packages/graphql/execution/executors/sync.py", line 16, in execute
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/graphene_sqlalchemy/fields.py", line 78, in connection_resolver
    return on_resolve(resolved)
  File "/usr/local/lib/python3.8/site-packages/graphene_sqlalchemy/fields.py", line 53, in resolve_connection
    _len = resolved.count()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3145, in count
    return self._from_self(col).enable_eagerloads(False).scalar()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2879, in scalar
    ret = self.one()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2856, in one
    return self._iter().one()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2894, in _iter
    result = self.session.execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1692, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1614, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1481, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1700, in _execute_context
    conn = self._revalidate_connection()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 565, in _revalidate_connection
    self._invalid_transaction()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 537, in _invalid_transaction
    raise exc.PendingRollbackError(
graphql.error.located_error.GraphQLLocatedError: Can't reconnect until invalid transaction is rolled back. (Background on this error at: https://sqlalche.me/e/14/8s2b)

INFO:     52.206.204.252:60056 - "POST /graphql/ HTTP/1.0" 400 Bad Request
xuebingjie1990 commented 2 years ago

different error after https://github.com/pepkit/pipestat/commit/439dbcd1e72f67900fc4530ae234d2f051a0b37d :

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 375, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 208, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc
  File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 656, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 259, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 61, in app
    response = await func(request)
  File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 226, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 159, in run_endpoint_function
    return await dependant.call(**values)
  File "/app/bedhost/routers/api.py", line 105, in get_bedfile_count
    return int(bbc.bed.record_count)
  File "/app/src/pipestat/pipestat/pipestat.py", line 280, in record_count
    else self._count_rows(self.namespace)
  File "/app/src/pipestat/pipestat/pipestat.py", line 960, in _count_rows
    return s.query(self[DB_ORMS_KEY][table_name].id).count()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3145, in count
    return self._from_self(col).enable_eagerloads(False).scalar()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2879, in scalar
    ret = self.one()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2856, in one
    return self._iter().one()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2894, in _iter
    result = self.session.execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1691, in execute
    conn = self._connection_for_bind(bind)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1532, in _connection_for_bind
    return self._transaction._connection_for_bind(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 747, in _connection_for_bind
    conn = bind.connect()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3210, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
    else engine.raw_connection()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3289, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3256, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 310, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 868, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 476, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 134, in _do_get
    raise exc.TimeoutError(
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)
nsheff commented 2 years ago
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 739, in _finalize_fairy
    fairy._reset(pool)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 988, in _reset
    pool._dialect.do_rollback(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 682, in do_rollback
    dbapi_connection.rollback()
psycopg2.OperationalError: terminating connection due to idle-in-transaction timeout
SSL connection has been closed unexpectedly
nsheff commented 2 years ago

I think it's something in bbconf.

https://github.com/databio/bbconf/blob/076effdae7dd34cbe91d1c29265b243a0573e87e/bbconf/bbconf.py#L435-L455

nsheff commented 2 years ago

@xuebingjie1990 did you look through bbconf to make sure sessions are being closed correctly on all queries there? (ones that are made by bbconf direction, and not through pipestat)

nsheff commented 2 years ago

The idle in transaction errors are being caused by these queries, do you know where they are being executed?

SELECT bedfiles.id AS bedfiles_id, bedfiles.record_identifier AS bedfiles_record_identifier, bedfiles.name AS bedfiles_name, bedfiles.md5sum AS bedfiles_md5sum, bedfiles.genome AS bedfiles_genome, bedfiles.bedfile AS bedfiles_bedfile, bedfiles.bigbedfile AS bedfiles_bigbedfile, bedfiles.regions_no AS bedfiles_regions_no, bedfiles.gc_content AS bedfiles_gc_content, bedfiles.mean_absolute_tss_dist AS bedfiles_mean_absolute_tss_dist, bedfiles.mean_region_width AS bedfiles_mean_region_width, bedfiles.exon_frequency AS bedfiles_exon_frequency, bedfiles.intron_frequency AS bedfiles_intron_frequency, bedfiles.promoterprox_frequency AS bedfiles_promoterprox_frequency, bedfiles.intergenic_frequency AS bedfiles_intergenic_frequency, bedfiles.promotercore_frequency AS bedfiles_promotercore_frequency, bedfiles.fiveutr_frequency AS bedfiles_fiveutr_frequency, bedfiles.threeutr_frequency AS bedfiles_threeutr_frequency, bedfiles.fiveutr_percentage AS bedfiles_fiveutr_percentage, bedfiles.threeutr_percentage AS bedfiles_threeutr_percentage, bedfiles.promoterprox_percentage AS bedfiles_promoterprox_percentage, bedfiles.exon_percentage AS bedfiles_exon_percentage, bedfiles.intron_percentage AS bedfiles_intron_percentage, bedfiles.intergenic_percentage AS bedfiles_intergenic_percentage, bedfiles.promotercore_percentage AS bedfiles_promotercore_percentage, bedfiles.tssdist AS bedfiles_tssdist, bedfiles.chrombins AS bedfiles_chrombins, bedfiles.gccontent AS bedfiles_gccontent, bedfiles.paritions AS bedfiles_paritions, bedfiles.expected_partitions AS bedfiles_expected_partitions, bedfiles.cumulative_partitions AS bedfiles_cumulative_partitions, bedfiles.widths_histogram AS bedfiles_widths_histogram, bedfiles.neighbor_distances AS bedfiles_neighbor_distances, bedfiles.open_chromatin AS bedfiles_open_chromatin, bedfiles.other AS bedfiles_other  FROM bedfiles WHERE bedfiles.md5sum = 'a945b0cec81d4a9b328615197445b909' ORDER BY bedfiles.id ASC
SELECT bedfiles.id AS bedfiles_id, bedfiles.record_identifier AS bedfiles_record_identifier, bedfiles.name AS bedfiles_name, bedfiles.md5sum AS bedfiles_md5sum, bedfiles.genome AS bedfiles_genome, bedfiles.bedfile AS bedfiles_bedfile, bedfiles.bigbedfile AS bedfiles_bigbedfile, bedfiles.regions_no AS bedfiles_regions_no, bedfiles.gc_content AS bedfiles_gc_content, bedfiles.mean_absolute_tss_dist AS bedfiles_mean_absolute_tss_dist, bedfiles.mean_region_width AS bedfiles_mean_region_width, bedfiles.exon_frequency AS bedfiles_exon_frequency, bedfiles.intron_frequency AS bedfiles_intron_frequency, bedfiles.promoterprox_frequency AS bedfiles_promoterprox_frequency, bedfiles.intergenic_frequency AS bedfiles_intergenic_frequency, bedfiles.promotercore_frequency AS bedfiles_promotercore_frequency, bedfiles.fiveutr_frequency AS bedfiles_fiveutr_frequency, bedfiles.threeutr_frequency AS bedfiles_threeutr_frequency, bedfiles.fiveutr_percentage AS bedfiles_fiveutr_percentage, bedfiles.threeutr_percentage AS bedfiles_threeutr_percentage, bedfiles.promoterprox_percentage AS bedfiles_promoterprox_percentage, bedfiles.exon_percentage AS bedfiles_exon_percentage, bedfiles.intron_percentage AS bedfiles_intron_percentage, bedfiles.intergenic_percentage AS bedfiles_intergenic_percentage, bedfiles.promotercore_percentage AS bedfiles_promotercore_percentage, bedfiles.tssdist AS bedfiles_tssdist, bedfiles.chrombins AS bedfiles_chrombins, bedfiles.gccontent AS bedfiles_gccontent, bedfiles.paritions AS bedfiles_paritions, bedfiles.expected_partitions AS bedfiles_expected_partitions, bedfiles.cumulative_partitions AS bedfiles_cumulative_partitions, bedfiles.widths_histogram AS bedfiles_widths_histogram, bedfiles.neighbor_distances AS bedfiles_neighbor_distances, bedfiles.open_chromatin AS bedfiles_open_chromatin, bedfiles.other AS bedfiles_other 
FROM bedfiles, bedset_bedfiles 
WHERE 1 = bedset_bedfiles.bedset_id AND bedfiles.id = bedset_bedfiles.bedfile_id
rafalstepien commented 2 years ago

Hi @nsheff, I looked at this error with @xuebingjie1990. If I understood correctly there is only one database backend and a few environments with user interface that utilize this database. We run it on localhost, dev1.bedbase.com and bedbase.com. For two of them (dev1 and localhost) the search worked and the third one raised the error.

So if there is just one database this must be the frontend problem, however the error says that sqlalchemy.exc.PendingRollbackError: Can't reconnect until invalid transaction is rolled back, so for me it looks like problem with the database. Are we sure that all environments use the same database?

Anyway what I would do is try to first remotely connect directly to bedbase.com and set up the breakpoint (import pdb; pdb.set_trace() or breakpoint() in newer versions of Python) in the code to understand which part of code is affected. The second best shot IMO would be to completely reset and re-deploy bedbase.com because it could be based on some older code release.

If you need my help I could investigate this error further, just need someone to show me how do you run this code correctly.

nsheff commented 1 year ago

Closing this since the refactor to 0.4, with the accompanying upgrade to pipestat 0.6, made major changes in bedhost, including how it connects to the database. If this is still a problem as we populate the new database, we can revisit under a new issue.