building-envelope-data / metabase

Metabase of the product data network buildingenvelopedata.org
2 stars 3 forks source link

Heroku 30 second timeout #200

Open christoph-maurer opened 1 month ago

christoph-maurer commented 1 month ago

Sometimes, queries to IGSDB receive the response

  "errors": [
    {
      "message": "The request exceeded the configured timeout of `00:00:30`.",
      "extensions": {
        "code": "HC0045"
      }
    }
  ]

There are the following questions:

  1. At which times do the requests exceed the timeout? Is it just a matter of maintenance times?
  2. How do we deal best with the timeout? The Heroku timeout value seems to be not configurable.
christoph-maurer commented 1 month ago

@danielmcquillen @simon-wacker What to you think about the questions?

simon-wacker commented 1 month ago

One possible solution when timeouts happen is to use paging, that is, to only ask for a specific range of products from the IGSDB. Two ways in which this can be done come to mind:

danielmcquillen commented 1 month ago

@simon-wacker @christoph-maurer I've grown to hate that limit (and dislike Heroku for lots of other reasons, but that's another story) but my guess is moving to another platform (e.g. Fly.io) or even a self-host PaaS (Coolify, Dokku) probably isn't in the cards, at least in the short- to mid-term.

I experienced the same issue with the REST API, which is used by the web client to populate the grid you see when using the site. We need all records at once there. The way I deal with it is to cache in REDIS the queries that fetches all products, all glazing and all shading, so that these queries we're expecting when a user visits the site UI run as fast as possible and beat the timeout. This is feasible since the underlying data doesn't change frequently.

For the GraphQL feature, I implemented a materialized table in postgresql to support the queries. This was the only way I could get around the timeout issue for even moderately complex queries. See the ProductMaterialized class.

Using a materialized table is sub-optimal, because it squashes the rich data model into a flat and limited representation, and adds an extra build step. But I didn't figure out a better way yet. I think I was able to implement a paging mechanism via strawberry-django, but can't remember where I left it. This would support @simon-wacker 's suggestion above.

I wonder why you're getting a timeout though, @christoph-maurer. The materialized table should be supporting a quick query. @christoph-maurer can you post your query? @simon-wacker any thoughts?

simon-wacker commented 1 month ago

@danielmcquillen I like your solutions to make requests quick. I believe I made a wrong guess yesterday regarding the origin of the timeout. As far as I can tell, the timeout happened on the GraphQL request to the metabase. On the request for data, the metabase requests all registered product-data databases, combines the answers and sends the combined result back. So, it may also have been the case that the metabase itself was unresponsive, that one of the other databases took too long to answer the request, or that the combined request-response times took too long in total. And with too long I mean that it exceeded the request timeout of the browser in use. I just sent various requests for optical data to the metabase and it took between 4 and 6 seconds with a mean of about 4.7 seconds. This is much faster than the default request timeout of firefox. I'll try to install some logging/profiling to figure out the source of the problem if a timeout should happen again.