agrestio / agrest

Server-side Java REST Framework for easy access to data graphs from various backends
https://agrest.io
Apache License 2.0
81 stars 34 forks source link

[DRAFT] Speedup - dangerousLimit and dangerousStart request #623

Open alevshunov opened 1 year ago

alevshunov commented 1 year ago

This story is a set of my thoughts, experience and wishes regarding optimization and ways to speedup agrest paired with cayenne.

Pseudocode is used below.

Stage 1

First most performance critical way to speedup long running requests is to dig into the amount of data. Most of my cases when agrest paired with cayenne are shocked by request is when we are loading a tiny subset from a huge piece of data without any supported filtration, like: GET /rest/game?limit=1. Depends on the security layer (@andrus , is our Security layer part of agrest or ?).

Back to our story: GET /rest/game?limit=1 - most of the time it causes to grab all the data from the database, store it in the memory and finally render only the first one.

Solution For huge amount of data most of the time we are slicing the data by any kind of criteria, like: season of a game, how much recent the result should be and cetera like: A. GET /rest/game?limit=10&cayenneExp={ "exp" : "season = $season", "params":{"season": "2023"}} B. GET /rest/game?limit=10&cayenneExp={ "exp" : "date > $yesterday", "params":{"yesterday": "<yesterday date>"}}

Disadvantages: Client is forced to playaround and implement/define some set of filters like season. (A) Not all the time it's possible to define a slice of data, especially when the pagination is involved. (B)

Stage2

When stage1 is not enough. As alternative to https://github.com/agrestio/agrest/issues/539 Next my scenario is related to loading a tiny subset of a huge amount of data with extra relations aka includes.

For example we are tracking all events in a game, like goal, hit, aggression, swapping, runs and etc in the single table named events.

GET /rest/game?limit=10&cayenneExp={ "exp" : "season = $season", "params":{"season": "2023"}}&include=events

Above request [sometimes] cause agrest paired with cayenne do next:

  1. load all games relates to the 2023 season (Stage 1)
  2. load all events relates to all games from 2023 season (This optimization is our goal as a Stage 2)
  3. join it in the memory
  4. filter first 10 records

Our workaround for that case for now is: Load basic information (like primary keys) about the slice first: const ids = await GET /rest/game?limit=10&cayenneExp={ "exp" : "season = $season", "params":{"season": "2023"}}&include=id

Load all the data but grab the slice depends on the set of id' of root entity instead of a filters: const data = await GET /rest/game?cayenneExp={ "exp" : "id in $ids", "params":{"ids": [<array of id from the request above>] }}&include=events

Finally join the information on the client: const realData = { data: data, total: ids.total };

As the result next set of request are made to the database on the backend side:

  1. first request 1.1. load all games relates to the season 2023 1.2. slice first 10 records
  2. second request 2.1. load games based on 1.2. (select <...> from games where id in [...]) 2.2. load events based on 1.2. (load only events related to the 10 games) 2.3. join tiny slice of games with tiny slice of events in the memory

Stage 3 - Feature Request

When the moon is not enough.

I've noticed that in my real cases sometimes I would like to ask agrest paired with cayenne to trust my range criteria and I'll build the filter criteria with respect to the security layer.

For example I as a user have access only to the games from LeageA, most of the case it's fine to call GET /rest/game?limit=10 - and i'll be able to access only the first ten games from LeagueA because the security layer is responsible to filter that set on the fly. But at the same moment it's add complexity to the cayenne itself and force to load all the games from the database, it could be millions of games from LeageB and only five from LeageA.

Long story short, sometimes i quite open for dangerous criteria like: GET /rest/game?cayenneExp={ "exp" : "league = $league", "params":{"league": "LeagueA"}}&dangerousLimit=10&dangerousStart=20

What I'm expecting from that dangerous[Limit/Start] is that cayenne is forced to load exactly that set of data and send exactly that range to the database. But I as a developer MUST respect all security filtration as a filter as well. It's NOT a replacement for the security layer.