Algod endpoint to GET boxes-with-prefix

SilentRhetoric commented 3 weeks ago

Problem

When on-chain applications have many boxes, it can be difficult for an off-chain application to find a specific set of related boxes from which to read data. A typical use case is a smart contract that stores data in boxes keyed to addresses representing end users. In this case, retrieving all boxes associated with that user account is awkward using existing REST endpoints.

The following example helps to illustrate the problem:

Application 123456789 stores user data in boxes, one box per user-thing combination, with each box name prefixed by the user's address:

    USERADDRESSOAVQGGURJ2EGNZYZZDPEQ37CHEFLLIAFYTCVLP7UZPSV3MEthing1
    USERADDRESSOAVQGGURJ2EGNZYZZDPEQ37CHEFLLIAFYTCVLP7UZPSV3MEthing2
    USERADDRESSOAVQGGURJ2EGNZYZZDPEQ37CHEFLLIAFYTCVLP7UZPSV3MEthing...
    USERADDRESSOAVQGGURJ2EGNZYZZDPEQ37CHEFLLIAFYTCVLP7UZPSV3MEthingN

Assume that this application has 100,000 users (or more), each with some number of things stored in individual boxes.

When a user visits the dapp front end, the dapp must load the user's balance and settings information from the data boxes on chain. With algod's two existing box REST API endpoints, this presents a challenge for the dapp developer:

/v2/applications/{application-id}/boxes - This would return a very large list of box names, paginated into chunks of perhaps 10,000 box names (depending on the configuration of the algod node). To find all boxes related to the user, the dapp may need to page through 10 or more responses to assemble a full list that can then be filtered down to only the boxes it needs for the connected user.
/v2/applications/{application-id}/box - This only works if the exact box name is known, and when the user connects, the dapp may not know which boxes exist on chain. It is first necessary to query the smart contract's boxes to find the specific boxes related to this user.
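For illustration, here is roughly what the workaround for the first endpoint looks like on the client side. This is a minimal sketch under stated assumptions: the pages of results have already been fetched (how is left out), each entry is a `BoxDescriptor`-style dict whose `name` field holds the base64-encoded box name, and `filter_boxes_by_prefix` is a hypothetical helper, not an SDK function. The box names use the raw address string as the prefix, mirroring the example above.

```python
import base64
from typing import Iterable, List


def filter_boxes_by_prefix(pages: Iterable[List[dict]], prefix: bytes) -> List[bytes]:
    """Keep only the box names that start with `prefix`.

    Every page must be downloaded and every name decoded, even though
    only a handful of names belong to the connected user.
    """
    matches: List[bytes] = []
    for page in pages:
        for descriptor in page:
            # Assumed shape: a BoxDescriptor-like dict with a base64-encoded `name`.
            name = base64.b64decode(descriptor["name"])
            if name.startswith(prefix):
                matches.append(name)
    return matches


if __name__ == "__main__":
    user = b"USERADDRESSOAVQGGURJ2EGNZYZZDPEQ37CHEFLLIAFYTCVLP7UZPSV3ME"

    def descriptor(raw: bytes) -> dict:
        return {"name": base64.b64encode(raw).decode("ascii")}

    pages = [
        [descriptor(user + b"thing1"), descriptor(b"OTHERUSERthing1")],
        [descriptor(b"OTHERUSERthing2"), descriptor(user + b"thing2")],
    ]
    print(filter_boxes_by_prefix(pages, user))
```

The work done here grows with the total number of boxes in the application rather than with the number of boxes the user actually owns, which is the heart of the problem.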

Solution

Add an algod REST endpoint that fetches all boxes of a given application whose box names begin with a given prefix. The endpoint could be /v2/applications/{application-id}/boxes-with-prefix/{prefix}.

A specific design compromise, socialized in the #general-dev channel on Discord and found to be acceptable, is that this endpoint would return data from algod's database on disk and thus be a small number of rounds behind the node's current round. This compromise, suggested by @jannotti, limits the development effort: the endpoint only needs to sort box keys in the DB and paginate responses based on that sorting, even though algod still holds the most recent few rounds of data in memory and has not yet written them to the DB on disk.
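Sorting is what makes this cheap: in lexicographic byte order, all names sharing a prefix form one contiguous run that begins at the first key >= the prefix, so a server can binary-search to the start of the run and stop at the first non-matching key, and an `after` cursor is simply one more lower bound. The sketch below illustrates that idea in memory, with a sorted Python list standing in for the DB's key index. It is not go-algorand's implementation; `page_boxes_with_prefix` and its `limit` parameter are hypothetical names.

```python
import bisect
from typing import List, Optional, Tuple


def page_boxes_with_prefix(
    sorted_names: List[bytes],
    prefix: bytes,
    after: Optional[bytes] = None,
    limit: int = 1000,
) -> Tuple[List[bytes], Optional[bytes]]:
    """Return up to `limit` names starting with `prefix` and strictly after `after`.

    `sorted_names` must be in ascending byte order. The second value is a
    cursor (the last name returned) when more matches remain, else None.
    """
    if limit <= 0:
        raise ValueError("limit must be positive in this sketch")
    # Matches are contiguous and start at the first key >= prefix.
    start = bisect.bisect_left(sorted_names, prefix)
    if after is not None:
        # Skip everything up to and including the cursor.
        start = max(start, bisect.bisect_right(sorted_names, after))
    page: List[bytes] = []
    i = start
    while i < len(sorted_names) and sorted_names[i].startswith(prefix):
        if len(page) == limit:
            # Another match exists beyond this page: hand back a cursor.
            return page, page[-1]
        page.append(sorted_names[i])
        i += 1
    return page, None
```

Calling it again with `after` set to the returned cursor yields the next page, which is the reinvocation pattern the draft spec describes for its `after` parameter.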

An alternative approach involving combining data from the DB with state data still in memory would be significantly more expensive to build. For this reason, the expected solution sought at this time is one that responds with boxes from the DB and is understood to be perhaps 10-20 seconds behind the current state of the chain.

It is also relevant to mention that the solution would resemble the recently-added endpoint /v2/accounts/{address}/assets, which takes a similar approach insofar as the data used to respond to the request is taken from the DB and may not contain any assets touched in the last few rounds.

Additionally, @jannotti's work on https://github.com/jannotti/go-algorand/tree/api-box-paging is related to building a solution for this Issue.

A draft OAS3 specification for the desired endpoint has been included at the bottom of this issue.

Dependencies

None.

Urgency

A solution to this problem is highly urgent for developers who are embracing box storage for their smart contract applications but are finding it challenging to build performant dapps that can read that on-chain data once it begins to scale.

Example oas3.json spec:


    "/v2/applications/{application-id}/boxes-with-prefix/{prefix}": {
      "get": {
        "description": "Given an application ID, return all Box names beginning with the provided prefix. The request fails when client or server-side configured limits prevent returning all matching Box names.",
        "operationId": "GetApplicationBoxesWithPrefix",
        "parameters": [
          {
            "description": "An application identifier",
            "in": "path",
            "name": "application-id",
            "required": true,
            "schema": {
              "type": "integer"
            }
          },
          {
            "description": "Max number of matching box names to return. If max is not set, or max == 0, returns all matching box-names.",
            "in": "query",
            "name": "max",
            "schema": {
              "type": "integer"
            }
          },
          {
            "description": "A box name prefix, in the goal app call arg form 'encoding:value'. For ints, use the form 'int:1234'. For raw bytes, use the form 'b64:A=='. For printable strings, use the form 'str:hello'. For addresses, use the form 'addr:XYZ...'.",
            "in": "path",
            "name": "prefix",
            "required": true,
            "schema": {
              "type": "string"
            }
          },
          {
            "description": "A box name, in the goal app call arg form 'encoding:value'. When provided, even if blank, the returned box names will be the names lexicographically following the provided name in sorted order. Further, the call will not fail if there are more box names following the maximum return limit. Callers may implement pagination by reinvoking the endpoint with the last box name returned. Call will fail if the provided max exceeds the algod configured max, to prevent ambiguity of short returns.",
            "in": "query",
            "name": "after",
            "schema": {
              "type": "string"
            }
          }
        ],
        "responses": {
          "200": {
            "content": {
              "application/json": {
                "schema": {
                  "properties": {
                    "boxes": {
                      "items": {
                        "$ref": "#/components/schemas/BoxDescriptor"
                      },
                      "type": "array"
                    }
                  },
                  "required": [
                    "boxes"
                  ],
                  "type": "object"
                }
              }
            },
            "description": "Box names of an application"
          },
          "400": {
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/ErrorResponse"
                }
              }
            },
            "description": "Bad Request"
          },
          "401": {
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/ErrorResponse"
                }
              }
            },
            "description": "Invalid API Token"
          },
          "500": {
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/ErrorResponse"
                }
              }
            },
            "description": "Internal Error"
          },
          "default": {
            "content": {},
            "description": "Unknown Error"
          }
        },
        "summary": "Get all box names with a given prefix for a given application.",
        "tags": [
          "public",
          "nonparticipating"
        ]
      }
    },
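The `prefix` and `after` parameters reuse goal's 'encoding:value' argument form. Below is a minimal sketch of turning such a string into the raw bytes a server would compare box names against, covering only the four forms named in the draft spec. Assumptions: `int:` values become 8-byte big-endian uint64s (as goal encodes integer app arguments), and `addr:` decodes to the 32-byte public key without verifying the address checksum; a real client should validate that checksum, for example with an Algorand SDK. `decode_prefix_arg` is a hypothetical name.

```python
import base64


def decode_prefix_arg(arg: str) -> bytes:
    """Convert a goal-style 'encoding:value' string into raw bytes."""
    encoding, sep, value = arg.partition(":")
    if not sep:
        raise ValueError(f"expected 'encoding:value', got {arg!r}")
    if encoding == "str":
        return value.encode("utf-8")
    if encoding == "int":
        # Assumed: 8-byte big-endian uint64 (raises OverflowError if out of range).
        return int(value).to_bytes(8, "big")
    if encoding == "b64":
        return base64.b64decode(value)
    if encoding == "addr":
        # An address is unpadded base32 of a 32-byte public key plus a
        # 4-byte checksum. This sketch does NOT verify the checksum.
        padded = value + "=" * (-len(value) % 8)
        raw = base64.b32decode(padded)
        if len(raw) != 36:
            raise ValueError("address must decode to 36 bytes")
        return raw[:32]
    raise ValueError(f"unsupported encoding {encoding!r}")
```

For example, `decode_prefix_arg("int:1234")` yields eight bytes ending in `0x04 0xd2`, so an integer prefix only matches names that begin with exactly that 8-byte encoding.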

pbennett commented 3 weeks ago

This will be nice to have. Given that there are some APIs where the 'limit' isn't really known and is set by the node operator, do we define a maximum upfront so there's at least some constraint we can code to? An alternative is for algod to provide some kind of constraints endpoint exposing per-node restrictions, but since callers could be talking to nodes with different parameters (intentionally or not), that could be problematic as well.

Along those lines (and perhaps related to jj's branch), this definitely seems better suited to a paging interface like the assets fetch. The node then decides the maximum returned per call: you get up to X results plus a pagination token to use when fetching subsequent results.

PhearZero commented 3 weeks ago

Great work, @SilentRhetoric! Good points, @pbennett. I always forget that algod's account assets endpoint is paginated, and that seems like a logical fit for application boxes.

Small nit: It would be nice to introduce it as /v2/applications/{applicationId}/boxes?max={max}&prefix={prefix}
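One practical detail of the query-string form: prefixes in `b64:` form can contain `+`, `/`, and `=`, so they need percent-encoding. A small sketch using Python's standard `urllib.parse.urlencode`; `boxes_query_path` is a hypothetical helper, and the `prefix`/`max` parameter names follow this comment rather than any released algod API.

```python
from urllib.parse import urlencode


def boxes_query_path(app_id: int, prefix: str, max_results: int = 0) -> str:
    """Build /v2/applications/{id}/boxes?prefix=...&max=... with safe escaping."""
    params = {"prefix": prefix}
    if max_results > 0:
        # Omitted when 0, mirroring the draft spec's "0 means return all".
        params["max"] = str(max_results)
    # urlencode percent-encodes ':', '+', '/' and '=' so b64 prefixes survive intact.
    return f"/v2/applications/{app_id}/boxes?{urlencode(params)}"
```

For instance, the prefix `b64:AQ+/` becomes `prefix=b64%3AAQ%2B%2F`, which stays unambiguous in a query string, whereas a raw `/` inside a path segment would split the path.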

anwaar001 commented 3 weeks ago

@SilentRhetoric, thanks for raising this issue. I am facing the same problem; having this functionality in algod endpoints will help developers easily create and scale large applications on the Algorand blockchain.

pbennett commented 3 weeks ago

Having the ability to specify a 'list' of boxes to fetch would be nice as well: basically, a 'batch' fetch call.

emg110 commented 2 weeks ago

This is a very nice idea, and I completely agree with the requirement! Wouldn't it be nicer to add filter parameters to the existing endpoints instead of explicitly adding a new one? That is how the existing endpoints already work: if URL parameters are present, they are used as filters (prefixes in this case, and we might add more in the future, such as length). @SilentRhetoric

pbennett commented 1 week ago

My only additional comment is that it should work like the other paged APIs: the response would include a next-token, and instead of 'after', the parameter should be 'next'. This would be in line with the existing algod and indexer APIs that return paged results.
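A client consuming such a paged interface would loop until no `next-token` comes back. A minimal sketch, assuming the response carries a `boxes` array and an optional `next-token` string, and that the token is sent back as a `next` query parameter, as proposed in this comment. `fetch_all_boxes` and the `fetch_page` callable are hypothetical; the HTTP request itself is left to the caller.

```python
from typing import Callable, Dict, List, Optional


def fetch_all_boxes(fetch_page: Callable[[Optional[str]], Dict]) -> List[str]:
    """Follow next-token cursors until the server stops returning one.

    `fetch_page(token)` performs one request (passing next=token when token
    is not None) and returns the parsed JSON body.
    """
    names: List[str] = []
    token: Optional[str] = None
    while True:
        body = fetch_page(token)
        # Names stay base64-encoded, exactly as returned by the server.
        names.extend(box["name"] for box in body.get("boxes", []))
        token = body.get("next-token")
        if not token:
            return names
```

With this design, the node alone decides the page size, so the client never needs to know the node's configured maximum, which addresses the concern about per-node limits raised earlier in the thread.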

algorand / go-algorand