Is your feature request related to a problem? Please describe.
We are thinking about altda -> ethda failover scenarios: if altda is down for whatever reason, the op-batcher could switch back to ethda for some period of time. Starting this issue to describe potential approaches to achieving this failover, which would require changes to the altda server API; that API currently only describes happy-path (200) responses.
503 error code
Our current thinking is to keep the op-batcher implementation simple and push the decision of when to fail over into the altda-server. That way, different teams can experiment with different approaches, potentially specific to their own DA layer. The op-batcher would simply submit its blobs to the altda-server, and if it ever receives a 503 (Service Unavailable), it would fail over and start submitting frames via ethda. There are two potential ways to decide when to retry submitting to altda:
1. time-based: the 503 could return a RETRY_AFTER value, and the op-batcher could start resubmitting to altda after that many seconds.
2. channel-based: channels would have a "da_method" state; upon receipt of a 503, the op-batcher would change the state of the channel the frames belong to, to "da_method: ethda", and then fall back to the current auto logic of deciding between blobs and calldata at submission time.
We have a preference for option 2, as it seems cleanest and would (we think) require minimal changes to the current op-batcher implementation. We will submit a PR shortly, but would appreciate any comments/criticisms that could help improve this approach.
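To make option 2 concrete, here is a minimal Go sketch. It is not the actual op-batcher code: the channel type, field names, the endpoint URL, and the submitToAltDA/submitViaEthDA helpers are all hypothetical, and it only illustrates the idea of flipping a channel's da_method on a 503 so its remaining frames go through the existing ethda auto logic.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"net/http"
)

// DAMethod tracks, per channel, where its frames should be submitted.
// These names are hypothetical, not the actual op-batcher types.
type DAMethod int

const (
	DAMethodAltDA DAMethod = iota // submit frames to the altda-server
	DAMethodEthDA                 // fall back to blobs/calldata on Ethereum
)

type Channel struct {
	ID       string
	DAMethod DAMethod
}

var errAltDAUnavailable = errors.New("altda-server returned 503")

// submitToAltDA posts a frame to the altda-server's /put route and returns
// errAltDAUnavailable on a 503 so the caller can flip the channel to ethda.
func submitToAltDA(altDAURL string, frame []byte) ([]byte, error) {
	resp, err := http.Post(altDAURL+"/put", "application/octet-stream", bytes.NewReader(frame))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusOK:
		// Real code would read and return the commitment from the response body.
		return []byte("commitment"), nil
	case http.StatusServiceUnavailable:
		return nil, errAltDAUnavailable
	default:
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
}

// submitFrame is the per-frame decision point: try altda first, and on a 503
// switch the whole channel to ethda, after which the existing auto logic
// (blobs vs calldata) would take over for its remaining frames.
func submitFrame(ch *Channel, altDAURL string, frame []byte) error {
	if ch.DAMethod == DAMethodAltDA {
		_, err := submitToAltDA(altDAURL, frame)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errAltDAUnavailable) {
			return err
		}
		ch.DAMethod = DAMethodEthDA // failover: this channel's frames now go to ethda
	}
	return submitViaEthDA(ch, frame)
}

func submitViaEthDA(ch *Channel, frame []byte) error {
	fmt.Printf("channel %s: submitting %d bytes via ethda\n", ch.ID, len(frame))
	return nil
}

func main() {
	ch := &Channel{ID: "ch-0", DAMethod: DAMethodAltDA}
	_ = submitFrame(ch, "http://localhost:3100", []byte("frame data"))
}
```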
201 status code
We would also like to support another status code, which is useful for ensuring that fallback storage has been written to. See the discussion here for full details, but to summarize: for maximum assurance, some rollups want the blobs to be written not only to EigenDA but also to a secondary "fallback" storage like S3, such that they have a 100% guarantee that they can retrieve the blob at a later time.
Basically this means that a POST to /put returns 200 if and only if it writes to both the main (EigenDA) and secondary/fallback (S3) storage. If S3 is down for whatever reason, or the write fails, then we return a 201 to ask the client to retry.
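A minimal Go sketch of the client side of that contract (hypothetical names and endpoint; the real batcher would bound retries with its existing backoff helpers): keep resubmitting the same payload while the server answers 201, and only treat the blob as durably stored once a 200 confirms both writes succeeded.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"time"
)

// putUntilDurable posts data to the altda-server's /put route and retries while
// the server answers 201 (written to EigenDA but not yet to the S3 fallback).
// It returns the commitment only once a 200 confirms both writes succeeded.
func putUntilDurable(client *http.Client, altDAURL string, data []byte, maxAttempts int) ([]byte, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		resp, err := client.Post(altDAURL+"/put", "application/octet-stream", bytes.NewReader(data))
		if err != nil {
			return nil, err
		}
		body, readErr := io.ReadAll(resp.Body)
		resp.Body.Close()
		if readErr != nil {
			return nil, readErr
		}

		switch resp.StatusCode {
		case http.StatusOK: // 200: written to both EigenDA and the fallback storage
			return body, nil // body is the hex-encoded commitment
		case http.StatusCreated: // 201: main storage only; resend the same request
			time.Sleep(time.Second) // simple fixed backoff, just for the sketch
			continue
		default:
			return nil, fmt.Errorf("put failed with status %d: %s", resp.StatusCode, body)
		}
	}
	return nil, fmt.Errorf("fallback storage still not written after %d attempts", maxAttempts)
}

func main() {
	commitment, err := putUntilDurable(http.DefaultClient, "http://localhost:3100", []byte("blob data"), 5)
	if err != nil {
		fmt.Println("put failed:", err)
		return
	}
	fmt.Printf("durably stored, commitment: %x\n", commitment)
}
```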
openapi: 3.0.0
info:
  title: OP AltDA Server API
  version: 1.0.0
  description: API for storing and retrieving preimages with hex-encoded commitments (see https://specs.optimism.io/experimental/alt-da.html for more details)
paths:
  /put:
    post:
      summary: Store a preimage on a blockchain based DA layer and get a hex-encoded commitment.
      description: >
        Because commitments can include the block height, hash or depend on onchain data, the commitment cannot be computed prior to submitting it to the DA Layer.
        If using a simple commitment scheme, use the /put/<hex_encoded_commitment> route instead.
      requestBody:
        required: true
        content:
          application/octet-stream:
            schema:
              type: string
              format: binary
      responses:
        '200':
          description: Successful operation - written to both main and (optionally) secondary storages
          content:
            application/octet-stream:
              schema:
                type: string
                format: binary
        '201':
          description: Partially successful operation - written to main storage only. Client should resend the same request to make sure it is successfully written to needed fallback storages.
          content:
            application/json:
              schema:
                type: object
                properties:
                  commitment:
                    type: string
                    description: Hex-encoded commitment
                  status:
                    type: string
                    enum: [partial]
                  message:
                    type: string
        '500':
          $ref: '#/components/responses/InternalServerError'
        '503':
          $ref: '#/components/responses/ServiceUnavailable'
  /put/{hex_encoded_commitment}:
    post:
      summary: Store a preimage with a pre-computed hex-encoded commitment on a content addressable storage layer like IPFS or any S3 compatible storage
      parameters:
        - in: path
          name: hex_encoded_commitment
          required: true
          schema:
            type: string
          description: Hex-encoded commitment for the preimage
      requestBody:
        required: true
        content:
          application/octet-stream:
            schema:
              type: string
              format: binary
      responses:
        '200':
          description: Successful operation
        '400':
          description: Bad request - if the provided commitment doesn't match the preimage
        '500':
          $ref: '#/components/responses/InternalServerError'
        '503':
          $ref: '#/components/responses/ServiceUnavailable'
  /get/{hex_encoded_commitment}:
    get:
      summary: Retrieve a preimage by its hex-encoded commitment
      parameters:
        - in: path
          name: hex_encoded_commitment
          required: true
          schema:
            type: string
          description: Hex-encoded commitment of the preimage to retrieve
      responses:
        '200':
          description: Successful operation
          content:
            application/octet-stream:
              schema:
                type: string
                format: binary
        '404':
          description: Not found - if the commitment doesn't exist
        '500':
          $ref: '#/components/responses/InternalServerError'
components:
  responses:
    ServiceUnavailable:
      description: >
        Service unavailable. When received, clients should fall back and submit their blobs to Ethereum to be safe.
        They can try resubmitting blobs to altda via this server after <retry_after> seconds, if present.
      content:
        application/json:
          schema:
            type: object
            properties:
              error:
                type: string
              retry_after:
                type: integer
                description: Seconds until client should retry. This field is optional.
            required:
              - error
    InternalServerError:
      description: >
        Internal Server Error. This indicates a problem with the current (proxy) server.
        The client should consider this request as failed and may retry immediately with the same or a different server instance.
      content:
        application/json:
          schema:
            type: object
            properties:
              error:
                type: string
                description: A message providing more details about the error.
            required:
              - error
Thinking through this, we might also add a 400 error code (e.g. if the submitted blob is too large) to let the op-batcher know that its settings are most probably wrong.
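Putting the proposed codes together, here is a hedged Go sketch (the names, action enum, and response parsing are purely illustrative, not the actual op-batcher API) of how the batcher side could interpret each /put response: 200 means done, 201 means resend the same request, 503 means fail over to ethda (optionally retrying altda after retry_after seconds), 500 is retryable, and a 400 would be surfaced as a configuration error rather than triggering failover.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// serviceUnavailableBody mirrors the ServiceUnavailable response component:
// a required error message plus an optional retry_after hint in seconds.
type serviceUnavailableBody struct {
	Error      string `json:"error"`
	RetryAfter *int   `json:"retry_after,omitempty"`
}

type action int

const (
	actionDone          action = iota // 200: commitment returned, nothing more to do
	actionResend                      // 201: resend the same request until fallback storage is written
	actionFailoverEthDA               // 503: submit via ethda, optionally retry altda later
	actionRetry                       // 500: retry, possibly against another server instance
	actionConfigError                 // 400: blob rejected (e.g. too large) -> batcher settings are wrong
)

// classifyPutResponse maps a /put status code (and, for 503, its JSON body)
// to the action the batcher should take, plus an optional retry-altda delay.
func classifyPutResponse(status int, body []byte) (action, time.Duration, error) {
	switch status {
	case http.StatusOK:
		return actionDone, 0, nil
	case http.StatusCreated:
		return actionResend, 0, nil
	case http.StatusBadRequest:
		return actionConfigError, 0, fmt.Errorf("altda-server rejected the blob: %s", body)
	case http.StatusInternalServerError:
		return actionRetry, 0, nil
	case http.StatusServiceUnavailable:
		var b serviceUnavailableBody
		retryAfter := time.Duration(0) // zero means "no hint given"
		if err := json.Unmarshal(body, &b); err == nil && b.RetryAfter != nil {
			retryAfter = time.Duration(*b.RetryAfter) * time.Second
		}
		return actionFailoverEthDA, retryAfter, nil
	default:
		return actionRetry, 0, fmt.Errorf("unexpected status %d", status)
	}
}

func main() {
	a, retryAfter, err := classifyPutResponse(503, []byte(`{"error":"eigenda down","retry_after":600}`))
	fmt.Println(a == actionFailoverEthDA, retryAfter, err) // true 10m0s <nil>
}
```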
Describe the solution you'd like
Created an openapi spec (reproduced above) to precisely describe what we would need: https://app.swaggerhub.com/apis/SAMLAF92/op_altda_server/1.0.0
Describe alternatives you've considered
Additional context