filecoin-project / tornado-deploy

Bedrock autoretrieve deployment scripts
0 stars 1 forks source link

Build a simple database + Rest API for autoretrieve recording #2

Closed hannahhoward closed 2 years ago

hannahhoward commented 2 years ago

Goals

Build a provider for https://github.com/application-research/autoretrieve/issues/96

Implementation

You can see the outlines of an SQL database structure in the API Spec for https://github.com/application-research/autoretrieve/issues/96

I recommend:

Model repo to work off for setting up SQL+HTTP API: Dealbot controller -- https://github.com/filecoin-project/dealbot/tree/main/controller

TBD/Implementer choice: Should we copy the dealbot's pattern of using serialized IPLD for any fields that aren't individually queryable / indexed? This is a nice way to effectively treat SQL like a document database.

I do think we should use bindnode instead of generated schemas though!

willscott commented 2 years ago

i am happy to have the api throw json, and have the sql side service just throw it into dynamo or bigquery or similar.

hannahhoward commented 2 years ago

@willscott should we just make something like this: https://docs.aws.amazon.com/apigateway/latest/developerguide/http-api-dynamo-db.html

willscott commented 2 years ago

pretty much, and the rows in dynamo can have dynamic schema as well, so we can just pass through the json from auto-retrieve directly

hannahhoward commented 2 years ago

yea alright I'll write it up that way

hannahhoward commented 2 years ago

seems super easy and avoids writing a lot of new code.

kylehuntsman commented 2 years ago

Just talked to Hannah about this and a couple concerns came up. My first concern is that Dynamo is not the best solution for a persistence store that is going to have data dynamically queried every which way as people want to view the data differently. Dynamo is limited to querying records with a given partition key, or ID, and some minor query-like-abilities on a sort key. General properties are not queryable unless using secondary indexes, in which case we'll be duplicating all of the data. Dynamo is a lot better of a solution when the queries are known ahead of time and do not change too much. My go to for this kind of ask would have been a PostgreSQL table. A relational table makes it much easier to query dynamically.

The other concern, and more or less just something we have to look into, is how infrastructure for this is managed by kubernetes. I have experience using Cloudformation, SAM, and Cloud Development Kit (CDK), but not the kubernetes integration. I don't know if this is a scenario where we can do both, or we must set stuff up via kubernetes, etc. CDK is awesome, but may just not work with our infrastructure management.

willscott commented 2 years ago
kylehuntsman commented 2 years ago

We had another discussion around resources and model design after there was some concern around there being a few separate processes that took place over the course of a "retrieval". We've named these seperate processes a phase. events take place inside of phases. The immediate goal being to determine which providers are failing and succeeding.

API Endpoint: POST /retrieval-events
Content-Type: application/json
Request Body: {
   retrievalId:         UUID | The ID of the retrieval
   cid:               string | A queryable string of the CID being asked for
   storageProviderId: string | The peer ID of the storage provider
   phase:             string | Must be "(indexer related?)", "query", "retrieval" 
   phaseStartTime:    string | ISO 8601 UTC datetime string of the time this phase started
   event:             string | The name of the current event. Must be "connected", "query-asked", "proposed", "accepted", "first-byte-received", "phase-failed", "succeeded"
   eventTime:         string | ISO 8601 UTC datetime string of the time this event took place
   eventDetails:        json | Misc. JSON data related to the given event. TBD.
}
Responses:
   201: retrieval event was saved successfully
   Body: No body

   400: validation error
   Body: {
      error:                     string | The error code
      errorMessage:              string | The error message
      errorDetails: {[key:string]: any} | Any information related to this particular error
   }

   500: server error
   TBD

@hannahhoward @rvagg

Feel free to make suggestions/comments/etc.

rvagg commented 2 years ago

Msg body looks good and matches my understanding of the discussion and what I'm working toward (nearly there).

rvagg commented 2 years ago

We forgot instanceId in here btw, I've added it to the events; we can call it autoretrieveInstanceId if you prefer

rvagg commented 2 years ago

As per my comment @ https://github.com/application-research/autoretrieve/pull/97#issuecomment-1190259407, here's an example line-delimited JSON event log, with 500 events generated by the current version of that code which you should be able to replay against your service instead of spinning up an instance (although that's not hard either): https://gist.github.com/rvagg/a9b941f1b423679659812fff1b9bf0a1