hannahhoward commented 2 years ago

What

In addition to metrics, deployments of autoretrieve may want to record and track more specific data about each retrieval, for more detailed querying or diagnostics.

With this in mind, I'd like to propose that an autoretrieve deployment can be set up with a "publish URL". The URL must point to a service exposing a REST API, which autoretrieve calls to publish statistics about its retrievals.
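On the deployment side, the publish URL could be an optional setting that is validated at startup. This is a minimal sketch: `Config`, `PublishURL` and `publishEndpoint` are hypothetical names, and treating an empty value as "publishing disabled" is an assumption, not something the proposal specifies.

```go
package main

import (
	"errors"
	"net/url"
)

// Config holds deployment settings (hypothetical). An empty PublishURL is
// assumed to mean that publishing is disabled.
type Config struct {
	PublishURL string
}

// publishEndpoint validates PublishURL and returns nil, nil when publishing
// is disabled.
func (c Config) publishEndpoint() (*url.URL, error) {
	if c.PublishURL == "" {
		return nil, nil
	}
	u, err := url.Parse(c.PublishURL)
	if err != nil {
		return nil, err
	}
	// The REST service must be reachable, so require an absolute http(s) URL.
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return nil, errors.New("publish URL must be an absolute http(s) URL")
	}
	return u, nil
}
```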

The proposed resources are:

RetrievalAttempt
{
   id: UUID
   cid: CID
   stage: string
   errorMessage: string
   autoretrieveInstance: string
   logs: []string
   startedAt: datetime
}

Endpoint:

PUT /retrieval_attempts/~uuid~
JSON Body: RetrievalAttempt
Success: 200 OK
Fail: 400 Bad Request

---

ProviderRetrieval
{
   peerID: peerID
   retrievalUUID: UUID
   stage: string
   errorMessage: string
   logs: []string
   startedAt: datetime
}

Endpoint:

PUT /retrieval_attempts/~uuid~/providers/~peerID~
JSON Body: ProviderRetrieval
Success: 200 OK
Fail: 400 Bad Request
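On the receiving side, a publish service could validate this endpoint before storing anything. This is a minimal sketch, assuming the service is mounted at the root of the publish URL and that the `peerID` and `retrievalUUID` in the body must match the path (the proposal does not say so). `parseProviderPut`, `providerHandler` and the `store` hook are hypothetical, and routing only PUT requests to the handler is left to the caller.

```go
package main

import (
	"encoding/json"
	"errors"
	"io"
	"net/http"
	"strings"
	"time"
)

// ProviderRetrieval mirrors the proposed resource; peer ID and UUID are strings here.
type ProviderRetrieval struct {
	PeerID        string    `json:"peerID"`
	RetrievalUUID string    `json:"retrievalUUID"`
	Stage         string    `json:"stage"`
	ErrorMessage  string    `json:"errorMessage"`
	Logs          []string  `json:"logs"`
	StartedAt     time.Time `json:"startedAt"`
}

// parseProviderPut validates a PUT to /retrieval_attempts/<uuid>/providers/<peerID>.
// Requiring the IDs in the body to match the path is an assumption of this sketch.
func parseProviderPut(path string, body []byte) (ProviderRetrieval, error) {
	var pr ProviderRetrieval
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) != 4 || parts[0] != "retrieval_attempts" || parts[2] != "providers" {
		return pr, errors.New("unrecognized path")
	}
	if err := json.Unmarshal(body, &pr); err != nil {
		return pr, err
	}
	if pr.RetrievalUUID != parts[1] || pr.PeerID != parts[3] {
		return pr, errors.New("body IDs do not match the path")
	}
	return pr, nil
}

// providerHandler answers 200 OK when the record is accepted and 400 Bad Request
// otherwise. store is a hypothetical persistence hook supplied by the service.
func providerHandler(store func(ProviderRetrieval)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20)) // cap bodies at 1 MiB
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		pr, err := parseProviderPut(r.URL.Path, body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		store(pr)
		w.WriteHeader(http.StatusOK)
	}
}
```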

Tracking in autoretrieve:

The UUID is assigned by autoretrieve. Attempts are kept in memory, keyed by UUID, and only while they are active. Imagine something like:

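type internalRetrievalAttempt struct {
   RetrievalAttempt                          // the fields that get published
   lk        sync.Mutex                      // guards providers
   providers map[peer.ID]*ProviderRetrieval  // per-provider state, keyed by peer ID
}

// existing filecoin Retriever struct
type Retriever struct {
   // ... existing fields
   retrievalStatesLk sync.RWMutex                        // guards retrievalStates
   retrievalStates   map[UUID]*internalRetrievalAttempt  // active attempts only
}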
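To make the in-memory lifecycle concrete, here is a self-contained Go sketch of how these structs might be used. It swaps in plain strings for `UUID` and `peer.ID` and generates a version 4 UUID with `crypto/rand` to stay within the standard library; the methods `startAttempt`, `updateProvider` and `finishAttempt`, and the stage strings, are hypothetical.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"sync"
)

// Hypothetical stand-ins for a UUID library type and libp2p's peer.ID.
type UUID string
type PeerID string

type RetrievalAttempt struct {
	ID    UUID
	CID   string
	Stage string
}

type ProviderRetrieval struct {
	PeerID PeerID
	Stage  string
}

type internalRetrievalAttempt struct {
	RetrievalAttempt
	lk        sync.Mutex // guards providers
	providers map[PeerID]*ProviderRetrieval
}

type Retriever struct {
	retrievalStatesLk sync.RWMutex // guards the retrievalStates map
	retrievalStates   map[UUID]*internalRetrievalAttempt
}

// newUUID returns a random version 4 UUID in canonical 8-4-4-4-12 form.
func newUUID() (UUID, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return UUID(fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])), nil
}

// startAttempt assigns a UUID and registers the attempt as active.
func (r *Retriever) startAttempt(cid string) (UUID, error) {
	id, err := newUUID()
	if err != nil {
		return "", err
	}
	st := &internalRetrievalAttempt{
		RetrievalAttempt: RetrievalAttempt{ID: id, CID: cid, Stage: "started"},
		providers:        make(map[PeerID]*ProviderRetrieval),
	}
	r.retrievalStatesLk.Lock()
	r.retrievalStates[id] = st
	r.retrievalStatesLk.Unlock()
	return id, nil
}

// updateProvider records a provider's latest stage; false if the attempt is gone.
func (r *Retriever) updateProvider(id UUID, p PeerID, stage string) bool {
	r.retrievalStatesLk.RLock()
	st, ok := r.retrievalStates[id]
	r.retrievalStatesLk.RUnlock()
	if !ok {
		return false
	}
	st.lk.Lock()
	defer st.lk.Unlock()
	pr, ok := st.providers[p]
	if !ok {
		pr = &ProviderRetrieval{PeerID: p}
		st.providers[p] = pr
	}
	pr.Stage = stage
	return true
}

// finishAttempt drops a completed attempt and returns a snapshot of its providers.
func (r *Retriever) finishAttempt(id UUID) ([]ProviderRetrieval, bool) {
	r.retrievalStatesLk.Lock()
	st, ok := r.retrievalStates[id]
	delete(r.retrievalStates, id)
	r.retrievalStatesLk.Unlock()
	if !ok {
		return nil, false
	}
	st.lk.Lock()
	defer st.lk.Unlock()
	out := make([]ProviderRetrieval, 0, len(st.providers))
	for _, pr := range st.providers {
		out = append(out, *pr)
	}
	return out, true
}
```

The map lock is held only long enough to look up or remove an attempt, and each attempt's own mutex guards its provider map, so updates to different attempts do not contend on one lock.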

willscott commented 2 years ago

Notes from sync:

Probably don't need POST/PUT/GET; can just POST logs and have autoretrieve manage the UUID.
Do we need all 3? Can probably provide query ask and provider deal.
Can probably combine provider_deals and provider_query_ask.
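One way to read the suggestion to combine provider_deals and provider_query_ask is a single per-provider record whose stage moves from the query ask into the deal. The stage values, `ProviderRecord` and the `advance` helper below are hypothetical; the proposal only types `stage` as a string.

```go
package main

import "fmt"

// Stage is a hypothetical enumeration of a provider's progress.
type Stage string

const (
	StageQueryAsk Stage = "query-ask" // previously its own provider_query_ask resource
	StageDeal     Stage = "deal"      // previously its own provider_deals resource
	StageSuccess  Stage = "success"
	StageFailure  Stage = "failure"
)

// ProviderRecord is one combined record per (attempt, provider) pair.
type ProviderRecord struct {
	PeerID string
	Stage  Stage
	Logs   []string
}

// advance moves the record to the next stage and appends a log line for it.
func (p *ProviderRecord) advance(next Stage, msg string) {
	p.Stage = next
	p.Logs = append(p.Logs, fmt.Sprintf("%s: %s", next, msg))
}
```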

hannahhoward commented 2 years ago

Updated based on feedback in an external meeting.


Autoretrieve Publish API #96

What