These are the things we need to accomplish to get Estuary to the alpha and post-alpha stage. I’d like to treat each as a pillar: each must be built and perfectly levelled to stabilize the Estuary platform.
This is all tech. No productization / product-lifecycle steps here.
[ ] Write an SQL script to remove the majority of the non-active pins (14M+ records) on shuttle-4, i.e. delete all non-active pins.
[ ] The downside of removing these records: if some of those pins are still in the blockstore, anyone who uses the /gw gateway will fail to look up the CID, since the gateway relies on the database record.
[ ] Some failed pins may not yet have been processed by the Storage Provider (SP), so there is a loss of opportunity there.
[ ] Another option is a clean-up script on shuttle-4 that traverses the blockstore using the CIDs from the pins table, identifies the ones the shuttle can't "walk" - meaning the pin is in the database but not in the blockstore (using merkledag.Walk) - and deletes them from the database. It would be an "estuary shuttle reconciler" tool that matches the blockstore CIDs against the pins table.
[ ] Write scripts that can perform backups based on specific filters.
[ ] Write an SQL script to delete the CIDs that don't exist in the local node's blockstore.
Debugging
[ ] Ensure developers have the proper debugging tools (GoLand).
[ ] Set up dedicated shuttles for each developer (for dev testing)
[ ] Enable pprof on all shuttles and the API node
[ ] Enable Grafana agents
Functional
[ ] Revisit the pinning mechanism
[ ] We need to revisit the pinning process, specifically the infinite loops and the initialization of workers that pin specific content. The current process causes a buildup of unnecessary memory allocations in PinningOperation, which contributes to the OOM issue.
[ ] I’d like to explore separating the queuing from the main API node. We’ve discussed this before, and I’d like to revisit it.
[ ] Revisit all the infinite for-loops and check whether we need to introduce intervals or otherwise optimize them.
[ ] Unit Tests (Quality Assurance) - there is a unit-tests branch that has placeholder unit-test source files in Go. It’s not ideal, so we should collectively, slowly and piece by piece, put up “chore” commits to clean up and create unit tests as we go.
Estuary Stability
Overview
Github Project: https://github.com/orgs/application-research/projects/7/views/5
System Errors (Panics)
All on its own page. We need to handle all the panics.
Log file: log_file_from_shuttle6

```
msg":"couldnt decode pid
pinning queue error: context canceled\nfailed to walk DAG\nmain.
failed to handle rpc command: Unable to send restart request: exhausted 5 attempts but failed to open stream to
pinning queue error: context deadline exceeded\nfallback provide failed\nmain
tried to add pin for content we failed to pin previously
failed to handle rpc command: failed to compute commP
failed to handle rpc command
```
Infrastructure
Data Clean up
https://filecoinproject.slack.com/archives/C016APFREQK/p1660258369066179
Functional
- `shuttle.go`
- `handleShuttleMessages`
- `autoretrieve.go`
- `shuttle/main.go`
- `RunRpcConnection`
- websocket connection `handleRpcCmd`
- `websocket.JSON.Send`
- `addDatabaseTrackingToContent`
- `handlers.go`
- `addDatabaseTrackingContent` (duplicate code)
- `websocket.JSON` (duplicate code)
- `handleShuttleConnection`
- `pinmgr.go`
- `Run(workers int)`
- `replication.go`
- `runStagingBucketWorker`
- `runDealWorker`
- `trackbs.go`
- `benchtest/main.go`
- AutoRetrieve (AR)
[ ] Automated / Regression Tests - we should at least run the shell or Postman jobs that exercise the API endpoints.
Functional Improvements
Proposal: Collections API V2
Proposal: Directory API
Proposal: API Versioning for Estuary
Proposal: Proxy-Forwarder
Proposal: API Gateway
Support
Refactor / Rearchitecture