application-research / autoretrieve

A server to make GraphSync data accessible on IPFS

Extracting retrieval code to a library #143

Closed hannahhoward closed 1 year ago

hannahhoward commented 1 year ago

Goals

This issue is to start a discussion with @elijaharita and the Estuary team. Essentially, the autoretrieve retrieval code is quickly becoming the most battle-tested general-purpose client for finding and retrieving content from Filecoin SPs. Bedrock wants to extract it and use it in other projects. Rather than force people to rewrite the indexer -> SP query -> SP retrieval flow and then debug and optimize it over and over, we want common code that anyone can use, so that everyone benefits from the Bedrock team's ongoing maintenance and improvement of it.

In the immediate future, we think this code can be used as-is to support Saturn in falling back to SPs for content.

In the not-so-near future, there's a lot more we can do.

For one, we love Filclient, but the "minimal Filecoin client" is not so minimal anymore, and most of its code deals with storage. We'd like to write well-tested, minimal baseline code for retrieval from Filecoin via GraphSync + data transfer.

Also, the Bedrock team manages the retrieval protocols exposed by Boost, and we will continue to innovate in this space. Rather than forcing everyone to figure out how to update every time we introduce an improvement, we want folks to use a client we provide and simply reap the benefits of ready-made updates.

This brings us back to autoretrieve and the Estuary team's usage. We'd like to understand whether the Estuary team would be OK with this approach and would want to come along with us. The benefit would be a massive amount of work taken off your plate: you could focus on innovating in just the autoretrieve code that serves Estuary, and stop worrying about triaging and fixing individual retrieval problems, because a team of experts with the most experience with these protocols in the PL network would do it for you. And, to be clear, you'd still have write/merge access to our repos. The downside is that you'd have less code under your direct ownership and final control, and perhaps that's a dealbreaker. If so, we still want to move forward, but we'll probably just extract the code and use it independently (though we may need to fork autoretrieve itself at some point so we can iterate on a single set of code).

What Concretely Would Move

cc: @kylehuntsman @rvagg

kylehuntsman commented 1 year ago

Some additional thoughts:

From my understanding, the initial plan is to copy the parts we need from autoretrieve into a new project repo. This effort will probably happen regardless of the decisions made here. Autoretrieve itself will not necessarily be heavily updated as part of this effort, aside from fixing any bugs we encounter along the way.

Once this retrieval client/library has been developed, we'd have the opportunity to replace the bulk of autoretrieve with it. This is where the decision about Estuary that Hannah mentioned above comes in.