Open jcace opened 1 year ago
Here's my idea for a retrieval metric. For every SP:

reputation = (total_retrievals + successful_retrievals) * successful_retrievals/total_retrievals
(which is the same as reputation = (total_retrievals + successful_retrievals) * retrieval_success_rate)
final_score = reputation * luck
(luck is a random number between 0.001 and 1)

Reasoning behind each step above:
Variables we could tinker with:

edit: normalizing, we get reputation = ((total_retrievals + successful_retrievals) * retrieval_success_rate) / (2 * total_retrievals). Since the raw score maxes out at 2 * total_retrievals when every retrieval succeeds, this is bounded between 0 and 1. Then we can multiply that by 100 to get a reputation between 0 and 100.
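A minimal Python sketch of the metric above (function names and structure are my own illustration, not from the Estuary codebase):

```python
import random

def reputation_score(total_retrievals, successful_retrievals):
    """Normalized reputation in [0, 100], per the formula above."""
    if total_retrievals == 0:
        return 0.0  # no data yet -> lowest score
    success_rate = successful_retrievals / total_retrievals
    raw = (total_retrievals + successful_retrievals) * success_rate
    # Normalize: raw maxes out at 2 * total_retrievals when every retrieval succeeds
    return (raw / (2 * total_retrievals)) * 100

def final_score(reputation, luck_min=0.001, luck_max=1.0):
    """Apply the random 'luck' multiplier to allow some churn."""
    return reputation * random.uniform(luck_min, luck_max)

print(reputation_score(100, 100))  # perfect record -> 100.0
print(reputation_score(100, 50))   # 50% success rate -> 37.5
```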
30 days to start seems good for historic data. This is awesome, gabe!
calculate reputation score: reputation = (total_retrievals + successful_retrievals) * successful_retrievals/total_retrievals (which is the same as reputation = (total_retrievals + successful_retrievals) * retrieval_success_rate)
Would it be possible to normalize the reputation to be a number between 0 and 100? That way, an SP with a perfect retrieval record and 30+ days of uptime (for example) would have a 100. Easy to reason through at a glance vs. unbounded numbers that might get very big.
I'm thinking along the lines of what Saturn does with their weight/bias score (see: https://orchestrator.strn.pl/stats - far right column). It's a number between 1 and 100, and if you have a 100, you know your node is operating perfectly and will get chosen for Saturn retrievals.
Luck range (0.001 to 1 in the above example)
I like this luck concept since it allows for some churn. If we normalize the rep. score, then it could be a random multiplier from 0.0 - 1.0, ex: 0.5 - 1.0 - this would mean an SP with a score of 51 has a chance of competing with a perfect score.
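To illustrate the narrower luck range suggested here (a hypothetical sketch, assuming a normalized 0-100 reputation):

```python
import random

def luck_adjusted(score, luck_min=0.5, luck_max=1.0):
    """Apply a random luck multiplier drawn from [luck_min, luck_max]."""
    return score * random.uniform(luck_min, luck_max)

# With luck in [0.5, 1.0], an SP scoring 51 can beat a perfect 100:
# 51 * 1.0 = 51.0 on its best draw vs. 100 * 0.5 = 50.0 on the leader's worst.
```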
@jcace I don't think it'll get too big since we're only counting 30 days, buuuut why not - let's do it.
I think we should roll this out in two phases to provide a "soft entry", and allow retrieval issues to get ironed out before they start causing SPs to miss out on deals:
First phase -> record the stats, make them publicly visible and accessible to SPs, but don't act on them yet. Run it like that for 2-4 weeks or so to build up some statistics and get a feel for the network. This would also help with troubleshooting any retrieval issues that exist.
Then, second phase -> turn on reputation-based dealmaking
Bedrock has done some work on retrieval-based reputation systems:
https://www.notion.so/pl-strflt/3-Retrieval-Reputation-Dashboard-a0e7a289b49d4a8eb63aa21c0028bab7
https://www.notion.so/pl-strflt/Retrieval-Metrics-Schema-Discussion-3dcd75f2f0e249d7a098fc1f784cc122
Initially, our reputation score should strictly be a "yes/no" retrievability score, based on whether SPs are serving retrievals or not.
Longer term, it will be important for us to define what a "good Estuary retrieval" looks like. What behaviours are we interested in? ex: time to first byte (TTFB), transfer speed, etc.
Optimizing each of these parameters changes the type of retrieval we're incentivizing. For instance, transfer speed matters more than TTFB if we're serving large files (ex: video), whereas TTFB is more important if we're serving many small files (ex: static websites).
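One way to express that trade-off is a weighted quality score. This is purely a hypothetical sketch - the weight, targets, and function names are my own assumptions, not anything Estuary has defined:

```python
def retrieval_quality(ttfb_ms, xfer_mbps, ttfb_weight=0.5):
    """Hypothetical combined score in [0, 1]: lower TTFB and higher
    transfer speed are better.

    ttfb_weight tunes which behaviour we incentivize: raise it for
    many-small-file workloads (static websites), lower it for large
    files (video).
    """
    # Normalize each metric into [0, 1] against assumed targets.
    ttfb_score = max(0.0, 1.0 - ttfb_ms / 2000.0)  # 0 ms -> 1.0, >= 2 s -> 0.0
    speed_score = min(1.0, xfer_mbps / 100.0)      # >= 100 Mbps -> 1.0
    return ttfb_weight * ttfb_score + (1 - ttfb_weight) * speed_score
```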
Saturn's incentive approach prioritized TTFB - read more about it here https://hackmd.io/@cryptoecon/saturn-aliens/%2FYOuJDLUUQieYfpEcAYSCfQ
Proposal: Estuary Reputation System
This is a WIP
Problem Statement
Estuary currently selects SPs at random when making deals. We should build a reputation system that ranks/directs deals towards SPs that perform in a way that is advantageous for our network. We will use this issue to discuss the inputs/calculations for such a reputation system.
Currently, the most important metric we should be concerned with is retrieval performance.
Estuary currently does not provide any incentives for Storage Providers to serve up CIDs that we deal to them. This is problematic, as autoretrieve relies on SPs serving up content to work properly. Without retrievals working, it is risky to offload content from our shuttles, as it may result in unretrievable files.
Proposed Solution