filecoin-project / notary-governance

Modification: AC Bot - Thresholds to Enforce Compliance #986

Closed. ghost closed this issue 3 months ago

ghost commented 10 months ago

This is a continuation of the earlier proposal in #976 concerning the implementation of the Aggregation and Compliance (AC) bot. The bot is designed to compile data from different data sources and enforce Fil+ guidelines. Based on the tests we have conducted, this proposal outlines more refined thresholds the AC bot will use to make determinations on DataCap removal.

Test process

To simulate the AC bot's effect on the Fil+ process, we assessed how each client performed against each metric highlighted in the previous proposal. We then visualized the data using histograms to display the distribution of client scores across each metric. Based on insights from these histograms, we held an internal Fil+ governance team discussion to determine the appropriate thresholds. The subsequent sections outline the finalized thresholds and the resulting data, in that order.
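To make the visualization step concrete, here is a minimal sketch of the kind of per-metric histogram described above. The score values and the use of matplotlib are placeholders and assumptions for illustration, not the governance team's actual dataset or tooling.

```python
# Minimal sketch of the histogram-based visualization described above.
# The score values below are placeholders, not the actual test data.
import matplotlib.pyplot as plt

# Hypothetical CID-checker scores (percent) for a set of client addresses.
cid_checker_scores = [12.0, 34.5, 42.3, 67.8, 71.9, 88.1, 95.0]

plt.hist(cid_checker_scores, bins=20, range=(0, 100), edgecolor="black")
plt.xlabel("CID-checker score (%)")
plt.ylabel("Number of client addresses")
plt.title("Distribution of client scores for one compliance metric")
plt.show()
```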

Thresholds

Below is the proposed threshold schedule for DataCap removal. Enforcement will begin once the bot is deployed and will gradually be tightened over an 8-week period; this incremental approach will allow clients to adapt to the changing compliance standards. Failing to meet these thresholds will result in the automatic creation of a DataCap removal proposal.

| Metric | Week 1-2 | Week 3-4 | Week 5-6 | Week 7-8 |
| --- | --- | --- | --- | --- |
| CID-checker score | >25% | >50% | >75% | >95% |
| Retrieval Bot score | 0% | >10% | >25% | >75% |
| Claimed SP count | >0 | >0 | >0 | >5 |
| Actual SP count after fourth allocation | >5 | >5 | >5 | >5 |
| Percent of actual SPs in claimed list | >0% | >0% | >0% | 100% |
| KYC check | 0 | 0 | 0 | 1 |
| SP unique locations | 1 | 1 | 3 | 4 |
| Percent of properly replicated data | >0% | >0% | >25% | >75% |
| Max percent data stored by top provider | <75% | <50% | <40% | <35% |
| Shared data percent | <20% | <5% | 0% | 0% |

For the metrics "Claimed SP Count" and "Actual SP Count," we currently lack sufficient data to enforce immediate thresholds. As a result, we suggest a 6-week data collection phase to gather the necessary insights. More about this is described in the "Test Results and Rationale" section.
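As an illustration of how these determinations could be made, the following sketch evaluates a client against the week-dependent schedule above. The metric keys, the phase calculation, the inclusive comparison handling, and the metrics shown are assumptions for the sketch, not the AC bot's actual implementation.

```python
# Illustrative sketch only: evaluating a client against the threshold schedule.
# Metric keys and comparison handling are assumptions, not the AC bot's actual code.

# Each entry: (metric key, direction, thresholds for weeks 1-2, 3-4, 5-6, 7-8).
SCHEDULE = [
    ("cid_checker_score",    "min", [25, 50, 75, 95]),  # percent; client must exceed
    ("retrieval_bot_score",  "min", [0, 10, 25, 75]),   # percent
    ("max_pct_top_provider", "max", [75, 50, 40, 35]),  # percent; client must stay below
    ("shared_data_pct",      "max", [20, 5, 0, 0]),     # percent
    # ... the remaining metrics follow the table above
]

def failed_metrics(client_metrics: dict, current_week: int) -> list:
    """Return the metrics a client fails in the current phase of the schedule.

    Comparisons are simplified to inclusive bounds; the table above uses
    strict > / < notation.
    """
    phase = min((current_week - 1) // 2, 3)  # weeks 1-2 -> 0, ..., weeks 7-8 -> 3
    failures = []
    for key, direction, thresholds in SCHEDULE:
        value = client_metrics.get(key)
        if value is None:
            continue  # no data collected yet; see the data-collection note above
        threshold = thresholds[phase]
        if direction == "min" and value < threshold:
            failures.append(key)
        elif direction == "max" and value > threshold:
            failures.append(key)
    return failures

# Any failure would trigger the automatic creation of a DataCap removal proposal.
```

For example, a client with a CID-checker score of 40% would pass in weeks 1-2 but fail from week 3 onward, once the threshold tightens to 50%.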

Next steps and call to arms

If there are no objections and we obtain the required support for implementing the AC bot with these thresholds, we will proceed with GitHub integration and deployment. Please note that the capability for DataCap removal is contingent upon that functionality being developed in line with this proposal.

If you support this implementation, please indicate so in the comment section below. If you believe the proposal should be amended, we welcome your input in the comments as well. Additionally, you're welcome to reach out to any member of the Fil+ team or directly to @philippe pangestu on Slack for further discussion.

Test Results and Rationale

The testing phase involved data analysis of 229 client addresses with active GitHub applications, as well as the subset of 116 clients who have initiated their applications within the past three months. The following graphs describe how client addresses have scored against each of the metrics:

CID-checker score

[Screenshot: histogram of client CID-checker scores]

Retrievability bot score

[Screenshot: histogram of client Retrieval Bot scores]

Claimed SP count

[Screenshot: histogram of claimed SP counts]

Actual SP count after fourth allocation

[Screenshot: histogram of actual SP counts after the fourth allocation]

Percent of actual SPs in claimed list

[Screenshot: histogram of the percent of actual SPs in the claimed list]

KYC check

[Screenshot: histogram of KYC check results]

SP unique locations

[Screenshot: histogram of SP unique location counts]

Percent of properly replicated data

[Screenshot: histogram of the percent of properly replicated data]

Max percent data stored by top provider

[Screenshot: histogram of the max percent of data stored by the top provider]

Shared data percent

[Screenshot: histogram of shared data percent]

kernelogic commented 10 months ago

It looks like a good approach that considers actual scenarios. However, I just have one question: how should this be understood?

Shared data percent <20%

Shared with what? Other LDNs from other people, or the same dataset across a series of LDNs? And how would it be reduced to 0% once it's there? By terminating sectors?

As we know, the same dataset prepared by different DPs using the same software, e.g. Singularity, can possibly produce the same CAR files.

spaceT9 commented 10 months ago

Great proposal, more permissionless!

nicelove666 commented 10 months ago

Agree

nicelove666 commented 10 months ago

Currently the bot is deploying a new version and has stopped working; it seems it has been 2 weeks. Looking forward to the bot resuming operation.

cryptoAmandaL commented 10 months ago

> Currently the bot is deploying a new version and has stopped working; it seems it has been 2 weeks. Looking forward to the bot resuming operation.

So, has the issue with the bot been resolved?

nicelove666 commented 10 months ago

> Currently the bot is deploying a new version and has stopped working; it seems it has been two weeks. Looking forward to the bot resuming operation.
>
> So, has the issue with the bot been resolved?

No, the bot is still not working. It seems that the AC bot needs to be online to resume operations.