filecoin-project / Allocator-Governance


1st Community Review of Public Open Dataset Pathway #193

martplo opened 1 week ago

martplo commented 1 week ago

Application: v5 Notary Allocator Application: Open Public Dataset Pathway
Latest compliance report: Compliance Report - 2024-10-14 01:22:58

List of clients (chronologically):

  1. web3eye.io: After the initial 50 TiB was granted, we closed the application because the client was inactive. Total DC granted: 50 TiB

  2. web3.storage: The client asked for 2 PiB, which was granted. Total DC granted: 2 PiB

  3. K12 International Education Platform: After the initial 50 TiB, we pointed out issues to the client, who agreed to improve. After a further 50 TiB test allocation, the client proved not to be diligent, and the application was closed. Total DC granted: 100 TiB

  4. zinc15: After the initial 50 TiB allocation, we asked the client additional questions about the dataset following the introduction of enhanced data verification policies. The client never responded and was uncooperative, so the application was not renewed. Total DC granted: 50 TiB

  5. VshareCloud: After the initial 50 TiB, a discussion with the client established that it did not meet the criteria for the open pathway. After further clarification, we decided not to continue the cooperation, and the client fully understood. Total DC granted: 50 TiB

  6. Common Crawl: This is an active client we are working with. Total DC granted: 250 TiB

  7. OpendataLab: This is an active client we are working with. Total DC granted: 2.5 PiB

For each client, we carry out a thorough analysis and ask additional questions to clarify inconsistencies in the application and to detail the data preparation process, which reflects how closely we work with each client. Each client goes through KYC using kyc.allocator.tech. Non-compliant applications are rejected and closed (a few examples: 1, 2, 3). After positive verification, clients receive allocations according to the allocator application (5%, 15%, 30%, and 50% of the DC).

During the cooperation, we regularly check retrieval rates by running CID reports. We also analyze SP locations on an ongoing basis and compare the SPs the client actually uses with the list provided in the application form; any discrepancies are clarified with the client as they arise.
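The tranche schedule and SP-list comparison described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the allocator's actual tooling: the function names, the 2 PiB example request, and the `f0...` SP IDs are assumptions; only the 5/15/30/50% split comes from the application.

```python
# Hypothetical helpers; only the tranche percentages come from the allocator application.
TRANCHE_PERCENTAGES = (5, 15, 30, 50)


def tranche_schedule(total_tib: float) -> list[float]:
    """Split a client's total DataCap request (in TiB) into the declared tranches."""
    return [total_tib * pct / 100 for pct in TRANCHE_PERCENTAGES]


def compare_sp_lists(declared: set[str], observed: set[str]) -> dict[str, set[str]]:
    """Compare SPs listed in the application form with SPs seen in the client's deals."""
    return {
        "undeclared": observed - declared,  # storing the client's data but not in the form
        "unused": declared - observed,      # in the form but not storing the client's data
    }


if __name__ == "__main__":
    # Hypothetical 2 PiB (2048 TiB) request.
    print(tranche_schedule(2048))  # [102.4, 307.2, 614.4, 1024.0]
    # Hypothetical SP IDs: f01003 is undeclared, f01001 is unused.
    print(compare_sp_lists({"f01001", "f01002"}, {"f01002", "f01003"}))
```

Any SP in `undeclared` is a discrepancy worth raising with the client; an SP in `unused` may simply not have started sealing yet.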

filecoin-watchdog commented 6 hours ago

@martplo Allocator Application Compliance Report

4.95 PiB granted to clients:

| Client Name | DC |
|---|---|
| web3eye.io | 50 TiB |
| zinc15 | 50 TiB |
| VshareCloud | 50 TiB |
| K12 International Education Platform | 100 TiB |
| w3s and the users of our tools | 2 PiB |
| CommonCrawl | 250 TiB |
| OpenDataLab | 2.5 PiB |

Example 1 - web3eye.io
KYB and KYC were performed. This appears to be one of the older applications, as it can be traced back to Fil+ large datasets: https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/2328. I assume this is the same application. If this was a known client, though, why did the allocator not follow its own rules and grant an initial 5%, instead of starting with 50 TiB? The application was closed because the client did not respond.

Example 2 - zinc15
This dataset appears to have been stored on Filecoin several times: https://github.com/search?q=repo%3Afilecoin-project%2Ffilecoin-plus-large-datasets+zinc3d&type=issues. Did the allocator clarify this with the client before starting to work with them? KYC was performed, and additional questions were asked.

Example 3 - VshareCloud
Many additional questions were asked before the first allocation was granted. KYC was performed. The application was closed after the initial 50 TiB because the allocator determined that the client was not a good fit for it. There is no CID report to analyze.

Example 4 - K12 International Education Platform
The client requested 5 PiB and declared 4 replicas of 512 TiB each. For this dataset size, 2 PiB (4 × 512 TiB) should be enough, but the allocator did not raise this. The first 50 TiB allocation showed that the client was not following the rules. Why was another 50 TiB granted, and what was it meant to prove? Also, this dataset appears to have been stored on Filecoin before, by the same client: https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/2283

Example 5 - w3s and the users of our tools
The client requested 2 PiB and declared 10 replicas of 600 TiB each. For this dataset size, roughly 6 PiB (10 × 600 TiB = 6,000 TiB) would need to be requested, but the allocator did not raise this.
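The replica arithmetic behind Examples 4 and 5 can be checked with a short Python sketch. It assumes, as the review does, that the DataCap a client needs is simply the number of replicas times the replica size, and it uses binary units (1 PiB = 1024 TiB); the function name is hypothetical.

```python
TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def required_pib(replicas: int, replica_size_tib: float) -> float:
    """DataCap needed for `replicas` copies of a dataset, converted from TiB to PiB."""
    return replicas * replica_size_tib / TIB_PER_PIB


# Example 4 (K12): 4 x 512 TiB = 2048 TiB = 2 PiB, versus 5 PiB requested.
k12 = required_pib(4, 512)
# Example 5 (w3s): 10 x 600 TiB = 6000 TiB, about 5.86 PiB, versus 2 PiB requested.
w3s = required_pib(10, 600)

print(f"K12: {k12:.2f} PiB needed, 5 PiB requested")  # K12: 2.00 PiB needed, 5 PiB requested
print(f"w3s: {w3s:.2f} PiB needed, 2 PiB requested")  # w3s: 5.86 PiB needed, 2 PiB requested
```

In the first case the request is well above the declared need; in the second it is well below it, and either mismatch is worth questioning.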

Also, searching the large datasets repo for the data sample link returns 5 records with very similar applications: https://github.com/search?q=repo%3Afilecoin-project%2Ffilecoin-plus-large-datasets+bafybeid5jpdqzlb4tqsd6peoa7qstoxat3ovsg62wutyp4gnzqbqsggfsq&type=issues

KYC was performed, but no additional questions were asked: none on data preparation or the SP list.

Most SPs have good retrieval rates, but 2 of the 8 SPs are below 1%. CID sharing also occurred, and the allocator did not bring it up.
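To illustrate what a CID-sharing check looks for, here is a minimal Python sketch. It assumes that "CID sharing" means the same piece CID appearing in deals made by different clients, and it uses hypothetical deal records of the form (client, provider, piece CID); it is not the format of any real CID report.

```python
from collections import defaultdict

# Hypothetical deal record: (client_id, provider_id, piece_cid).
Deal = tuple[str, str, str]


def find_shared_cids(deals: list[Deal]) -> dict[str, set[str]]:
    """Map each piece CID stored by more than one client to the set of those clients."""
    clients_by_cid: dict[str, set[str]] = defaultdict(set)
    for client_id, _provider_id, piece_cid in deals:
        clients_by_cid[piece_cid].add(client_id)
    return {cid: clients for cid, clients in clients_by_cid.items() if len(clients) > 1}


deals = [
    ("client-a", "f01001", "cid-1"),
    ("client-b", "f01002", "cid-1"),  # same piece stored for another client
    ("client-a", "f01001", "cid-2"),
]
print(find_shared_cids(deals))  # reports cid-1 as shared by client-a and client-b
```

Grouping by CID first and filtering afterwards keeps the check to a single pass over the deals.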

Example 6 - CommonCrawl
The allocator asked many additional questions, and the application details appear to have been resolved very thoroughly.

In its application, the allocator stated that DC would be allocated in 5/15/30/50% tranches. However, this client's first three allocations were 50 TiB, 50 TiB, and 150 TiB, which do not match the declared rules. Where do these changes come from?
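The mismatch can be checked without knowing the client's total request: under a 5/15/30/50% schedule the second tranche should be 3× the first and the third 6× the first. The Python sketch below is a hypothetical helper (name and 1% tolerance are assumptions) that compares each observed tranche with the first one, so it also assumes the first tranche itself was sized correctly.

```python
DECLARED_PERCENTAGES = (5, 15, 30, 50)


def matches_declared_schedule(observed_tib, declared=DECLARED_PERCENTAGES, tolerance=0.01):
    """Return True if observed tranche sizes are proportional to the declared percentages.

    Each tranche is compared with the first, so the total request is not needed.
    """
    if not observed_tib or len(observed_tib) > len(declared):
        return False
    first = observed_tib[0]
    for size, pct in zip(observed_tib, declared):
        expected = first * pct / declared[0]
        if abs(size - expected) > tolerance * expected:
            return False
    return True


print(matches_declared_schedule([50, 50, 150]))   # False: 50/150/300 TiB expected
print(matches_declared_schedule([50, 150, 300]))  # True
```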

2 of the 9 SPs have a retrieval rate of 0%, 1 is below 75%, and the rest look good.

Example 7 - OpenDataLab
The allocator asked additional questions, clarified inconsistencies on an ongoing basis, and ran frequent reports.

3 out of 8 SPs have a retrieval rate below 75%.
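The retrieval findings in Examples 5 to 7 are counts of SPs below a threshold (1% or 75%). A minimal Python sketch of that tally follows; the SP IDs and rates are hypothetical, not taken from any real CID report, and rates are assumed to be fractions between 0.0 and 1.0.

```python
def count_below(retrieval_rates: dict[str, float], threshold: float) -> int:
    """Count SPs whose retrieval rate (0.0 to 1.0) is strictly below `threshold`."""
    return sum(1 for rate in retrieval_rates.values() if rate < threshold)


# Hypothetical rates, not taken from any real CID report.
rates = {"f01001": 0.0, "f01002": 0.62, "f01003": 0.91, "f01004": 0.88}
print(count_below(rates, 0.75))  # 2
print(count_below(rates, 0.01))  # 1
```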


Overall, this allocator asks many questions, conducts thorough customer analysis, and runs regular CID reports.

martplo commented 2 hours ago

Thank you for the review.