Carohere opened 1 month ago
Allocator Application Compliance Report 1st Review
1st Review score: 5 PiB granted
4.75 PiB granted to existing clients:

| Existing Client Name | DataCap |
|---|---|
| ESGF and Pangeo | 4 PiB |
| National Library of Medicine | 0.75 PiB |
Example 1 - issue 8
The client requested 5 PiB but declared 5 data replicas of 1.8 PiB each. With this dataset size, a minimum of 9 PiB should have been requested. The allocator didn't raise this.
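The arithmetic behind the 9 PiB minimum:

```math
\text{minimum request} = n_{\text{replicas}} \times \text{dataset size} = 5 \times 1.8\,\text{PiB} = 9\,\text{PiB}
```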
This dataset has already appeared multiple times on Filecoin. The allocator should have checked and clarified this before granting the first allocation: https://github.com/search?q=repo%3Afilecoin-project%2Ffilecoin-plus-large-datasets+encode-public&type=issues
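This kind of duplicate-dataset check can also be scripted. A minimal sketch using the public GitHub search API (unauthenticated and rate-limited; the `encode-public` slug is taken from the search link above):

```python
import requests

# Search prior filecoin-plus-large-datasets applications for the dataset slug.
QUERY = "repo:filecoin-project/filecoin-plus-large-datasets encode-public type:issue"

resp = requests.get(
    "https://api.github.com/search/issues",
    params={"q": QUERY},
    headers={"Accept": "application/vnd.github+json"},
    timeout=30,
)
resp.raise_for_status()

# Print every prior application mentioning the dataset.
for item in resp.json()["items"]:
    print(f"#{item['number']}: {item['title']} ({item['html_url']})")
```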
SPs provided:
- f02128256 (Toronto)
- f02196792 (Hong Kong)
- f02029743 (Singapore)
- f02363999 (Singapore)
- f02368314 (Deyang)

SPs updated:
- f03159626 (Dulles)
- f03144188 (Dulles)
- f03157879 (Los Angeles)
- f03089826 (Dulles)
- f03106356 (Dulles)

SPs used for deals (22): f03100009, f03100008, f03216485, f03030649, f01084941, f01082888, f01975299, f02828269, f01660795, f0114153, f01084413, f03157879, f03214920, f03192503, f03145504, f03159626, f03144188, f03089826, f03106356, f03156722, f02825282, f02826815
The first report showed 14 SPs and the latest shows 22. The first report already included 9 addresses the client had not disclosed in the issue; in the latest report there were as many as 17 undisclosed SPs. The client declared 5 replicas, while there are already 21. With such a large number of SPs, the retrieval rate is difficult to assess; only 8 out of 22 SPs meet the >75% retrievability condition.
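For illustration, the >75% check is straightforward once per-SP rates are extracted from the check.allocator.tech report; the rates below are hypothetical placeholders:

```python
# Hypothetical per-SP retrieval success rates (percent); real values would
# be taken from the check.allocator.tech report for this application.
retrieval_rates = {
    "f03157879": 92.0,
    "f03159626": 81.5,
    "f01084941": 40.2,
    # ... one entry for each of the 22 SPs in the report
}

THRESHOLD = 75.0  # minimum acceptable retrieval success rate, in percent

compliant = [sp for sp, rate in retrieval_rates.items() if rate > THRESHOLD]
print(f"{len(compliant)} of {len(retrieval_rates)} SPs exceed {THRESHOLD}% retrieval")
```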
Example 2 - issue 6
This dataset was already stored on Filecoin. Did the allocator point that out to the client? https://github.com/search?q=repo%3Afilecoin-project%2Ffilecoin-plus-large-datasets+sra-pub-sars-cov2&type=issues
The SPs provided mostly match the updated SP list. Only 3 out of 10 SPs have a retrieval rate >75%.
The client declared 4 replicas, while at the moment there are already 10.
@Carohere Hi there. I wanted to remind you that the review was done.
@filecoin-watchdog Hi. Thanks for your diligence :)
Should I now be checking the total amount clients request across their applications? My pathway decides whether to continue allocating DataCap to a client based on the client's CID report, which depends on my own research rather than on the total amount in the client's application.
Can you point me to the rule that says a client must not store the same dataset? Really appreciate it.
https://check.allocator.tech/report/Carohere/Caro-Allocator/issues/8/1729515905945.md
87% of the SPs this client has worked with have had effective retrieval rates, so I think he's done a great job. The client informed me on Slack that he added the new SP.
https://check.allocator.tech/report/Carohere/Caro-Allocator/issues/6/1729520522541.md
This client's retrieval success rate has decreased compared to the previous report, so I've reduced the allocation for this client and reminded him.
https://github.com/Carohere/Caro-Allocator/issues/6#issuecomment-2408361274
Since the client has added SPs before, both the new and the old SPs appear together in the latest report.
Our common goal is (or at least should be) to build a diverse database of helpful information. Duplicating the same data does not serve that goal. Therefore, it is worth asking the user why they want to store the same dataset again. The application form includes this question:
> Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network.
This is where the user should justify their decision. I'm not saying there is never a reason to duplicate data, but the user should know why they are doing it and be able to justify it. We want the community to cooperate in creating a valuable database of essential data.
@filecoin-watchdog Got it. I'll keep an eye on this part next. Thanks!
Appreciate the dialogue and communication back and forth. Agreeing with the watchdog, the goal is to increase the diversity of data on the network overall. And as flagged, there are specific questions regarding duplicate datasets. This is part of the expectation on allocators: to verify whether this specific client, data, and SP deal-making is adding new power to the network rather than centralizing or repeating it.
Overall, we would like to see increased diligence regarding SP distribution lists, the number of replicas per client as well as per dataset across the ecosystem, and increasing retrieval success. We will request an additional 5 PiB of DataCap to support this individual pathway.
Compliance Report: https://compliance.allocator.tech/report/f03018489/1728867282/report.md
DataCap allocated to:
https://github.com/Carohere/Caro-Allocator/issues/6
https://check.allocator.tech/report/Carohere/Caro-Allocator/issues/6/1728702752269.md
This client has provided the latest SP list. They have focused on retrieval results, adding new SPs that support retrieval in a timely manner. The current retrieval success rate is awaiting further updates.
https://github.com/Carohere/Caro-Allocator/issues/8 https://check.allocator.tech/report/Carohere/Caro-Allocator/issues/8/1728659532009.md
Based on the client's report data across two phases, the client's retrieval success rate is increasing, so I continued to allocate DataCap after confirming that things were going well.