Closed NDLABS-Leo closed 5 months ago
The third point about the solution is mentioned in the initial specification: Best practice for storing large datasets includes ideally, storing it in 3 or more regions, with 4 or more storage provider operators or owners, and having at least 5 replicas of the dataset. No more than one replica should be stored with one SP ID, and if the data cannot leave a particular geographic boundary, then it is expected that replication will still happen across different locations (cities, datacenters, etc.). Each storage provider should not exceed 30% of the total datacap that the client was allocated and the storage provider should have published its public IP address. If you cannot follow these practices due to policy or any other issues, you may explain your case in the application and provide to the community what method you can do instead. These are recommendations and not strict rules that every client must follow.
I've marked the highlighted areas, and the areas are suggestions rather than mandatory requirements. However, if an applicant is told after submitting an application that the previous specification followed is invalid, this may cause them to lose trust in the Filecoin Plus program. The applicant has spent a lot of time and effort preparing the application beforehand, and to be told during the application process that the previous specifications were incorrect will undoubtedly make the previous work futile. These must be synchronized across all channels.
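To make the quoted recommendations easier to discuss, here is a minimal sketch of how they could be checked against a client's replica list. The `Replica` record, its field names, and the `unmet_recommendations` helper are assumptions made for illustration and are not part of any existing tool. The thresholds are the ones in the quoted specification. The function returns notes rather than a pass/fail verdict because the specification describes these as recommendations.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One full copy of the dataset held by a storage provider (hypothetical record)."""
    sp_id: str     # storage provider ID (illustrative values only)
    operator: str  # entity that owns or operates the SP
    region: str    # geographic region of the SP
    datacap: int   # datacap spent with this SP, in any consistent unit


def unmet_recommendations(replicas, total_datacap,
                          min_regions=3, min_operators=4,
                          min_replicas=5, max_sp_share=0.30):
    """Return a list of human-readable notes for each recommendation not met."""
    notes = []
    if len({r.region for r in replicas}) < min_regions:
        notes.append(f"fewer than {min_regions} regions")
    if len({r.operator for r in replicas}) < min_operators:
        notes.append(f"fewer than {min_operators} operators")
    if len(replicas) < min_replicas:
        notes.append(f"fewer than {min_replicas} replicas")

    # No more than one replica per SP ID.
    replica_count = Counter(r.sp_id for r in replicas)
    for sp_id, count in sorted(replica_count.items()):
        if count > 1:
            notes.append(f"{sp_id} holds {count} replicas")

    # No SP should receive more than 30% of the client's allocated datacap.
    datacap_by_sp = Counter()
    for r in replicas:
        datacap_by_sp[r.sp_id] += r.datacap
    for sp_id, used in sorted(datacap_by_sp.items()):
        if used > max_sp_share * total_datacap:
            notes.append(f"{sp_id} exceeds {max_sp_share:.0%} of datacap")
    return notes
```

An application that cannot meet one of these points would attach its explanation to the corresponding note instead of being rejected automatically.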
Set clear standards so that everyone works from the same principles, such as:
The shared CID must not be signed.
If the search rate is lower than 10%, it must not be signed.
We also have to discuss, when an SP makes a mistake, under which circumstances we can give 1-2 chances and under which we cannot give any. Developing uniform standards will help resolve the current confusion.
@NDLABS-Leo, can you share more on this: "Notary due diligence should be based on objective facts with subjective judgment"? I see this as conflicting, but I might not fully understand the statement.
Issue Description
The original intention of FIL+ was to incentivize SPs to effectively store important human data. However, the majority of LDN applications currently consist of AWS and similar data, which is not frequently accessed by others. SPs store such data for backup and long-term retention of this "valuable" data.
Most applications state that retrieval frequency will be very low, which reflects the consensus among applicants that this data will not be frequently accessed. For the overall development of the Filecoin project we do need to move towards faster retrievals, but not yet. This process has brought together people from different parts of the world, each with different standards, resulting in escalating disputes within the community.
Impact
Currently, there are many different opinions within the Filecoin community, with each member holding unique perspectives. However, clear standards have not yet been established on every point that we want to address. If these differences continue to escalate, they will have a negative impact on the friendly environment of the Filecoin community, affecting Filecoin's reputation in the industry and the sustainable development of the project. Therefore, we urgently need to engage in peaceful and rational communication, strive for consensus, and establish a set of guidelines that apply to everyone to maintain the harmony and stability of the community and promote the prosperity of Filecoin. Let us work together to create a more prosperous and friendly Filecoin community.
Proposed Solution(s)
What we should do is:
This should be a gradual process. For example, we have already done well with the second point. In the next month, we can set a goal of addressing the third point, and in the following month, the fourth point.
The investigation process should not be used as a weapon to pursue personal interests. Everyone's mind should remain clear, and behaviors that are "excessive" and "beyond boundaries" are the reasons for inefficiency.
Proposed Solutions:
Firstly, data openness and basic data volume: Notaries need to conduct sampling surveys of the stored data to understand the approximate amount of original data. If significant differences are found, they should leave comments on GitHub and wait for client responses, or request secondary audits by other notaries to resolve the discrepancies.
Data retrievability: This aspect needs further refinement. Considering that the retrieval bot has been in operation for a long time, the Graphsync retrieval rate should be at least 30%. For HTTP retrieval, the current rate should be maintained at 1% or higher. Additionally, targets should be adjusted in accordance with the overall project development. For example, starting from August, the Graphsync retrieval rate should reach 40%, from September, 50%, and from October, 60%. The same applies to HTTP. However, considering the practical situation of SP hardware facilities, we need to acknowledge the challenge of achieving a 100% retrieval rate. Perhaps setting the highest target at 80% would still be an excellent achievement. Furthermore, regarding HTTP retrieval, it is encouraged to promote its adoption. While SPs can choose to provide Graphsync, HTTP, or Bitswap, it is recommended that HTTP retrieval be universally supported as it will benefit the development of ecosystem projects.
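The stepped Graphsync targets proposed above can be written down as a small lookup, which may help when comparing them against retrieval bot reports. This is only a sketch: the function names are hypothetical, months are plain month numbers within the same year, and holding the target at 60% after October is an assumption, since the proposal does not say what happens next. The 80% value is the suggested ceiling from the text.

```python
# Targets taken from the proposal; everything else here is illustrative.
BASELINE_GRAPHSYNC = 0.30                          # current minimum
GRAPHSYNC_STEPS = {8: 0.40, 9: 0.50, 10: 0.60}     # August, September, October
PROPOSED_CEILING = 0.80                            # suggested highest target


def graphsync_target(month: int) -> float:
    """Return the proposed minimum Graphsync retrieval rate for a month (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    target = BASELINE_GRAPHSYNC
    for start_month, value in sorted(GRAPHSYNC_STEPS.items()):
        if month >= start_month:
            target = value  # the latest step that has started applies
    # Assumption: no target ever exceeds the suggested 80% ceiling.
    return min(target, PROPOSED_CEILING)


def meets_graphsync_target(success_rate: float, month: int) -> bool:
    """Check a measured success rate (0.0-1.0) against that month's target."""
    return success_rate >= graphsync_target(month)
```

For instance, an SP with a measured rate of 45% would meet the August target but not the September one. A parallel table for HTTP would start from the 1% figure mentioned above.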
Data distribution and backup: Client applications should ultimately store data in at least three regions to ensure data security. Additionally, there should be no fewer than four backups (backup details can be checked in each round's checkbot report), which is crucial for data safety. As for VPN usage, the community should agree on which situations are acceptable. If a VPN is used solely to improve network efficiency without altering the geographical location, I believe it can be acceptable. Further discussion among community members is encouraged on this matter. Moreover, tools to assist notaries should be provided during the auditing process.
Notaries should not sign LDNs related to themselves, to prevent the abuse of their authority.
CID sharing issues: If disputes arise over data sharing content, clients need to provide explanations, evidence, and subsequent resolution plans to gain the support of notaries. Signing notaries should assess the effectiveness of proposed resolution plans and prevent clients from encountering the same data sharing problems again.
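As a starting point for gathering the evidence mentioned above, the following sketch lists CIDs that appear under more than one client. It assumes that "CID sharing" means the same CID being stored under different clients, and that a notary already has a mapping from client to CIDs. Both the input shape and the `shared_cids` name are illustrative.

```python
from collections import defaultdict


def shared_cids(cids_by_client):
    """Map each CID seen under more than one client to the set of those clients.

    cids_by_client: dict of client name -> set of CID strings (hypothetical input).
    """
    owners = defaultdict(set)
    for client, cids in cids_by_client.items():
        for cid in cids:
            owners[cid].add(client)
    # Keep only CIDs that at least two different clients have stored.
    return {cid: clients for cid, clients in owners.items() if len(clients) > 1}
```

The result only shows where overlap exists. Whether the overlap is justified still depends on the client's explanation and resolution plan.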
Remove outdated and abused LDNs to keep ongoing LDNs clean and focused, allowing notaries to concentrate their efforts on these open LDNs.
If notaries have different auditing standards, they need to disclose them in their notary application form. If accusations arise, the governance team and community members can assess them based on the disclosed auditing standards.
Community members are welcome to discuss and supplement further...
Timeline
The next Notary governance call.
Technical dependencies
We suggest that the governance team maintain a public document and iterate on it continuously, so that it serves as a guiding specification for community notary signatures.
Related Issues
https://github.com/filecoin-project/notary-governance/issues/919
https://github.com/filecoin-project/notary-governance/issues/925