filecoin-project / notary-governance

Modification: LDN process - Increasing limit for DataCap requests #594

Closed dkkapur closed 1 year ago

dkkapur commented 2 years ago

Issue Description

In the current scope of the LDN process (see https://github.com/filecoin-project/filecoin-plus-large-datasets#current-scope), clients are able to apply for up to 5 PiB of DataCap. This was initially instituted for a couple of reasons:

With v3 - this becomes substantially easier to manage, especially for applications for open/public datasets (see https://github.com/filecoin-project/notary-governance/issues/509 for more details on the v3 changes).

In the last 6+ months, usage of the LDN process has increased substantially. Several data owners or client representatives are now filing second applications, or filing multiple applications up front, for projects that need >5 PiB of DataCap. This creates additional management overhead and interrupts a client's onboarding flow: they not only need to get DataCap granted through a second application, but can also have their allocation tranches reset and go back to receiving a lower amount of DataCap initially.

As a result - proposing that we increase the maximum amount of DataCap per application to something higher. Initially, we can move up to 25 PiB, with the explicit intent in the future to index much higher on actual onboarding rates and size of raw data onboarded + replication needs.

(Note, this was initially suggested a while back in #227.)

Impact

This change reduces the overhead for clients, notaries, and the governance team. This change also increases safety in some cases for the system, where applications are likelier to be long term associated with a client / project and can serve as a single source of truth of client needs and data. This change also gives more confidence to entities doing business development in the network to hunt projects that have larger DataCap needs.

Proposed Solution(s)

Increase LDN scope to support applications up to 25 PiB.

Tactically, this means:

  1. adjusting the scope in documentation
  2. adjusting the limits in the application template
  3. adjusting the limits in application validation
  4. not required, but recommended: improving guidance for due diligence for notaries

Screenshots of current status:

  1. Scope documentation: image
  2. Application template: image

Timeline

The proposed solution will likely take at least 1 week to implement.

Technical dependencies

The validation bot has to be updated.

End of POC checkpoint (if applicable)

Recommending that we check in after 6 weeks and 12 weeks to look at potential abuse of this. See Risks outlined below.

Risks and mitigations

Risk: By increasing the total amount of DataCap requestable and not adjusting the tranche sizes, this will enable people to now apply for a theoretical max of 1.25 PiB of DataCap in their first tranche (25 PiB DataCap requested with a claimed onboarding rate of 2.5+ PiB/week).

Though this is a concern from a safety standpoint if untrustworthy projects are able to get DataCap, we're also simultaneously investing in and improving the due diligence process as a community. Efforts include KYC learnings from E-Fil+, improved applications templates, better monitoring and risk analysis tools, and more engaged members of the community helping with due diligence.

Separately, we should also look at putting a FIP together as a community to remove DataCap from sealed sectors (IIRC this was already pitched in the past) to ensure projects that do end up sealing verified deals that should not have been verified can still be adjusted down.

Related Issues

#227

raghavrmadya commented 2 years ago

I agree that there is a risk of abuse, but the possible efficiency increase for various actors is worth a test run. In addition to DC removal, I would like to propose putting checks and balances in place, such as having clients who benefit from this increase present updates at a governance call and provide a detailed SP allocation plan.

BobbyChoii commented 2 years ago

I've shared some of my ideas on slack, now reposting it here.

Quoting the LDN Datacap stage provided by @dkkapur here, thanks for the precise classification.

Image 1

I think we can all agree that our actual purpose is about raising (iii) asap within the community compliance framework. BUT

The vision of Filecoin is to help more clients store datasets on the network, and for clients with mega datasets, we can absolutely welcome them by raising the KYC audit criteria without having to set 25PiB as the upper limit for all clients. Certainly, some clients with very large storage needs do require 50PiB/100PiB or even more DataCap, but this does not suit all clients and is the primary reason why this should not become a universal standard.

Increasing the upper limit of applications will face a lot of data abuse, and if it is to better meet a small portion of larger storage needs we can completely develop a non-general application review method.

And of course, if PL must pass this, I would love to know why it is 25PiB not 30 or 50PiB? How was this number determined?

Thanks! Bobby

raghavrmadya commented 2 years ago

Hi @BobbyChoii, thank you for adding your thoughts on this proposal. I'm RG, and I focus on preventing DataCap abuse. The proposal is to increase (ii) based on the ratio of (iii)/(ii) which is above 75% at the moment. This means that for deal-making clients / actively onboarding clients, DataCap is actually getting used up pretty quickly.

I agree that (i) and (iii) aren't significantly correlated and, to my understanding, we don't expect much correlation there. If anything, we expect (i) and (ii) to be correlated, as more allocation means the need to commit/provision more DC through RKH/community nexus.

Checkpoints at (i) and (ii) will continue to remain active and are consistently being improved. As things stand today, clients who need more than 5 PiBs open multiple applications, often with different client addresses contributing to more overhead both for the governance team as well as the notaries to keep track to simply sign and do the due diligence in a repetitive manner to abide by the standards we have set in place.

Having a client be upfront about the total amount of DC they need and apply in a single application is overall more efficient, and I do agree it comes with the likelihood of increased abuse. My own strategy to achieve the vision of the network as you've outlined is to create more penalties rather than barricades and red-tapism. As all allocations are tranched, the community is always welcome and encouraged to monitor DataCap usage and deal-making behavior at dashboards such as https://filplus.d.interplanetary.one/clients

raghavrmadya commented 2 years ago

To answer why 25 PiBs, it's based on the volume we are seeing here - https://github.com/filecoin-project/filecoin-plus-large-datasets/issues. On average, a client needing more than 5 PiBs opens 3-5 applications which is why the proposal is for 25 PiBs. If the community feels this should be 30 or 15 or anything else, we can assess that based on incoming plus historical application data

raghavrmadya commented 2 years ago

I'm happy to share some efforts and projects that are in the works and will eventually be presented at a governance call with regard to preventing DC abuse if it helps the community gain confidence

raghavrmadya commented 2 years ago

Adding to this discussion, here are the kind of clients and applications which are vetted and would benefit from this proposal - https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/483

We will bring it up again at the governance call to get more community feedback. @BobbyChoii , it would be great if you are able to make it to one of the gov calls this coming week.

raghavrmadya commented 2 years ago

Adding examples of clients that would benefit from this proposal -

  1. https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/796
  2. https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/722

jessie8o8 commented 2 years ago

This modification would decrease the overhead of applying and tracking datacap.

Reasons why I am in favor of it:

Flags brought up in the notary call yesterday:

It would result in an increase in LDNs that is pushing the limit of 25 PiBs

  • This may be true but that does not mean all are going to be approved. If we have a strong KYC process (being proven out through E FIL+), there should not be a substantial increase in larger LDN approval. If LDNs larger than 5 PiBs are not hindered by the obstacle of having to split the LDNs, why should the burden of additional overhead persist.

It would result in more work for notaries to approve bigger tranches

I would argue that this would result in less work for notaries because

For our team, this issue persisted with our NEXRAD dataset. We did not know the status of the project as a whole, thus giving us uncertainty to begin the project.

BobbyChoii commented 2 years ago

@raghavrmadya appreciate all the efforts. I had a busy week and didn't make it to the governance call. Are there any notes from the last meeting, or could you let me know where to get the updates? Thanks!

dkkapur commented 2 years ago

@BobbyChoii we spent some time talking about this at the governance call, you should check out the recording here.

Main takeaway was that we should look into proposing changing the tranche limits to add some safer upper bounds for initial allocations. Opening up the floor here for proposals!

  • First allocation: lesser of 5% of total DataCap requested or 50% of weekly allocation rate
  • Second allocation: lesser of 10% of total DataCap requested or 100% of weekly allocation rate
  • Third allocation: lesser of 20% of total DataCap request or 200% of weekly allocation rate
  • Fourth allocation: lesser of 40% of total DataCap requested or 400% of weekly allocation rate
  • Fifth allocation onwards: lesser of 80% of total DataCap request or 800% of weekly allocation rate
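
To make the schedule above concrete, here is a minimal Python sketch that computes each allocation as the lesser of the two bounds. The function name `tranche_size` and the choice of TiB as the working unit (1 PiB = 1024 TiB) are illustrative assumptions, not part of any Fil+ tooling, and the sketch ignores how much DataCap remains on an application.

```python
from fractions import Fraction

# (share of total DataCap requested, multiple of the weekly allocation rate)
# for the first, second, third, fourth, and fifth-onwards allocations.
TRANCHE_SCHEDULE = [
    (Fraction(5, 100), Fraction(1, 2)),
    (Fraction(10, 100), Fraction(1)),
    (Fraction(20, 100), Fraction(2)),
    (Fraction(40, 100), Fraction(4)),
    (Fraction(80, 100), Fraction(8)),
]


def tranche_size(n, total_tib, weekly_tib):
    """Return the size in TiB of the n-th allocation (1-based)."""
    if n < 1:
        raise ValueError("allocation number must be 1 or greater")
    # The fifth rule applies to every allocation from the fifth onwards.
    share, weeks = TRANCHE_SCHEDULE[min(n, len(TRANCHE_SCHEDULE)) - 1]
    return min(share * total_tib, weeks * weekly_tib)


PIB = 1024  # TiB per PiB

# 25 PiB requested at a claimed 2.5 PiB/week.
print(tranche_size(1, 25 * PIB, 2560))

# 5 PiB requested at 100 TiB/week, first six allocations.
print([int(tranche_size(n, 5 * PIB, 100)) for n in range(1, 7)])
```

Under these assumptions, a 25 PiB request at a claimed 2.5 PiB/week gives a first allocation of 1280 TiB (1.25 PiB), which matches the risk described in the issue, while a 5 PiB request at 100 TiB/week yields 50, 100, 200, 400, and then 800 TiB per allocation.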

BobbyChoii commented 2 years ago

@dkkapur thanks for sharing. Along with the governance call and some recent updates from the community, there are a couple of questions I'd like to confirm.

Quoting from slack #fil-plus channel here.

Trust and transparency update - Effective today, any application requesting the maximum limits (5 PiBS) and 100 TiBs weekly allocation without proper justification in the application and that have less than 10 data samples will be flagged and will most likely be asked to open a new application with proper justification.

  1. Could you be more specific on the most likely part?

  2. If it is what you mentioned in slack before, I think all applications that do not meet the criteria should be closed and all applicants should open new issues. The fairness is nowhere to find in the case comparison i mentioned above. If consistency is not maintained, this will cause endless confusion in the approval order at a later stage.

  3. The new LDN application rules need to be published through modification here just like this one proposed by Deep. And be synced to slack for discussion. This is a community-driven place and no decision should be made unilaterally by one person.

Thanks! Bobby

BobbyChoii commented 2 years ago

Just went through the other two applications quickly. I would like to share some of the parts that I think require due diligence, and please correct me if I'm wrong.

#796: I don't see any proof of the affiliation between the applicant and UC Berkeley. Why is it not verified by a university domain email address and publicized with screenshots of emails, like all other LDN applications? Also, no disclosure was made on the SP allocation plan or the nodes. Moreover, given the discussion in the last GC about #602, whether any notaries need to be recused because of direct benefit is also something we need to be concerned about. With none of this information available, it doesn't even meet the current 5PB LDN rule, let alone the 25PB application that requires higher transparency.

#722: Not even one data sample was shared in this application. Also, I could not find or download any useful data from its website.

Quoting from their website ,

As a result the demand for public online content from blogs, microblogs, news and consumer reviews grew and we had the tools to deliver the data our customers needed. We are now the world’s largest index of human to human conversational content.

Sounds like fancy words for taking data from individuals without any permission. Blog owners, post users, and journalists are the real owners of the data. It's their intellectual property. We all have the right to read it, to save it in folders, and to share it with friends, and that's it. Taking it for financial benefit is definitely not among the rights granted by these platforms.

image

Let's take Quora as an example. Here's their policy about copyright: https://help.quora.com/hc/en-us/articles/360052494012-How-does-Quora-intend-to-enforce-the-Not-for-Reproduction-feature- If the applicant wants to store this data in Filecoin for business interest, then at least they should provide licenses from all these platforms. If they can't, I don't think we should support such an application.

Have we really thought through the need for 25PB? Seems like it will only increase friction and more challenges to community fairness...

raghavrmadya commented 2 years ago

@dkkapur thanks for sharing. Along with the governance call and some recent updates from the community, there are a couple of questions I'd like to confirm.

  • Are all existing applications with higher storage needs eligible for the 25Pib amount?
  • I think the discussion about this modification in the governance call was not done in detail due to the time limit. @raghavrmadya as for the three cases that you think would benefit from this: are you just using them as examples, or do you think they are more applicable to this proposal than the other applications? If the latter, I haven't seen them being fully discussed on the governance channel either. Will there be any further discussions? Or were they already held somewhere else that I might not have noticed yet? Please let me know exactly where if so.
  • A large number of LDN applications were manually closed because they did not match the 5PB/100TB adjustment. If this is a mandatory rule, I don't know why a lot of applications that don't meet that criteria are still open, including the three cases you gave. As an example, [DataCap Allocation] - Public Welfare Project about popular science of Pregnancy and Birth filecoin-plus-large-datasets#869 was deemed unreasonable due to the 1PB weekly allocation, but [DataCap Application] Speedium - NexRad V2 [ Replica distribution ] Part 1 filecoin-plus-large-datasets#483 has exactly the same request. Also, the question from sunnyiscoming, who I believe is PL staff, was not given any formal answer. I don't yet know the reason this application is being mentioned here. But if you think it is more qualified than other applications, I would really like to get specific explanations.
  • What are the specific rules for closing LDN issues these days? That's the most important thing that I would like to get confirmed.

Quoting from slack #fil-plus channel here.

Trust and transparency update - Effective today, any application requesting the maximum limits (5 PiBS) and 100 TiBs weekly allocation without proper justification in the application and that have less than 10 data samples will be flagged and will most likely be asked to open a new application with proper justification.

  1. Could you be more specific on the most likely part?
  2. If it is what you mentioned in slack before, I think all applications that do not meet the criteria should be closed and all applicants should open new issues. The fairness is nowhere to find in the case comparison i mentioned above. If consistency is not maintained, this will cause endless confusion in the approval order at a later stage.
  3. The new LDN application rules need to be published through modification here just like this one proposed by Deep. And be synced to slack for discussion. This is a community-driven place and no decision should be made unilaterally by one person.

Thanks! Bobby

Hi @BobbyChoii , thanks for the comments.

  1. If the proposal passes, all existing clients with higher storage needs are welcome to submit applications for the 25Pib amount. They will not be able to just update existing applications if that is what your question was.

  2. I'm listing them as examples. Applications are only discussed during gov calls if they are controversial/many flags have been raised/client wants to come present further insights. Any discussion will be found on the issue itself and you can track it on GitHub.

raghavrmadya commented 2 years ago

To your second comment @BobbyChoii, the first issue you have cited has gone through KYB through the client growth team. We have not conducted the KYC yet and there has been no trigger. For the second issue, we have had the SP driving BD for this project reach out. They also might have private data and will be held to the same process as any other client. Of course, as you are aware, a trigger from the governance team is not DataCap approval and notaries have the final say. If you have questions about these specific issues, please comment on the issue itself.

raghavrmadya commented 2 years ago

Finally, to respond to this -

"Trust and transparency update - Effective today, any application requesting the maximum limits (5 PiBS) and 100 TiBs weekly allocation without proper justification in the application and that have less than 10 data samples will be flagged and will most likely be asked to open a new application with proper justification.

Could you be more specific on the most likely part?

If it is what you mentioned in slack before, I think all applications that do not meet the criteria should be closed and all applicants should open new issues. The fairness is nowhere to find in the case comparison i mentioned above. If consistency is not maintained, this will cause endless confusion in the approval order at a later stage.

The new LDN application rules need to be published through modification here just like this one proposed by Deep. And be synced to slack for discussion. This is a community-driven place and no decision should be made unilaterally by one person."

As the trust and transparency Lead, it is my mandate to prevent DataCap abuse. More recently, I've seen rampant abuse, and the message posted is merely a flag, not a rule. If any application does not provide justification for the amounts being requested, I will be inclined to close the application and request them to open a new application instead of going back and forth requesting for information. There is enough precedent on GitHub and we have made the rules very clear.

A client can also reach out to the client growth team, get notary support, and/or share a working relationship with SPs to gain trust. This is a governance team process choice to be more efficient. If you disagree, please open a discussion on the governance repo.

BobbyChoii commented 2 years ago

@raghavrmadya thanks for sharing.

  1. If the proposal passes, all existing clients with higher storage needs are welcome to submit applications for the 25Pib amount. They will not be able to just update existing applications if that is what your question was.
  2. I'm listing them as examples. Applications are only discussed during gov calls if they are controversial/many flags have been raised/client wants to come present further insights. Any discussion will be found on the issue itself and you can track it on GitHub.

ACK. If you're just using them as examples, then I have no more doubts about that.

To your second comment @BobbyChoii, the first issue you have cited has gone through KYB through the client growth team. We have not conducted the KYC yet and there has been no trigger. For the second issue, we have had the SP driving BD for this project reach out. They also might have private data and will be held to the same process as any other client. Of course, as you are aware, a trigger from the governance team is not DataCap approval and notaries have the final say.

Does the client growth team conduct KYB for all participating businesses in the community? What are the requirements for business? As sp if there are companies that are suitable and want to participate in filecoin, how can I help them to make contact?

If you have questions about these specific issues, please comment on the issue itself.

Thanks for the heads up, I will comment below the issue. May I know what percentage of applications were contacted through this non-public type tho? I haven't seen any public notices on github or slack about this. In addition, I think there should be some level of official disclosure about these communications that are not done publicly. Either the governance team or any member of these teams you mentioned before should make clarification under the issue itself to reduce the confusion.

As the trust and transparency Lead, it is my mandate to prevent DataCap abuse. More recently, I've seen rampant abuse, and the message posted is merely a flag, not a rule. If any application does not provide justification for the amounts being requested, I will be inclined to close the application and request them to open a new application instead of going back and forth requesting for information. There is enough precedent on GitHub and we have made the rules very clear. A client can also reach out to the client growth team, get notary support, and/or share a working relationship with SPs to gain trust. This is a governance team process choice to be more efficient. If you disagree, please open a discussion on the governance repo.

Appreciate your efforts in preventing DC abuse. I believe this will make Filecoin a better place to store useful data. But based on the way the issues were closed recently, #483 as I mentioned before, I still don't understand with weekly 1PB allocation why is it still open? This proposal is still under discussion and has not been approved yet. Whether it is through the client growth team or BD or any other way, why can't applications that don't make changes to the max limit and weekly allocation be closed? They could have reopened new issues just like others.

Standards are meant to be universal for all. If there are special cases, they can be handled according to the consensus rules, such as asking for the support of notaries on governance. But at the very least the benchmark should be the same. As in this application, #840, I think it is against the rules of democracy if only your approval is needed.

If consistency is not maintained, this will cause endless confusion in the approval order at a later stage.

If you still don't think it is necessary to follow a uniform rule. That is, no need to apply the same 5pb/100TB requirement. I would also like to know if you have any prevention methods for this?

Thanks! Bobby

BobbyChoii commented 2 years ago

Hi @raghavrmadya @dkkapur, is there any update?

salstorage commented 2 years ago

As an SP, we currently have multiple large 25+ PiB deals in the pipeline. With the current 5 PiB max per LDN and the Fil+ requirement of 6 distributed copies, that would mean 30 LDN applications, which is not scalable and becomes an administrative nightmare. This, along with the 50/100/200/400.... TiB tranche approvals, makes the situation extremely complex to manage for SPs and supporting notaries alike. We support increasing the limit for DC requests as a proactive measure for these large enterprise-level deals incoming.
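
The application count in this comment can be checked with a little arithmetic. The sketch below assumes the figure of 30 comes from 25 PiB of raw data times 6 copies, divided by the per-application cap; `applications_needed` is a hypothetical helper for illustration, not an existing tool.

```python
def applications_needed(raw_data_pib: int, copies: int, cap_pib: int) -> int:
    """Number of LDN applications needed to cover all copies of a dataset."""
    total_datacap_pib = raw_data_pib * copies
    # Ceiling division: a partially filled application still counts as one.
    return -(-total_datacap_pib // cap_pib)


# Current cap of 5 PiB per application.
print(applications_needed(25, 6, 5))   # 30

# Proposed cap of 25 PiB per application.
print(applications_needed(25, 6, 25))  # 6
```

On this reading, the same deal would drop from 30 applications to 6 under the proposed 25 PiB limit.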