filecoin-project / notary-governance


v5 Notary Allocator Application: Deep Kapur #1099

dkkapur opened 10 months ago

dkkapur commented 10 months ago

v5 Notary Allocator Application

To apply to be an allocator, organizations will submit one application for each proposed pathway to DataCap. If you will be designing multiple specific pathways, you will need to submit multiple applications.

Please complete the following steps:

1. Fill out the information below and create a new GitHub Issue

  1. Notary Allocator Pathway Name (This can be your name, or the name of your pathway/program. For example "E-Fil+"): Deep Kapur
  2. Organization Name: Official name pending, currently referred to as Flamenco team, spinning out from Protocol Labs
  3. On-chain address for Allocator (Provide a NEW unique address. During ratification, you will need to initialize this address on-chain): f1nmtksi22xjz44mdpcztss4mx3ofmdjzfea2sdny
  4. Country of Operation (Where your organization is legally based): USA
  5. Region of Operation (What region will you serve?): All Regions
  6. Type of Allocator, diligence process (Automated/programmatic, Market-based, or Manual (human-in-the-loop at some phase)): Manual initially, intending to experiment with programmatic allocations in the future.
  7. DataCap requested for allocator for 12 months of activity (This should be an estimate of overall expected activity. Estimate the total amount of DataCap you will be distributing to clients in 12 months, in TiB or PiB): 50 PiB

2. Access allocator application (download to save answers)

Click link below to access a Google doc version of the allocator application that can be used to save your answers if you are not prepared to fully submit the application in Step 3. https://docs.google.com/document/d/1-Ze8bo7ZlIJe8qX0YSFNPTka4CMprqoNB1D6V7WJJjo/copy

3. Submit allocation application

Click link below to access full allocator questionnaire and officially submit your answers: https://airtable.com/appvyE0VHcgpAkt4Z/shrQxaAIsD693e1ns

Note: Sections of your responses WILL BE posted back into the GitHub issue tracking your application. The final section (Additional Disclosures) will NOT be posted to GitHub, and will be maintained by the Filecoin Foundation. Application information for notaries not accepted and ratified in this round will be deleted.

ghost commented 10 months ago

Basic Information

1. Notary Allocator Pathway Name: Deep Kapur

2. Organization: Official name pending, currently referred to as Flamenco team, spinning out from Protocol Labs.

3. On Chain Address for Allocator: f1nmtksi22xjz44mdpcztss4mx3ofmdjzfea2sdny

4. Country of Operation: USA

5. Region(s) of operation: Africa, Greater China, Oceania, North America, South America, Japan, Europe, Asia minus GCR

6. Type of Allocator: Manual

7. DataCap requested for allocator for 12 months of activity: 50 PiB

8. Is your allocator providing a unique, new, or diverse pathway to DataCap? How does this allocator differentiate itself from other applicants, new or existing?: In the immediate term, this pathway will likely not offer anything novel in the design of the allocator itself. The main value proposition is that it will be owned and operated by members of the former Data Programs team at Protocol Labs, which helped support the Fil+ program and also built the best-in-class, most heavily used open-source software for onboarding data to the Filecoin network. The goal of the allocator is to unblock large-scale data onboarding that uses tools like Spade to get data onto the Filecoin network. We work with known entities/businesses on data onboarding, and will build dedicated pipelines with them to ensure data is prepared sensibly and onboarded with SP partners that uphold high standards and enterprise-grade SLAs.

9. As a member in the Filecoin Community, I acknowledge that I must adhere to the Community Code of Conduct, as well as other End User License Agreements for accessing various tools and services, such as GitHub and Slack.: Acknowledge

Client Diligence

10. Who are your target clients?: Small-scale developers or data owners, Enterprise Data Clients

11. Describe in as much detail as possible how you will perform due diligence on clients. If you are proposing an automated pathway, what diligence mechanism will you use to determine client eligibility?: For users in the short term, we expect to be able to conduct sufficient due diligence on individuals/entities, including:

- Entity / business information
- Entity / business form of payment (an on-chain address with history, or an off-chain method, i.e., a credit card). In some cases, we will be facilitating the actual data onboarding, so the expectation is that this maps to what a user will be using to make payments to SPs on the network
- Content Policy: a publicly published or publishable policy that explains the data being onboarded, how it was funded/sourced, and what the goals are for storing it on the network

We expect the following, and will be able to verify it with tools that we run ourselves or will use as a service:

- Minimum of 5 replicas, distributed geographically and across SP owner/operator entities
- Data made available for SPs to store in a reasonable time frame (case by case, but ideally within days or weeks)
- Data stored by SPs in a reasonable time frame, with no significant lag (i.e., targeting <72h from data availability to first replica on chain)
- SPs upholding enterprise-grade SLAs for durability, availability, and retrievability

If you are proposing an automated pathway, what diligence mechanism will you use to determine client eligibility?: Though initially the pathway will be manual, we have a handful of automation ideas that we would like to test and transition to. This includes mechanisms for:

- DIDs and tracking of legitimate users across networks
- ways to programmatically prove that storage was paid for, either in FIL or a different crypto/fiat currency
- ensuring the storage is actually onboarded in a compliant manner, i.e., using existing tools like Spade to enforce compliant data onboarding

12. Please specify how many questions you’ll ask, and provide a brief overview of the questions. If you have a form, template, or other existing resource, provide the link.: For users of our platform, we collect information directly from them and publish it for public audit and verification. Here’s an example: https://bafkreihuqkipjv2sgc3ypr5lcervqitht2m5f6iyr4g432mpqwzmfm7jtq.ipfs.dweb.link/. Every SP that takes part in storing a user’s data explicitly agrees with this policy, ensuring data is onboarded with as much transparency as possible. For users that wish to come to our allocator but not use our tools, we will eventually build out tooling that supports them as well. The goal is to collect information that includes:

- Who they are
- What they are onboarding
- Where the data came from and how it was collected
- What the expectations are for its storage on Filecoin
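For illustration, a record like the linked example might be modeled as follows. This is a minimal sketch in Go; the field names are hypothetical, and the IPFS-published document linked above remains the authoritative format:

```go
// Sketch of a client content-policy record published to IPFS for public
// audit. All field names here are illustrative, not a finalized schema.
package policy

type ContentPolicy struct {
	Client       string   `json:"client"`       // who they are (entity / business name)
	Dataset      string   `json:"dataset"`      // what they are onboarding
	Provenance   string   `json:"provenance"`   // where the data came from and how it was collected
	Funding      string   `json:"funding"`      // how the data/storage was funded or sourced
	Expectations string   `json:"expectations"` // goals for storing the data on Filecoin
	MinReplicas  int      `json:"min_replicas"` // e.g., 5 per this pathway's requirements
	Regions      []string `json:"regions"`      // agreed geographic distribution
}
```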

13. Will you use a 3rd-party Know your client (KYC) service?: Likely yes, services like Togggle or Synapse are familiar and will be useful. Outside of this, other forms of business or entity verification could be employed.

14. Can any client apply to your pathway, or will you be closed to only your own internal clients? (eg: bizdev or self-referral): Our priority is to deliver a useful service to our users, but we also plan to support the rest of the network.

15. How do you plan to track the rate at which DataCap is being distributed to your clients?: Our team maintains one of the only reliable sources of deal tracking for active deals on the Filecoin network today. We plan to leverage our highly reliable tooling and expand it further to track DataCap distribution/utilization and share dashboards publicly. I was also directly involved in the architecture and implementation of datacapstats.io, and hope to continue upholding a similar standard in the future.
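For illustration, a minimal Go sketch of the kind of distribution-rate computation such a dashboard would surface. The `Allocation` type and the pre-fetched event list are stand-ins for data that would come from a deals oracle or datacapstats.io, not an existing API:

```go
// Sketch of computing a DataCap distribution rate from allocation events.
package health

import "time"

// Allocation is a hypothetical record of one DataCap grant to a client.
type Allocation struct {
	Client string
	Amount uint64 // bytes of DataCap granted
	At     time.Time
}

// RatePerWeek returns the average DataCap distributed per week over the
// trailing window, e.g. to flag clients spending faster than expected.
func RatePerWeek(events []Allocation, window time.Duration) float64 {
	cutoff := time.Now().Add(-window)
	var total uint64
	for _, e := range events {
		if e.At.After(cutoff) {
			total += e.Amount
		}
	}
	weeks := window.Hours() / (24 * 7)
	return float64(total) / weeks
}
```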

Data Diligence

16. As an operating entity in the Filecoin Community, you are required to follow all local & regional regulations relating to any data, digital and otherwise. This may include PII and data deletion requirements, as well as the storing, transmit: Acknowledge

17. What type(s) of data would be applicable for your pathway?: Public Open Dataset (Research/Non-Profit), Private Commercial/Enterprise, Private Non-Profit/Social Impact

18. How will you verify a client’s data ownership? Will you use 3rd-party KYB (know your business) service to verify enterprise clients?: For enterprise data stored leveraging current/relevant technologies, there are myriad ways to do this. However, the reality is that it is not difficult for a malicious user to create untrue claims about data ownership. We believe that most scalable data onboarding happens through out-of-band data transfer, where SPs fetch data asynchronously from a client and then activate the deal on-chain. This process gives us information about the client: what shape the data is in, how it is stored, where it is available, etc. Most of the clients we work with need help even getting their data prepared, giving us significant insight into what the data is and what shape it should be stored in for it to be useful in the future. This, alongside our compliant data onboarding, gives us confidence that we are working with legitimate data owners.

19. How will you ensure the data meets local & regional legal requirements?: We require users to publish publicly auditable and verifiable claims about their data, in the form of a policy (see question 12 for more details). This enables us to validate that the data meets requirements, and also holds users accountable in the court of public opinion.

20. What types of data preparation will you support or require?: We will support end-to-end data preparation: we have worked with several clients and designed best-in-class data onboarding solutions, and expect to continue doing so. Here’s an example of a library our teammates have released in the past: https://github.com/anjor/go-fil-dataprep.

21. What tools or methodology will you use to sample and verify the data aligns with your pathway?: If someone tries hard enough, there will always be a way to game this. For our own users, we expect to be involved in the data preparation process, guaranteeing that it is compliant. For others, the bar will be higher, and the published policy will help us, SPs, and others hold users accountable.

Data Distribution

22. How many replicas will you require to meet programmatic requirements for distribution?: 5+

23. What geographic or regional distribution will you require?: At most 2 replicas in any city; data distributed across at least 3 countries and 2-3 continents.

24. How many Storage Provider owner/operators will you require to meet programmatic requirements for distribution?: 3+

25. Do you require equal percentage distribution for your clients to their chosen SPs? Will you require preliminary SP distribution plans from the client before allocating any DataCap?: No, but we do require replica distribution at a per-CID level. Here’s an example from a current user of our platform: https://dataprograms.grafana.net/public-dashboards/5c0e3034da464cef94552bdd7a0eac5a?orgId=1&from=now-1y&to=now. Note that data has been distributed across several SP IDs, but not exactly symmetrically; this is because our tools are sophisticated enough to ensure compliant replication on a per-piece-CID basis. Clients don’t need an SP distribution plan; they just need to agree on a data replication plan, and we help ensure that the data is actually distributed across SPs. We actively work with clients to pay/incentivize their SPs, resulting in more symmetric replica distribution across SP entities over time.
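For illustration, a minimal Go sketch of such a per-piece-CID check, using the thresholds from Q22-Q24 (5+ replicas, at most 2 in any city, at least 3 countries, 3+ owner/operator entities). The types and function are hypothetical, not the actual Spade implementation; the continent check is omitted for brevity:

```go
// Sketch of a per-piece-CID replication compliance check.
package compliance

// Replica describes one stored copy of a piece; fields are illustrative.
type Replica struct {
	SPOwner string // owner/operator entity behind the SP ID
	City    string
	Country string
}

// Compliant reports whether one piece CID's replicas satisfy the pathway rules.
func Compliant(replicas []Replica) bool {
	if len(replicas) < 5 { // minimum of 5 replicas
		return false
	}
	owners, cities, countries := map[string]int{}, map[string]int{}, map[string]int{}
	for _, r := range replicas {
		owners[r.SPOwner]++
		cities[r.City]++
		countries[r.Country]++
	}
	for _, n := range cities {
		if n > 2 { // at most 2 replicas in any one city
			return false
		}
	}
	// at least 3 distinct owner entities and 3 distinct countries
	return len(owners) >= 3 && len(countries) >= 3
}
```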

26. What tooling will you use to verify client deal-making distribution?: Datacapstats.io + our own tooling (e.g., the Grafana dashboard linked in Q25).

27. How will clients meet SP distribution requirements?: Our platform - Spade.

28. As an allocator, do you support clients that engage in deal-making with SPs utilizing a VPN?: SPs need to report their locations to us so we can ensure data actually gets where it’s supposed to be, safely and correctly. So inherently, the answer really depends on the data itself and the policies imposed by clients. In almost all cases, we end up tracking real SP locations with tools like Kentik/Cisco ThousandEyes and ensuring data is correctly distributed.

DataCap Allocation Strategy

29. Will you use standardized DataCap allocations to clients?: Yes, standardized

30. Allocation Tranche Schedule to clients: First: 100 TiB • Second: 200 TiB • Third: 400 TiB • Fourth: 1000 TiB • Max per client overall: 20 PiB (major edge cases only)
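As a rough illustration, here is what that schedule could look like in Go. This is a sketch, not production allocator logic; in particular, repeating the largest tranche after the fourth and the clamping behavior at the 20 PiB cap are assumptions not stated in the answer above:

```go
// Sketch of the standardized tranche schedule: 100 → 200 → 400 → 1000 TiB,
// capped at 20 PiB total per client.
package tranches

const TiB = uint64(1) << 40

var schedule = []uint64{100 * TiB, 200 * TiB, 400 * TiB, 1000 * TiB}

const maxPerClient = 20 * 1024 * TiB // 20 PiB, major edge cases only

// NextTranche returns the next DataCap allocation size for a client, given
// how many tranches they have already received and their running total.
func NextTranche(tranchesGranted int, totalGranted uint64) uint64 {
	if totalGranted >= maxPerClient {
		return 0 // overall per-client cap already reached
	}
	var next uint64
	if tranchesGranted < len(schedule) {
		next = schedule[tranchesGranted]
	} else {
		next = schedule[len(schedule)-1] // assumption: repeat the largest tranche
	}
	if totalGranted+next > maxPerClient {
		next = maxPerClient - totalGranted // clamp to the per-client cap
	}
	return next
}
```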

31. Will you use programmatic or software based allocations?: Yes, standardized and software based

32. What tooling will you use to construct messages and send allocations to clients?: We’d love to use existing tooling or a fork of it.

33. Describe the process for granting additional DataCap to previously verified clients.: We plan to use tooling comparable to the subsequent allocation (SA) bot, though we need something more stable that also works with our platform.

34. Describe in as much detail as possible the tools used for: • client discoverability & applications • due diligence & investigation • bookkeeping • on-chain message construction • client deal-making behavior • tracking overall allocator health • dispute resolution. Address all the tools & software platforms in your process:

  1. Client discoverability and applications: filplus.storage + GitHub issues
  2. Due diligence and investigation: customer signup flow from our platform + tracking in public issues
  3. Bookkeeping: public GitHub issues with historicity + internal tools for tracking client actions + publicly published user-defined data onboarding and content policy 
  4. Client deal-making behavior: deals oracle with public dashboards + datacapstats.io 
  5. Tracking allocator health: datacapstats.io + custom internal facing tooling on DataCap balances and predictive modeling on future DataCap utilization 
  6. Dispute discussion & resolution: separate GitHub repo with issues for allocator disputes + Fil+ WG defined process for meta-level disputes + customer email inbox with internally defined DRI as escalation point for dispute resolution
  7. Community updates & comms: website pages and documentation, blog posts, all cross-linked in the Filecoin Slack #fil-plus channel + discussions in the notary governance repo + attending governance calls when relevant

Tools and Bookkeeping

35. Will you use open-source tooling from the Fil+ team?: We plan to use as much tooling from the Fil+ team as makes sense and is possible. This includes intake forms from filplus.storage, and the GitHub repos and tooling available alongside them. We do need our own tooling for users of our platform, but that will be built to fit into the system with minimal friction for users.

36. Where will you keep your records for bookkeeping? How will you maintain transparency in your allocation decisions?: GitHub when possible. All client info relevant to content policy and data distribution will be published to IPFS as well, and be made available to SPs and anyone else in the community. In the case of private info, i.e., KYC records or payment info for off-chain payments, we will support an escalation path to get this information directly from our team.

Risk Mitigation, Auditing, Compliance

37. Describe your proposed compliance check mechanisms for your own clients.: Deal distribution will be compliant by definition, thanks to our tools. Beyond this, we already run a dedicated instance of the retrieval bot to test retrievability and plan to continue doing so, and we will publish open dashboards for all our clients and SPs so anyone can audit onboarding behavior, which also lets us start to build tools around outlier detection. We don’t anticipate a large volume of clients, so we plan to work closely with each one and ensure we have all the information we need to have confidence in any next steps.

38. Describe your process for handling disputes. Highlight response times, transparency, and accountability mechanisms.: Disputes from the ecosystem/external to the allocator will be the highest priority: a dedicated POC from our team will be available within 24h to handle the escalation and provide all necessary information to work towards resolution. Disputes within the allocator, i.e., against a client, will result in data onboarding being paused with immediate effect until the situation is resolved, leveraging tools like unverified deals or additional retrievability tests to get clients back into a compliant status. The hardest cases will be those where information we had confidence in, e.g., specific info about a client or an SP, turns out to be wrong. Handling these will be manual, so we can ensure each one gets the attention it needs to be resolved successfully.

39. Detail how you will announce updates to tooling, pathway guidelines, parameters, and process alterations.: Appropriate GitHub repos, website/doc pages, and Filecoin Slack fil-plus channel.

40. How long will you allow the community to provide feedback before implementing changes?: 2-4 weeks. We plan to leverage the existing frameworks in the Fil+ ecosystem (gov calls, discussions, issues, etc.) + Issues on our own repos to enable useful feedback collection.

41. Regarding security, how will you structure and secure the on-chain notary address? If you will utilize a multisig, how will it be structured? Who will have administrative & signatory rights?: Initially, the address is generated from a hardware wallet (Ledger Nano X). We plan to have 1 DRI (me), 1 fallback, and 1 secondary fallback. In the future, we would like to switch over to a multisig once our team spinoff plan is finalized. At this point, we expect 7 or so members on the multisig, with a threshold of 2 signers.

42. Will you deploy smart contracts for program or policy procedures? If so, how will you track and fund them?: We are working towards progressive decentralization of data onboarding to the Filecoin network. As part of this, we expect to move portions of our stack into contracts. It is inevitable that the DataCap allocation management will eventually also move towards automation and smart contracts. We don’t have any short term plans or progress to share at this stage, but will keep everyone informed!

Monetization

43. Outline your monetization models for the services you provide as a notary allocator pathway.: For the first phase, we have no specific staking/collateral-based models. We plan to blocklist entities that game our systems and track this publicly. We plan to charge clients for storage, passing most of the revenue to SPs; some of it will be used to fund our services, including the allocator pathway.

44. Describe your organization's structure, such as the legal entity and other business & market ventures.: We are in the process of spinning out of Protocol Labs. We plan to set up a single LLC entity in the US.

45. Where will accounting for fees be maintained?: N/A with current plans.

Past Experience, Affiliations, Reputation

46. If you've received DataCap allocation privileges before, please link to prior notary applications.: I have served on the Fil+ team for over 3 years, and so even though I have not directly allocated DataCap, I’ve played a key role in enabling every single other notary in some capacity.

47. How are you connected to the Filecoin ecosystem? Describe your (or your organization's) Filecoin relationships, investments, or ownership.: Our team consists of several individuals that have played significant roles in the development of data onboarding software in the Filecoin network. Currently, this includes working on boost, Spade, and related software like the retrieval-bot, deals oracle, and data prep tools. Our team includes Filecoin ecosystem experts, developers and at least 1 active SP operation.

48. How are you estimating your client demand and pathway usage? Do you have existing clients and an onboarding funnel?: We currently have 2-3 clients with a long term line of sight to a total of 20 PiBs. We expect to actively engage in further outreach and have set an internal target of onboarding net new 6-7 PiB of unique data in 2024 (>50 PiB with 10x replication).

galen-mcandrew commented 8 months ago

Datacap Request for Allocator

Address

f2e6lube4xsam3sywpmlsjmmuen7ezawfgzmts5uq

Datacap Allocated

5 PiB

filplus-bot commented 8 months ago

The request has been signed by a new Root Key Holder

Message sent to Filecoin Network

bafy2bzacecxv63p4vxzhpoqlgwphz64jf44qdjmuoyvw3aawq6e5ik7x4sorw

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecxv63p4vxzhpoqlgwphz64jf44qdjmuoyvw3aawq6e5ik7x4sorw

Kevin-FF-USA commented 5 months ago

Hi @dkkapur

Wanted to send a friendly check-in.
As of this comment date, we haven't seen any client applications or DataCap disbursements for this organization. Inactive organizations are being reviewed to determine whether they still want to remain in the program. If you would like this pathway to remain active as an Allocator, please reply to this proposal with your timeline for onboarding clients or a plan of action.

1099 | Flamenco team; spinning out from Protocol Labs Deep Kapur | Manual | Bookkeeping | North America | https://github.com/filecoin-project/notary-governance/issues/1099 | f03019950

Kevin-FF-USA commented 5 months ago

Hi @dkkapur!

Very friendly hello, sweet Deep. : ) Tonight is the final night to take action on a timeline for distribution of DataCap, or simply check in that you would like to remain active in the program. If you would like to remain active as an Allocator, please provide a timeline for activity for this Allocator in https://github.com/filecoin-project/Allocator-Governance/issues/6

If we don't hear back, this Allocator pathway will be sunset, and you will need to reapply in order to receive DataCap again.

If you wish to remain active, please respond to Issue 6 with a timeline /or/ roadmap.

Warmly, -Kevin

https://docs.google.com/presentation/d/1yx-C1urFX7I_A1kmhJTXBy8R42770ZnST0WoQaZzTd0/edit?usp=drive_link and https://docs.google.com/presentation/d/1pmrRvAyxP56ZjMpcItVbiuJcAvYNQ_YDyD3n6FtX5Co/edit?usp=drive_link