NateWebb03 / FilTestRepo

A test repository for allocator application automation

Test app 1096 #1099

Open NateWebb03 opened 5 months ago

Notary Allocator Pathway Name:

Deep Kapur

Organization:

Official name pending; currently referred to as the Flamenco team, spinning out from Protocol Labs.

Allocator's On-chain address:

f1nmtksi22xjz44mdpcztss4mx3ofmdjzfea2sdny

Country of Operation:

USA

Region(s) of operation:

Africa, Greater China, Oceania, North America, South America, Japan, Europe, Asia (excluding Greater China)

Type of allocator: What is your overall diligence process? Automated (programmatic), Market-based, or Manual (human-in-the-loop at some phase). Initial allocations to these pathways will be capped.

Manual

Amount of DataCap Requested for allocator for 12 months:

50 PiB

Is your allocator providing a unique, new, or diverse pathway to DataCap? How does this allocator differentiate itself from other applicants, new or existing?

In the immediate term, this allocator pathway will likely not offer anything novel in terms of the allocator's actual design. The main value proposition is that it will be owned and operated by members of the former Data Programs team at Protocol Labs, which helped support the Fil+ program and also built the best-in-class, most heavily used open-source software for onboarding data to the Filecoin network. The goal of the allocator is to unblock large-scale data onboarding that uses tools like Spade to get data onto the Filecoin network. We work with known entities and businesses on data onboarding, and will build dedicated pipelines with them to ensure data is prepared sensibly and onboarded with SP partners that uphold high standards and enterprise-grade SLAs.

As a member of the Filecoin Community, I acknowledge that I must adhere to the Community Code of Conduct, as well as other End User License Agreements for accessing various tools and services, such as GitHub and Slack. Additionally, I will adhere to all local & regional laws & regulations that may relate to my role as a business partner, organization, notary, or other operating entity. You can read the Filecoin Code of Conduct here: https://github.com/filecoin-project/community/blob/master/CODE_OF_CONDUCT.md

Acknowledgment: Acknowledge

Client Diligence Section:

This section pertains to client diligence processes.

Who are your target clients?

Small-scale developers or data owners, Enterprise Data Clients

Describe in as much detail as possible how you will perform due diligence on clients.

For users in the short term, we expect to be able to conduct sufficient due diligence on individuals/entities, covering:
• Entity/business information
• Entity/business form of payment (an on-chain address with history, or off-chain, e.g., a credit card). In some cases we will be facilitating the actual data onboarding, so the expectation is that this maps to what the user will use to pay SPs on the network.
• Content policy: a publicly published or publishable policy that explains the data being onboarded, how it was funded/sourced, and what the goals are for storing it on the network

We expect the following, and will be able to verify it with tools that we run ourselves or use as a service:
• A minimum of 5 replicas, distributed geographically and across SP owner/operator entities
• Data made available for SPs to store within a reasonable time frame (case by case, but ideally within days or weeks)
• Data stored by SPs within a reasonable time frame, with no significant lag (e.g., targeting <72h from data availability to the first replica on chain)
• SPs that uphold enterprise-grade SLAs for durability, availability, and retrievability
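As an illustration of how the <72h target could be checked automatically, here is a minimal sketch. It assumes that, per piece, we already know when the data became available to SPs and the on-chain timestamps of its replicas; the function names and inputs are hypothetical and do not describe our existing tooling.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Optional

# Target from the text: first replica on chain within 72h of data availability.
MAX_LAG = timedelta(hours=72)


def first_replica_lag(available_at: datetime, onchain_at: list[datetime]) -> Optional[timedelta]:
    """Lag from data availability to the earliest on-chain replica, or None if none exist yet."""
    if not onchain_at:
        return None
    return min(onchain_at) - available_at


def meets_lag_target(available_at: datetime, onchain_at: list[datetime], now: datetime) -> bool:
    """True if the first replica landed within MAX_LAG, or if the window has not yet elapsed."""
    lag = first_replica_lag(available_at, onchain_at)
    if lag is None:
        # No replica yet: still compliant only while we are inside the 72h window.
        return now - available_at <= MAX_LAG
    return lag <= MAX_LAG
```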

If you are proposing an automated pathway, what diligence mechanism will you use to determine client eligibility?

Though the pathway will initially be manual, we have a handful of automation ideas that we would like to test and transition to. These include mechanisms for:
• Decentralized identity (DID) and tracking of legitimate users across networks
• Programmatically proving that storage was paid for, either in FIL or in a different crypto/fiat currency
• Ensuring the storage is actually onboarded in a compliant manner, e.g., using existing tools like Spade to enforce compliant data onboarding

Please specify how many questions you'll ask, and provide a brief overview of the questions.

If you have a form, template, or other existing resource, provide the link.

For users of our platform, we collect this information directly from users and publish it for public audit and verification. Here’s an example: https://bafkreihuqkipjv2sgc3ypr5lcervqitht2m5f6iyr4g432mpqwzmfm7jtq.ipfs.dweb.link/. Every SP that takes part in storing a user’s data explicitly agrees to this policy, ensuring data is onboarded with as much transparency as possible. For users who wish to come to our allocator without using our tools, we will eventually build tooling that supports them as well. The goal is to collect information that includes:
• Who they are
• What they are onboarding
• Where the data came from and how it was collected
• What their expectations are for its storage on Filecoin

Will you use a 3rd-party "Know your client" (KYC) service?

Likely yes; services like Togggle or Synapse are familiar to us and would be useful. Beyond these, other forms of business or entity verification could be employed.

Can any client apply to your pathway, or will you be closed to only your own internal clients? (eg: bizdev or self-referral)

Our priority is to deliver a useful service to our users, but we also plan to support the rest of the network.

How do you plan to track the rate at which DataCap is being distributed to your clients?

Our team maintains one of the only reliable sources of deal tracking for active deals on the Filecoin network today. We plan to leverage this tooling and extend it to track DataCap distribution/utilization, and to share dashboards publicly. I was also directly involved in the architecture and implementation of datacapstats.io, and hope to continue upholding a similar standard.
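A minimal sketch of the kind of utilization tracking described above, assuming per-client ledgers of granted and used DataCap (in TiB) and daily usage totals are already available from deal tracking. The record shape and function names are hypothetical, and the linear projection is a deliberately naive stand-in for real predictive modeling.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from datetime import date


@dataclass
class ClientLedger:
    granted_tib: float  # total DataCap granted to the client so far
    used_tib: float     # DataCap already consumed by verified deals


def utilization(ledger: ClientLedger) -> float:
    """Fraction of granted DataCap that has been used."""
    return ledger.used_tib / ledger.granted_tib if ledger.granted_tib else 0.0


def daily_rate(usage_by_day: dict[date, float]) -> float:
    """Average TiB consumed per calendar day over the observed span (both ends inclusive)."""
    if not usage_by_day:
        return 0.0
    span_days = (max(usage_by_day) - min(usage_by_day)).days + 1
    return sum(usage_by_day.values()) / span_days


def days_until_exhausted(ledger: ClientLedger, rate_tib_per_day: float) -> float:
    """Naive linear projection of when the remaining DataCap runs out."""
    remaining = ledger.granted_tib - ledger.used_tib
    return math.inf if rate_tib_per_day <= 0 else remaining / rate_tib_per_day
```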

Data Diligence

This section will cover the types of data that you expect to notarize.

As a reminder: The Filecoin Plus program defines quality data as all content that meets local regulatory requirements AND • the data owner wants to see on the network, including private/encrypted data • or is open and retrievable • or demonstrates proof of concept or utility of the network, such as efforts to improve onboarding

As an operating entity in the Filecoin Community, you are required to follow all local & regional regulations relating to any data, digital and otherwise. This may include PII and data deletion requirements, as well as the storing, transmitting, or accessing of data.

Acknowledgement: Acknowledge

What type(s) of data would be applicable for your pathway?

Public Open Dataset (Research/Non-Profit),Private Commercial/Enterprise,Private Non-Profit/Social Impact

How will you verify a client's data ownership? Will you use 3rd-party KYB (know your business) service to verify enterprise clients?

For enterprise data stored using current technologies, there are many ways to verify ownership. However, the reality is that it is not difficult for a malicious user to make untrue claims about data ownership. We believe most scalable data onboarding happens through out-of-band data transfer, where SPs fetch data asynchronously from a client and then activate the deal on-chain. This process gives us information about the client: what shape the data is in, how it is stored, where it is available, etc. Most of the clients we work with need help even with preparing their data, which gives us significant insight into what the data is and how it should be stored in order to be useful in the future. This, alongside our compliant data onboarding, will give us confidence that we are working with legitimate data owners.

How will you ensure the data meets local & regional legal requirements?

By requiring users to publish publicly auditable and verifiable claims about their data, in the form of a policy (see question 12 for more details). This enables us to validate that the data meets requirements, and also holds users accountable in the court of public opinion.

What types of data preparation will you support or require?

We have worked with several clients on data preparation and designed best-in-class data onboarding solutions, and we expect to continue doing this in the future. Here’s an example of a library our teammates have released in the past: https://github.com/anjor/go-fil-dataprep.

What tools or methodology will you use to sample and verify the data aligns with your pathway?

If someone tries hard enough, there will always be a way to game this. For users of our platform, we expect to be involved in the data preparation process, guaranteeing that it is compliant. For others, the bar will be higher, and the published policy will help us, SPs, and others hold users accountable.

Data Distribution

This section covers deal-making and data distribution.

As a reminder, the Filecoin Plus program currently defines distributed onboarding as multiple physical locations AND multiple storage provider entities to serve client requirements.

Recommended Minimum: 3 locations, 4 to 5 storage providers, 5 copies

How many replicas will you require to meet programmatic requirements for distribution?

5+

What geographic or regional distribution will you require?

At most 2 replicas in any single city, with data distributed across at least 3 countries and 2-3 continents.

How many Storage Provider owner/operators will you require to meet programmatic requirements for distribution?

3+
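The replica, location, and operator thresholds above can be expressed as a check run per piece CID. This is a minimal sketch: the `Replica` record, its field names, and the assumption that verified location data is already available are ours; only the thresholds come from the answers above.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    sp_id: str      # storage provider ID (illustrative values in the usage below)
    owner: str      # SP owner/operator entity
    city: str
    country: str
    continent: str


# Thresholds from the answers above; the checker itself is hypothetical.
MIN_REPLICAS = 5
MAX_PER_CITY = 2
MIN_COUNTRIES = 3
MIN_CONTINENTS = 2
MIN_OWNERS = 3


def distribution_violations(replicas: list[Replica]) -> list[str]:
    """Return human-readable rule violations for one piece CID (empty list means compliant)."""
    problems = []
    if len(replicas) < MIN_REPLICAS:
        problems.append(f"only {len(replicas)} replicas (need {MIN_REPLICAS}+)")
    for city, count in Counter(r.city for r in replicas).items():
        if count > MAX_PER_CITY:
            problems.append(f"{count} replicas in {city} (max {MAX_PER_CITY})")
    if len({r.country for r in replicas}) < MIN_COUNTRIES:
        problems.append(f"fewer than {MIN_COUNTRIES} countries")
    if len({r.continent for r in replicas}) < MIN_CONTINENTS:
        problems.append(f"fewer than {MIN_CONTINENTS} continents")
    if len({r.owner for r in replicas}) < MIN_OWNERS:
        problems.append(f"fewer than {MIN_OWNERS} SP owner/operators")
    return problems
```

Running this over every piece CID, e.g. `{cid: distribution_violations(r) for cid, r in replicas_by_cid.items()}` with a hypothetical `replicas_by_cid` mapping, gives a per-CID compliance view that does not require SP-level shares to be equal.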

Do you require equal percentage distribution for your clients to their chosen SPs? Will you require preliminary SP distribution plans from the client before allocating any DataCap?

No, but we do require replica distribution at the per-CID level. Here’s an example of a current user of our platform: https://dataprograms.grafana.net/public-dashboards/5c0e3034da464cef94552bdd7a0eac5a?orgId=1&from=now-1y&to=now. Note that data has been distributed across several SP IDs, but it’s not exactly symmetrical. This is because our tools are sophisticated enough to ensure compliant replication on a per-piece-CID basis. Clients don’t need an SP distribution plan; they just need to agree on a data replication plan, and we help ensure that the data is actually distributed across SPs. We actively work with clients to pay/incentivize their SPs, resulting in more symmetric replica distribution across SP entities over time.

What tooling will you use to verify client deal-making distribution?

Datacapstats.io + our own tooling (e.g., the Grafana dashboard linked in Q25).

How will clients meet SP distribution requirements?

Our platform - Spade.

As an allocator, do you support clients that engage in deal-making with SPs utilizing a VPN?

SPs need to report their locations to us so that we can get data to where it’s supposed to be, safely and correctly. So the answer to this question really depends on the data itself and the policies imposed by clients. In almost all cases, we end up tracking real SP locations with tools like Kentik/Cisco ThousandEyes and ensuring data is correctly distributed.

DataCap Allocation Strategy

In this section, you will explain your client DataCap allocation strategy.

Keep in mind the program principle of Limited Trust Over Time. Parties, such as clients, start with a limited amount of trust and power. Additional trust and power need to be earned over time through good-faith execution of their responsibilities and transparency of their actions.

Will you use standardized DataCap allocations to clients?

Yes, standardized

Allocation Tranche Schedule to clients:

First: 100 TiB • Second: 200 TiB • Third: 400 TiB • Fourth: 1000 TiB • Max per client overall: 20 PiB (major edge cases only)
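The tranche schedule above can be sketched as a small function. Two assumptions are ours, not part of the schedule: 1 PiB is taken as 1,024 TiB, and tranches after the fourth are assumed to repeat the 1,000 TiB size until the 20 PiB per-client cap is reached.

```python
TRANCHES_TIB = [100, 200, 400, 1000]   # first through fourth tranche, from the schedule above
TIB_PER_PIB = 1024
MAX_PER_CLIENT_TIB = 20 * TIB_PER_PIB  # 20 PiB overall cap (major edge cases only)


def tranche_size_tib(index: int, granted_so_far_tib: int) -> int:
    """Size of the allocation with 0-based `index`, clipped so the client never exceeds the cap.

    Assumption: tranches after the fourth repeat the fourth-tranche size.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    size = TRANCHES_TIB[min(index, len(TRANCHES_TIB) - 1)]
    remaining = MAX_PER_CLIENT_TIB - granted_so_far_tib
    return max(0, min(size, remaining))
```

Under these assumptions the first four tranches total 1,700 TiB, and reaching the cap would take 19 further tranches (the last one clipped to 780 TiB), consistent with the cap applying only to major edge cases.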

Will you use programmatic or software based allocations?

Yes, standardized and software based

What tooling will you use to construct messages and send allocations to clients?

We’d love to use existing tooling or a fork of it.

Describe the process for granting additional DataCap to previously verified clients.

We plan to use tooling comparable to the SA bot, but we need something more stable that also works with our platform.
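A subsequent-allocation trigger in the spirit of the SA bot could look like the sketch below. The 25% refill threshold, the `ClientState` fields, and the rule that paused or non-compliant clients are skipped are illustrative assumptions, not the SA bot's actual logic or configuration.

```python
from dataclasses import dataclass

# Hypothetical: request the next tranche once less than 25% of the last one remains.
REFILL_THRESHOLD = 0.25


@dataclass
class ClientState:
    last_tranche_tib: float   # size of the most recent allocation
    remaining_tib: float      # DataCap the client has not yet used
    open_violations: int = 0  # unresolved compliance findings
    paused: bool = False      # e.g. onboarding paused during a dispute


def should_grant_next(state: ClientState) -> bool:
    """Decide whether a client is eligible for its next tranche."""
    if state.paused or state.open_violations > 0:
        return False
    return state.remaining_tib < REFILL_THRESHOLD * state.last_tranche_tib
```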

Tooling & Bookkeeping

This program relies on many software tools in order to function. The Filecoin Foundation and PL have invested in many different elements of this end-to-end process, and will continue to make those tools open-sourced. Our goal is to increase adoption, and we will balance customization with efficiency.

This section will cover the various UX/UI tools for your pathway. You should think high-level (GitHub repo architecture) as well as tactical (specific bots and API endpoints).

Describe in as much detail as possible the tools used for: • client discoverability & applications • due diligence & investigation • bookkeeping • on-chain message construction • client deal-making behavior • tracking overall allocator health • dispute discussion & resolution • community updates & comms

Address all the tools & software platforms in your process.

1. Client discoverability and applications: filplus.storage + GitHub issues
2. Due diligence and investigation: customer signup flow from our platform + tracking in public issues
3. Bookkeeping: public GitHub issues with history + internal tools for tracking client actions + publicly published, user-defined data onboarding and content policy
4. Client deal-making behavior: deals oracle with public dashboards + datacapstats.io
5. Tracking allocator health: datacapstats.io + custom internal tooling on DataCap balances and predictive modeling of future DataCap utilization
6. Dispute discussion & resolution: separate GitHub repo with issues for allocator disputes + Fil+ WG defined process for meta-level disputes + customer email inbox with an internally defined DRI as the escalation point for dispute resolution
7. Community updates & comms: website pages and documentation, blog posts, all cross-linked in Filecoin Slack + #fil-plus channel + discussions in the notary governance repo, and attending governance calls when relevant

Will you use open-source tooling from the Fil+ team?

We plan to use as much tooling from the Fil+ team as makes sense and is possible. This includes intake forms from filplus.storage, GitHub repos, and the tooling available alongside them. We do need our own tooling for users of our platform, but it will be built to fit into the system with minimal friction for users.

Where will you keep your records for bookkeeping? How will you maintain transparency in your allocation decisions?

GitHub when possible. All client info relevant to content policy and data distribution will also be published to IPFS and made available to SPs and anyone else in the community. For private info, e.g., KYC records or payment info for off-chain payments, we will support an escalation path to get this information directly from our team.

Risk Mitigation, Auditing, Compliance

This framework ensures the responsible allocation of DataCap by conducting regular audits, enforcing strict compliance checks, and requiring allocators to maintain transparency and engage with the community. This approach safeguards the ecosystem, deters misuse, and upholds the commitment to a fair and accountable storage marketplace.

In addition to setting their own rules, each notary allocator will be responsible for managing compliance within their own pathway. You will need to audit your own clients, manage interventions (such as removing DataCap from clients and keeping records), and respond to disputes.

Describe your proposed compliance check mechanisms for your own clients.

Deal distribution will be compliant by definition, thanks to our tools. Beyond this, we already run a dedicated instance of the retrieval bot, which we will continue to use to test retrievability, and we will publish open dashboards for all our clients and SPs so anyone can audit onboarding behavior and we can start building tools for outlier detection. We don’t anticipate a large volume of clients for our tools, so we plan to work closely with each one and ensure we have all the information we need to have confidence in any next steps.
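Aggregating retrieval-bot results per SP is one simple starting point for the outlier detection mentioned above. In this sketch the input format (a list of `(sp_id, succeeded)` pairs) and the 80% floor are hypothetical, and a fixed floor stands in for a proper statistical outlier test.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical floor; a real deployment would tune this or use a statistical test.
MIN_SUCCESS_RATE = 0.8


def retrieval_success_rates(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Map each SP ID to the fraction of its retrieval attempts that succeeded."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for sp_id, succeeded in results:
        attempts[sp_id] += 1
        successes[sp_id] += int(succeeded)
    return {sp_id: successes[sp_id] / attempts[sp_id] for sp_id in attempts}


def flag_low_retrievability(rates: dict[str, float], floor: float = MIN_SUCCESS_RATE) -> list[str]:
    """SP IDs whose success rate falls below the floor, sorted for stable reporting."""
    return sorted(sp_id for sp_id, rate in rates.items() if rate < floor)
```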

Describe your process for handling disputes. Highlight response times, transparency, and accountability mechanisms.

Disputes from the ecosystem (external to the allocator) will be the highest priority: a dedicated POC from our team will be available within 24h to handle the escalation and provide all necessary information to work towards resolution. Disputes within the allocator, i.e., against a client, will result in data onboarding being paused with immediate effect until the situation is resolved, using tools like unverified deals or additional retrievability tests to get clients back into compliant status. The hardest cases will be those where information we were confident in turns out to be wrong, e.g., specific info about a client or an SP. These will be handled manually, so that each one gets the attention it needs to be resolved successfully.

Detail how you will announce updates to tooling, pathway guidelines, parameters, and process alterations.

Appropriate GitHub repos, website/doc pages, and the Filecoin Slack #fil-plus channel.

How long will you allow the community to provide feedback before implementing changes?

2-4 weeks. We plan to leverage the existing frameworks in the Fil+ ecosystem (gov calls, discussions, issues, etc.) + Issues on our own repos to enable useful feedback collection.

Regarding security, how will you structure and secure the on-chain notary address? If you will utilize a multisig, how will it be structured? Who will have administrative & signatory rights?

The address was initially generated from a hardware wallet (Ledger Nano X). We plan to have 1 DRI (me), 1 fallback, and 1 secondary fallback. In the future, we would like to switch to a multisig once our team's spinoff plan is finalized. At that point, we expect about 7 members on the multisig, with a threshold of 2 signers.
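With roughly 7 members and a threshold of 2, any two members can approve an allocation while no single key can act alone. The model below illustrates threshold approval only; it is a hypothetical sketch, not the Filecoin multisig actor's API, and the member names in the usage are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Proposal:
    """Illustrative threshold-approval model; not the Filecoin multisig actor's API."""
    signers: frozenset[str]  # members with signatory rights
    threshold: int           # approvals needed to execute
    approvals: set[str] = field(default_factory=set)

    def approve(self, member: str) -> None:
        if member not in self.signers:
            raise PermissionError(f"{member} is not a signer")
        self.approvals.add(member)  # repeated approvals by one member count once

    @property
    def executable(self) -> bool:
        return len(self.approvals) >= self.threshold
```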

Will you deploy smart contracts for program or policy procedures? If so, how will you track and fund them?

We are working towards progressive decentralization of data onboarding to the Filecoin network. As part of this, we expect to move portions of our stack into contracts. It is inevitable that the DataCap allocation management will eventually also move towards automation and smart contracts. We don’t have any short term plans or progress to share at this stage, but will keep everyone informed!

Monetization

While the Filecoin Foundation and PL will continue to make investments into developing the program and open-sourcing tools, we are also striving to expand and encourage high levels of service and professionalism through these new Notary Allocator pathways. These pathways require increasingly complex tooling and auditing platforms, and we understand that Notaries (and the teams and organizations responsible) are making investments into building effective systems.

It is reasonable for teams building services in this marketplace to include monetization structures. Our primary guiding principles in this regard are transparency and equity. We require these monetization pathways to be clear, consistent, and auditable.

Outline your monetization models for the services you provide as a notary allocator pathway.

For the first phase, we have no specific staking/collateral-based models. We plan to blocklist entities that game our systems and track this publicly. We plan to charge clients for storage, passing most of the revenue to SPs; some of it will be used to fund our services, including the allocator pathway.

Describe your organization's structure, such as the legal entity and other business & market ventures.

We are in the process of spinning out of Protocol Labs. We plan to set up a single LLC entity in the US.

Where will accounting for fees be maintained?

N/A with current plans.

If you've received DataCap allocation privileges before, please link to prior notary applications.

I have served on the Fil+ team for over 3 years, and so even though I have not directly allocated DataCap, I’ve played a key role in enabling every single other notary in some capacity.

How are you connected to the Filecoin ecosystem? Describe your (or your organization's) Filecoin relationships, investments, or ownership.

Our team consists of several individuals who have played significant roles in the development of data onboarding software in the Filecoin network. Currently, this includes working on Boost, Spade, and related software like the retrieval bot, deals oracle, and data prep tools. Our team includes Filecoin ecosystem experts, developers, and at least 1 active SP operation.

How are you estimating your client demand and pathway usage? Do you have existing clients and an onboarding funnel?

We currently have 2-3 clients with long-term line of sight to a total of 20 PiB. We expect to actively engage in further outreach and have set an internal target of onboarding 6-7 PiB of net-new unique data in 2024 (>50 PiB with 10x replication).