NateWebb03 / FilTestRepo

A test repository for allocator application automation

Test app 997 #997

Open NateWebb03 opened 5 months ago


Notary Allocator Pathway Name:

Enterprise Data Pathway

Organization:

Data Preservation Labs

Allocator's On-chain address:

f1v24knjbqv5p6qrmfjj5xmlaoddzqnon2oxkzkyq

Country of Operation:

United States

Region(s) of operation:

Africa, Asia minus GCR, Greater China, Europe, Japan, Oceania, North America, South America

Type of allocator: What is your overall diligence process? Automated (programmatic), Market-based, or Manual (human-in-the-loop at some phase). Initial allocations to these pathways will be capped.

Manual

Amount of DataCap Requested for allocator for 12 months:

Our estimate comes from the following information and assumptions. In 2023, the E-Fil+ LDN pathway onboarded 54 PiBs of data (https://datacapstats.io/). However, onboarding slowed over the past 6 months, and there was a problem with notary diversification in the multisig that prevented applications from being signed outside of Asia. This problem will go away in the new pathway. Another concern from many potential clients was the number of copies required: enterprise clients do not want to store 4+ copies, more like 2-3, and in the future this will also be possible and could attract more clients to the pathway. Therefore, across 12 months we estimate the same amount as last year, 50 PiB, plus an additional 50 PiB from potential clients who will now engage in the new pathway, for a total of 100 PiB.

Is your allocator providing a unique, new, or diverse pathway to DataCap? How does this allocator differentiate itself from other applicants, new or existing?

Our allocator provides a pathway for any enterprise (private/encrypted) dataset use case. This pathway follows the lead of what was built as part of the E-Fil+ LDN pathway that existed prior to 2024. The core of the pathway includes KYC and KYB compliance checks, data sampling, and SP verification as part of data owner verification and due diligence.

As a member of the Filecoin Community, I acknowledge that I must adhere to the Community Code of Conduct, as well as other End User License Agreements for accessing various tools and services, such as GitHub and Slack. Additionally, I will adhere to all local & regional laws & regulations that may relate to my role as a business partner, organization, notary, or other operating entity. * You can read the Filecoin Code of Conduct here: https://github.com/filecoin-project/community/blob/master/CODE_OF_CONDUCT.md

Acknowledgment: Acknowledge

Client Diligence Section:

This section pertains to client diligence processes.

Who are your target clients?

Enterprise Data Clients

Describe in as much detail as possible how you will perform due diligence on clients.

Because the data coming from an Enterprise client will be private and encrypted, we will focus our diligence efforts on confirming:
• the GitHub applicant user (KYC)
• the Data Owner business (KYB)
• the Dataset (contents and size)

We will manually vet all client applicants and data owners upfront to confirm who they are, what data they are onboarding, how they will prepare the data, and which SPs will be involved in onboarding copies of the data. One main assumption is that a storage provider will do most of the work for the actual data owners and act as the ‘client’ in the application process.

Clients will be required to apply using the following GitHub application form: (LINK SOON), which contains questions related to the data owner, client role, data preparation, financing, dataset details, and storage provider distribution plan. All responses will be reviewed and assessed against the Fil+ guidelines for open data storage. Specifically:
• Dataset - validation of data to be onboarded. Because the data is private encrypted data, the client is required to give a manual demonstration of the data (share raw data or screen share) with the allocator team
• Data Owner - confirmation of connection to the business requesting onboarding
• Client - confirmation of the applicant and their connection to the dataset
• Data Preparation - who is preparing, and what tool(s) are used
• Retrievability - what are the requirements of the Data Owner in terms of data retrieval
• Distribution of onboarding across entities and geopolitical locations

The application responses will be made public in the GitHub repo, and all communication about applications between client and allocator team will take place in comments on the application.

GitHub ID: Due diligence will also involve confirming usage associated with the client GitHub ID, enabling applicants to add a layer of trust, and ultimately to use one GitHub ID and build a reputation as a good actor over time.
New User Check: The first checks to be completed on each application by the allocator:
• Is this a completely new GitHub ID (less than 2 months old)?
• Is this the first time this GitHub ID has applied for DataCap in this or other pathways?
If yes to either, applicants will have a lowered maximum DataCap allowance.

Client Check (KYC): Because the contents of the dataset are not publicly retrievable, we require all clients to complete a know-your-customer (KYC) check to confirm themselves as a human user. The current process we offer to clients is via a third-party app from togggle.io and is explained in detail below in #13. Clients can also choose another method to prove the identity of the applicant (which must first be vetted by the allocator team and made public for transparency). Additionally, in the future we would like to consider small-scale automation using quantifiable diligence metrics such as GitHub ID KYC and history, and staking.

Business Check (KYB): Because the contents of the dataset are not publicly retrievable, all dataset owner businesses must complete a know-your-business (KYB) check to confirm the business exists, is legitimate, and someone from the data owner team (business) has approved onboarding of the dataset. We currently offer two options for the KYB check: a third-party KYB service partnership with Synaps.io ($100), which will review business and data owner documentation (details in answer #18 below).

Or they can plan a virtual meeting with the client, data owner, and a member of the allocator team to review the dataset, confirm ownership (proof of employment, employer sign-off, sharing the business license), and validate that storage of the data by the client/applicant is approved and a contract is in place. If requested by a client, we will utilize various non-disclosure agreements to collect required information on clients and data owners while maintaining their privacy.
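The new-user check described above is simple enough to sketch programmatically. The helper below is purely illustrative (it is not part of any Fil+ tooling), and the 60-day threshold is an approximation of "less than 2 months old":

```python
from datetime import date, timedelta

# Hypothetical sketch of the new-user check: a GitHub ID is treated as
# "new" if the account is under ~2 months old OR has never applied for
# DataCap before. New users get a lowered maximum DataCap allowance.

NEW_ACCOUNT_AGE = timedelta(days=60)  # approximates "less than 2 months old"

def is_new_user(account_created: date, prior_applications: int, today: date) -> bool:
    too_young = (today - account_created) < NEW_ACCOUNT_AGE
    first_time = prior_applications == 0
    return too_young or first_time
```

Either condition alone is enough to trigger the lowered allowance, matching the "if yes to either" rule above.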

Please specify how many questions you'll ask, and provide a brief overview of the questions.

As mentioned above, we will manually vet all client applicants upfront to confirm who they are, who the data owner is, what data they are onboarding, how they will prepare the data and which SPs will be involved in onboarding copies of the data. Our application has 23 questions and will be the same template as the current LDN application. See link: https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/new/choose

Will you use a 3rd-party "Know your client" (KYC) service?

Yes, all applicants will be presented with the option to complete a free third-party KYC check. We have integrated a third-party KYC app, Togggle.io, into our application form. Togggle supports identity validation (KYC) in over 190 countries. See more here: https://www.togggle.io/

Togggle provides a solution to the problem of validating a unique human user behind each GitHub ID without exposing user information. With our solution design, users are asked to validate their ID and liveness (KYC), but their submitted ID information is then encrypted, stored in a decentralized manner across servers, and never shared publicly in our GitHub repo. Members of the allocator team do not have direct access to the client information. The only list made public is a list of GitHub IDs that have passed the KYC check, which can be found here: https://filplus.storage/api/get-kyc-users

Once a user is verified, their GitHub account receives a ‘KYC verification’ label that we will use as a layer of trust on the account. To date, we have invested $9,000 in this integration, and KYC has been completed by 75 users. The KYC check cost is currently covered by the allocator team; however, in the future we may transition to charging for new checks ($3-5 each). We are also open to using other KYC providers to support client use cases. Clients may submit ideas or products they are willing to test/pay for, and the allocator team will review them.
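As a sketch of how the published KYC list could be consumed by a reviewer, assuming the endpoint returns a JSON array of GitHub IDs (the exact response shape is an assumption, and the helper names are illustrative):

```python
import json
from urllib.request import urlopen

# Public list of GitHub IDs that have passed KYC (URL from the application).
KYC_LIST_URL = "https://filplus.storage/api/get-kyc-users"

def fetch_kyc_users() -> list[str]:
    # Assumes the endpoint returns a JSON array of GitHub IDs;
    # adjust the parsing to the real payload shape.
    with urlopen(KYC_LIST_URL) as resp:
        return json.load(resp)

def has_passed_kyc(github_id: str, kyc_users: list[str]) -> bool:
    # Case-insensitive membership check against the published list.
    return github_id.lower() in {u.lower() for u in kyc_users}
```

Separating the fetch from the membership check keeps the check testable without a network call.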

Can any client apply to your pathway, or will you be closed to only your own internal clients? (eg: bizdev or self-referral)

Any client can apply to our pathway, and they can discover and learn more about us on filplus.storage. We also hope to have links and marketing on other sites, such as the Filecoin docs.

How do you plan to track the rate at which DataCap is being distributed to your clients?

Datacapstats.io will be connected to our GitHub repo and will track all DataCap distribution information in real time, as well as monitor chain messages from our notary address. We are also creating new dashboard specs to help showcase the health of our allocator pathway, such as better snapshot metrics on the number of clients approved, time to DataCap, bot health, and more.

Data Diligence

This section will cover the types of data that you expect to notarize.

As a reminder: The Filecoin Plus program defines quality data as all content that meets local regulatory requirements AND
• the data owner wants to see on the network, including private/encrypted data
• or is open and retrievable
• or demonstrates proof of concept or utility of the network, such as efforts to improve onboarding

As an operating entity in the Filecoin Community, you are required to follow all local & regional regulations relating to any data, digital and otherwise. This may include PII and data deletion requirements, as well as the storing, transmitting, or accessing of data.

Acknowledgement: Acknowledge

What type(s) of data would be applicable for your pathway?

Private Non-Profit/Social Impact,Private Commercial/Enterprise

How will you verify a client's data ownership? Will you use 3rd-party KYB (know your business) service to verify enterprise clients?

We will facilitate two checks for applicable types of private datasets.

Clients will be asked to complete a third-party business (KYB) check. They can complete a KYB check using a third-party integration we have already established with https://efilplus.synaps.me/signup ($100 per check). This option has been in use within Fil+ for one year; over 20 clients have attempted KYB, and 10 have successfully completed the check. We’ve invested $3,000 in this integration to date. If this option doesn’t work, clients can suggest other KYB third-party apps. As an allocator, we are willing to vet and consider approving the use of other third parties that meet our due diligence requirements.

Or they can plan a virtual meeting with the client, data owner, and a member of the allocator team to review the dataset, confirm ownership (proof of employment, employer sign-off, sharing the business license), and validate that storage of the data by the client/applicant is approved and a contract is in place.

If requested by a client, we will utilize various non-disclosure agreements to collect required information on clients and data owners while maintaining their privacy.

How will you ensure the data meets local & regional legal requirements?

In the client application, we will have a question asking the applicant to confirm they are legally able to represent and store the data in question. This includes asking clients to attest that they are familiar with the local & regional requirements that apply to themselves and any SPs they intend to transact with.

What types of data preparation will you support or require?

There is no specific or single data prep tool required. The expectation is that data is properly packed, indexed, and retrievable. We will promote usage of data prep tooling built by Protocol Labs teams and network partners and encourage clients to utilize these. Examples: Singularity, web3.storage. If a data preparer is not using a known tool, they can fully describe the preparation process in their application, and it will be reviewed during subsequent allocation checks to validate that the tool being used meets expectations.

What tools or methodology will you use to sample and verify the data aligns with your pathway?

Our pathway allows any type of private dataset to be onboarded. However, because we cannot view the data, we ask the client to submit information about the dataset and business upfront during the diligence process. Additionally, we will ask clients to confirm that the dataset does not include any offensive or illegal content. Examples include:
• Sexually explicit content
• Images of child sexual abuse
• Footage of real or simulated violence, criminal activity, or accidents from video clips, games, or films
• Content that advocates the doing of a terrorist act
• Content instructing or promoting crime or violence
• Content promoting racism and hate speech

Data Distribution

This section covers deal-making and data distribution.

As a reminder, the Filecoin Plus program currently defines distributed onboarding as multiple physical locations AND multiple storage provider entities to serve client requirements.

Recommended Minimum: 3 locations, 4 to 5 storage providers, 5 copies

How many replicas will you require to meet programmatic requirements for distribution?

2+

What geographic or regional distribution will you require?

Current Fil+ guidelines call for three locations. If clients only have two replicas, we will ask clients to use at least two physical locations, each in a separate geopolitical region. We ask clients in the application to list their SP partners and will check for two in different geopolitical regions. If this is not met, we will ask the client to update their application with more information about their storage plan until guidelines are met. The rule scales with replica count: with 3 replicas, 3 locations; with 4, 4; and so on.
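The region rule above reduces to requiring each replica's SP to sit in a distinct geopolitical region. A minimal illustrative sketch (the function name and region labels are ours, not official tooling):

```python
# Each replica must land in a separate geopolitical region, so with n
# replicas we need n distinct regions (2 replicas -> 2 regions, etc.).

def meets_region_rule(replica_regions: list[str]) -> bool:
    # True only when every replica's region is distinct.
    return len(set(replica_regions)) == len(replica_regions)
```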

How many Storage Provider owner/operators will you require to meet programmatic requirements for distribution?

2+

Do you require equal percentage distribution for your clients to their chosen SPs? Will you require preliminary SP distribution plans from the client before allocating any DataCap?

Yes, clients will need to manage SP distribution plans and ensure distribution stays equal (if only 2 SPs), and also stays within the following guidelines:
• One storage provider miner ID cannot store more than one copy
• A storage provider owner/operator should not be storing more than 20% duplicate data
Clients are required to submit SPs upfront. If client plans differ from the original guidelines, they will need to clearly map out distribution plans upfront. All information is collected in our application process and stored in GitHub.
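The two guidelines above can be expressed as a simple compliance pass over reported deal data. This is a hedged sketch with illustrative names, not the actual checker tooling:

```python
# Hypothetical sketch of the distribution rules:
#   - a single miner ID may hold at most one copy of the dataset
#   - a single owner/operator should not exceed 20% duplicate data

def check_distribution(copies_per_miner: dict[str, int],
                       duplicate_pct_per_operator: dict[str, float]) -> list[str]:
    violations = []
    for miner, copies in copies_per_miner.items():
        if copies > 1:
            violations.append(f"{miner}: stores {copies} copies (max 1)")
    for op, pct in duplicate_pct_per_operator.items():
        if pct > 20.0:
            violations.append(f"{op}: {pct:.0f}% duplicate data (max 20%)")
    return violations
```

An empty result means the plan is within guidelines; otherwise each violation points at the miner ID or operator that needs follow-up in the application comments.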

What tooling will you use to verify client deal-making distribution?

We will use the CID Checker bot tool developed by the Protocol Labs team. Link to main repo: https://github.com/data-preservation-programs/filplus-checker-assets/tree/main

The CID Checker bot reviews on-chain information and looks at:
• Storage provider distribution
• Deal data replication
• Deal data shared with other clients

The CID bot is part of the larger AC Bot (Aggregate and Compliance Bot) and will automatically run and check all applications on a weekly basis. We will follow guidance from the AC Bot when deciding whether to approve or deny subsequent allocations.

How will clients meet SP distribution requirements?

Our allocator pathway prioritizes clients presenting information and making clear, provable claims regarding their plan for distributed storage across multiple storage provider owner/operator entities and locations, to ensure compliance with the Fil+ guidelines.

To enable client success with this process, we will market vetted SPs through a marketplace tool (GitHub page soon) where storage providers can complete KYC/KYB upfront, confirming who they are (entity), their miner IDs, and their locations; afterwards, only the SP miner ID and location information will be available to clients to search and match with SPs that fit their requirements. Initially, onboarding and vetting SPs will be a manual review process completed by the team. However, we are also investigating the use and cost of network monitoring tooling that would provide additional information about SP IP locations and could be automated to check and validate locations.

If a client does not intend to use SPs from the vetted SP marketplace, or a vetted Protocol Labs network tool (example: SPADE), then they will be required to provide additional KYB on the SPs they will use to onboard data in order to get additional allocations approved. Examples include: business license, proof of datacenter address.

As an allocator, do you support clients that engage in deal-making with SPs utilizing a VPN?

Utilization of a VPN is an acceptable practice. However, information about SP entities and location distribution will be required regardless of VPN usage.

DataCap Allocation Strategy

In this section, you will explain your client DataCap allocation strategy.

Keep in mind the program principle of Limited Trust Over Time. Parties, such as clients, start with a limited amount of trust and power. Additional trust and power must be earned over time through good-faith execution of their responsibilities and transparency of their actions.

Will you use standardized DataCap allocations to clients?

Yes, standardized

Allocation Tranche Schedule to clients:

Each application will have its GitHub ID assessed to confirm whether it is a new GitHub ID (less than 2 months old) or a first-time user of this allocator. If so, they will follow the first allocation schedule below.

First Time User Allocation Schedule: Did the applicant complete the third-party KYC check or another form of KYC?
• If yes, they become eligible to receive up to 50 TiBs of DataCap.
• If no, they are not eligible for DataCap in this pathway.

Trusted User Allocation Schedule: For users with a GitHub ID older than 2 months who have successfully onboarded public open datasets in the LDN pathway (before 2024):

If a user has successfully onboarded a dataset using the first-time allocation schedule, OR they are designated a trusted GitHub ID user, AND they have completed the third-party KYC check or another form of KYC, then they become eligible to apply for up to 5 PiBs of DataCap.

*Note: if first-time applicants submit multiple applications at the same time, only after completion of one will the count be included and increased allocation sizes become available.

The allocation schedule for trusted users is:
• 1st allocation: 5%
• 2nd allocation: 15%
• 3rd allocation: 30%
• 4th allocation: 50%
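The trusted-user tranche schedule can be sketched as a straightforward percentage split. The percentages come from the application; the 5 PiB total and the TiB units in the example are illustrative:

```python
# Tranche percentages for trusted users, from the allocation schedule.
TRANCHE_PCTS = [5, 15, 30, 50]  # percent of total approved DataCap

def tranche_sizes_tib(total_tib: float) -> list[float]:
    # DataCap released per tranche, in TiB (integer percents avoid
    # floating-point drift in the split).
    return [total_tib * p / 100 for p in TRANCHE_PCTS]
```

For the 5 PiB maximum (5 × 1024 = 5120 TiB), the four tranches sum back to the full amount.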

After successful onboarding as a trusted GitHub ID, users then become eligible to apply for 5 PiB+ as needed to meet their demand.

Will you use programmatic or software based allocations?

Yes, standardized and software based

What tooling will you use to construct messages and send allocations to clients?

We will use existing notary registry tooling at https://filplus.fil.org/#/

Describe the process for granting additional DataCap to previously verified clients.

When clients use more than 75% of the prior DataCap allocation, a request for additional DataCap in the form of the next tranche is automatically kicked off (via the subsequent allocation bot). We will set an SLA (Service Level Agreement) to keep up with allocation review and comment on bot messages within 3 days. This could change depending on the demand and number of applications received.

Two other things to note about granting DataCap:

We will set an expiration date of 3 months on allocated DataCap. The allocation bot already has a built-in stale check that closes applications after 14 days of being idle. That bot will continue to be in effect; however, clients can comment before the 14 days are up to keep an application open or, if closed, request that it be reopened as needed. Separately, we will measure 3 months from the allocation date, and if the allocation has not been used (open or closed status), the application will be closed and the remaining DataCap removed.

The expectation when the full amount of DataCap is allocated is that the client has completely finished onboarding their dataset and replicas. If a client closes the application before this point, they will be questioned as to why, and their GitHub ID will be flagged for future reference. If a client abandons the application and becomes non-responsive, their GitHub ID will be flagged for future reference. Checks can also be requested by the allocator team to confirm completion of dataset storage across all replica sites.
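The refill and expiry rules above can be sketched as a single decision function. This is an illustrative helper under our stated assumptions (90 days approximates "3 months"; the function and its return values are ours, not the bot's actual API):

```python
from datetime import date, timedelta

# Hypothetical sketch of the subsequent-allocation logic:
#   - >75% of the prior allocation used -> kick off the next tranche
#   - allocation unused after ~3 months -> close and remove DataCap

USAGE_TRIGGER = 0.75
EXPIRY = timedelta(days=90)  # approximates the 3-month expiration

def next_action(used: float, allocated: float,
                allocated_on: date, today: date) -> str:
    if today - allocated_on > EXPIRY and used < allocated:
        return "expire"  # close application, remove remaining DataCap
    if allocated and used / allocated > USAGE_TRIGGER:
        return "request-next-tranche"
    return "wait"
```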

Tooling & Bookkeeping

This program relies on many software tools in order to function. The Filecoin Foundation and PL have invested in many different elements of this end-to-end process, and will continue to make those tools open-sourced. Our goal is to increase adoption, and we will balance customization with efficiency.

This section will cover the various UX/UI tools for your pathway. You should think high-level (GitHub repo architecture) as well as tactical (specific bots and API endpoints).

Describe in as much detail as possible the tools used for:
• client discoverability & applications
• due diligence & investigation
• bookkeeping
• on-chain message construction
• client deal-making behavior
• tracking overall allocator health
• dispute discussion & resolution
• community updates & comms

• client discoverability & applications: filplus.storage
• due diligence & investigation: notary registry and GitHub
• bookkeeping: JSON and GitHub
• on-chain message construction: notary registry tooling (https://filplus.fil.org/#/)
• client deal-making behavior: DataCap stats
• tracking overall allocator health: DataCap stats
• dispute discussion & resolution: Google Form and Zoom/Slack
• community updates & comms: notary governance call and Slack

Will you use open-source tooling from the Fil+ team?

As the team that developed most of the open-source tooling used today for the pathway, we will continue to utilize these tools in this pathway and iterate as necessary.

Where will you keep your records for bookkeeping? How will you maintain transparency in your allocation decisions?

Public: In the GitHub applications, KYC check approvals are automatically linked from the Togggle.io database to a list at https://filplus.storage/api/get-kyc-users, which is linked to the GitHub application repo. No personal information is shared from Togggle to GitHub. For KYB checks, we will provide manual updates in the comments regarding clients' completion of required application due diligence checks. For SP entity and location verification, only the miner ID, entity name, and location will be shared in comments. Overall, any comments made by the allocator team will not include personal information such as client names or emails, so as not to open users up to potential spamming.

Private: KYC personal information is kept in a third-party database (Togggle.io). A record of the GitHub users that have completed KYC is automatically pulled from the Togggle database via API to https://filplus.storage/api/get-kyc-users; no personal information is shared from Togggle to GitHub. KYB personal and business information is kept in a third-party database (Synaps.io). The allocator team can log in to a dashboard and confirm completion of the KYB check. In the future we may set up an automatic API call, similar to the KYC process, to keep all information private and pass only a completion message to GitHub. For video conference due diligence calls with a client and/or data owner, we will keep a digital record of the call and who participated, along with any key notes, in a document available only to members of the allocator team. This information will be stored in a team drive for up to 2 years. If asked by a community member to prove that KYC, KYB, SP verification, or video client/business due diligence took place, we will proactively provide KYC/KYB and allocator drive folder logins to the Filecoin Foundation team so they can conduct audits as needed.

Risk Mitigation, Auditing, Compliance

This framework ensures the responsible allocation of DataCap by conducting regular audits, enforcing strict compliance checks, and requiring allocators to maintain transparency and engage with the community. This approach safeguards the ecosystem, deters misuse, and upholds the commitment to a fair and accountable storage marketplace.

In addition to setting their own rules, each notary allocator will be responsible for managing compliance within their own pathway. You will need to audit your own clients, manage interventions (such as removing DataCap from clients and keeping records), and respond to disputes.

Describe your proposed compliance check mechanisms for your own clients.

We’ll track and audit DataCap distribution by looking at usage across our dashboards. We’ll be looking for anomalies in onboarding rates or other trends that might signal abusive behavior. Regarding new client tolerance, we’ve set up processes to limit DataCap for new applicants and new GitHub IDs, especially on their first application. We’ve also set up a KYC process that allows clients to add a layer of trust and access more DataCap. After a successful onboarding, clients using the same GitHub user ID will become eligible for more DataCap on subsequent applications.

Describe your process for handling disputes. Highlight response times, transparency, and accountability mechanisms.

For disputes between our allocator and a client, hereby termed appeal(s), we will source the appeals using the Enterprise Data Allocator Appeals Form, where any of our clients can submit an appeal and someone on the team will address it within a 14-day SLA. We would like to respect the privacy of the client and do not plan to host a public resolution process. For disputes raised by community members/non-clients about our allocation approach and strategy, we will comply with the public dispute tracker being built by the Filecoin Foundation Governance team. We can commit to an SLA of 21 days for such disputes.

Detail how you will announce updates to tooling, pathway guidelines, parameters, and process alterations.

We’ll transparently present updates to tooling, guidelines, parameters and process alterations before they happen. We’ll document all proposed changes in an issue in our repo and share in designated slack channels and also bring to community governance calls as needed to present and receive feedback before any changes are made.

How long will you allow the community to provide feedback before implementing changes?

We’ll allow any feedback for 1-2 weeks prior to implementing proposed changes. Community members can submit comments on the proposed issues in our repo. Depending on the weight and impact of a proposed change on the community, we will review all comments and feedback and decide if a soft consensus is needed and request community members to weigh in.

Regarding security, how will you structure and secure the on-chain notary address? If you will utilize a multisig, how will it be structured? Who will have administrative & signatory rights?

We will utilize a multisig; two people from the entity will hold Ledger hardware devices and act as signers for each allocation.

Will you deploy smart contracts for program or policy procedures? If so, how will you track and fund them?

Not at this time, perhaps with future iterations we will introduce this feature.

Monetization

While the Filecoin Foundation and PL will continue to make investments into developing the program and open-sourcing tools, we are also striving to expand and encourage high levels of service and professionalism through these new Notary Allocator pathways. These pathways require increasingly complex tooling and auditing platforms, and we understand that Notaries (and the teams and organizations responsible) are making investments into building effective systems.

It is reasonable for teams building services in this marketplace to include monetization structures. Our primary guiding principles in this regard are transparency and equity. We require these monetization pathways to be clear, consistent, and auditable.

Outline your monetization models for the services you provide as a notary allocator pathway.

Currently there is no plan to monetize our allocator. We are funded in the near term, but the strategy and monetization plan could change in the future.

Describe your organization's structure, such as the legal entity and other business & market ventures.

Delaware Corporation
Filecoin Data Preservation Foundation
1111B S Governors Ave #7426
Dover, DE 19904

Where will accounting for fees be maintained?

N/A

If you've received DataCap allocation privileges before, please link to prior notary applications.

N/A

How are you connected to the Filecoin ecosystem? Describe your (or your organization's) Filecoin relationships, investments, or ownership.

Members of our entity were previously members of the Data Programs team at Protocol Labs. We have institutional experience as the governance team, with both the operational and tooling resources to run the LDN and E-Fil+ pathways.

How are you estimating your client demand and pathway usage? Do you have existing clients and an onboarding funnel?

As mentioned in question 7, we estimate a need for 100 PiB of DataCap in 2024. Our estimate comes from the following assumptions and information: In 2023, the E-Fil+ LDN pathway onboarded 54 PiBs of data (https://datacapstats.io/). However, onboarding slowed over the past 6 months, and there was a problem with notary diversification in the multisig that prevented applications from being signed outside of Asia. This problem will go away in the new pathway. Another concern from many potential clients was the number of copies required: enterprise clients do not want to store 4+ copies, more like 2-3, and in the future this will also be possible and could attract more clients to the pathway.