NateWebb03 / FilTestRepo

A test repository for allocator application automation

Test app 1068 #1070

Open NateWebb03 opened 5 months ago

NateWebb03 commented 5 months ago

Notary Allocator Pathway Name:

Storify Data Fortress

Organization:

Storify LLC.

Allocator's On-chain address:

f13a6ov3nrxllvyqkduwczpyashgu3luivodwyvgq

Country of Operation:

United States of America

Region(s) of operation:

North America

Type of allocator: What is your overall diligence process? Automated (programmatic), Market-based, or Manual (human-in-the-loop at some phase). Initial allocations to these pathways will be capped.

Manual

Amount of DataCap Requested for allocator for 12 months:

100PiB

Is your allocator providing a unique, new, or diverse pathway to DataCap? How does this allocator differentiate itself from other applicants, new or existing?

We will allocate DataCap manually, as in the LDN process, but with tighter and more rigorous rules to reduce abuse and disputes. The main differences are:

  1. All clients must pass KYC to authenticate the applicant. Accepted evidence includes, but is not limited to, utility bills (water, electricity, cable) and business licenses.
  2. Require more sample data. In addition to proof of data ownership, compliance, and data size, every client must provide sample data equal to 5% of the total requested DataCap.
  3. Increase the number of multisig allocators, or auto-assign random allocators to each application, to reduce self-dealing and collusion.
  4. Each client may have only ONE open application at a time.
  5. Set different allocation ranges for different clients based on clear standards.
  6. Set up a punishment and reward system for allocators, SPs, and clients.

As a member of the Filecoin Community, I acknowledge that I must adhere to the Community Code of Conduct, as well as other End User License Agreements for accessing various tools and services, such as GitHub and Slack. Additionally, I will adhere to all local & regional laws & regulations that may relate to my role as a business partner, organization, notary, or other operating entity. * You can read the Filecoin Code of Conduct here: https://github.com/filecoin-project/community/blob/master/CODE_OF_CONDUCT.md

Acknowledgment: Acknowledge

Client Diligence Section:

This section pertains to client diligence processes.

Who are your target clients?

Small-scale developers or data owners, Enterprise Data Clients, Individuals learning about Filecoin, Other (specified above)

Describe in as much detail as possible how you will perform due diligence on clients.

We will apply different standards to different types of applicants:

  1. Returning client with a good history: provide the link to the previous application only. The allocator reviews that application and adds the address to the greenlist. No further due diligence is required.
  2. Returning client with a bad history: an improvement plan is required, and the allocator will evaluate its feasibility. If the plan is reasonable, a small amount is granted as a trial.
  3. New client applying as an individual: the GitHub account must be at least 6 months old. 10TiB is granted as a starting allocation.
  4. New client applying as an organization: KYC is mandatory, followed by the standard due diligence process.

The standard due diligence process covers client authentication, data ownership, and the SP plan. Each is elaborated in the relevant section below.
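
The applicant categories above can be read as a small decision procedure. The following is a minimal sketch only, not tooling we run today: the `Applicant` record, the `diligence_path` function, and the returned labels are hypothetical names, and the `not_eligible_yet` outcome for GitHub accounts younger than 6 months is an assumption, since the rule above does not say what happens in that case.

```python
from dataclasses import dataclass

MIN_GITHUB_AGE_MONTHS = 6   # from the rule for new individual clients
INDIVIDUAL_START_TIB = 10   # starting allocation for new individual clients


@dataclass
class Applicant:
    """Hypothetical record of the facts the allocator looks at first."""
    returning: bool
    good_history: bool = False
    is_organization: bool = False
    github_age_months: int = 0


def diligence_path(applicant: Applicant) -> str:
    """Map an applicant to one of the due diligence paths described above."""
    if applicant.returning:
        # Returning clients are judged on their previous applications.
        return "greenlist" if applicant.good_history else "improvement_plan"
    if applicant.is_organization:
        # New organizations: KYC, then the standard due diligence process.
        return "kyc_and_standard_diligence"
    if applicant.github_age_months >= MIN_GITHUB_AGE_MONTHS:
        return "individual_start_10tib"
    # Assumption: the text does not define this case.
    return "not_eligible_yet"
```

For example, `diligence_path(Applicant(returning=False, github_age_months=8))` selects the individual path with the 10TiB starting allocation.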

Please specify how many questions you'll ask, and provide a brief overview of the questions.

I will ask 15 questions. Beyond the questions in the standard GitHub template, I will ask further rounds of questions for additional information, to prevent fraud that could jeopardize the community. The questions are not exhaustive and cover the following aspects:

  1. Client identity information: organization name, business address, business license or KYB screenshot.
  2. Data validity: data ownership, data size, data type and content, sample data.
  3. SP qualifications: how SPs are selected.
  4. SP distribution plan: SP list, SP location, SP organization.
  5. SP management plan: how to ensure the selected SPs follow the rules as promised, and what the recovery plan is if they do not.

For the specific questions, please refer to this spreadsheet: https://docs.google.com/spreadsheets/d/101GJe6tJes29-yJohF9coO3SDxzEJ54H/edit?usp=sharing&ouid=102234962091100491283&rtpof=true&sd=true

Will you use a 3rd-party "Know your client" (KYC) service?

Organizations must complete KYC through a 3rd-party provider such as Toggle or Qichacha and submit a screenshot of, or link to, the result on GitHub as proof of identity. Individuals may upload an ID card or driver's license if they are comfortable doing so. If they consider that too sensitive, they can share social accounts such as Twitter or TikTok; the more followers, the better. Legal identity proof is preferred, however, and clients who provide it will be eligible for more DataCap than clients who do not. Note: using someone else's identity, whether as an organization or an individual, is strictly prohibited. All clients must agree to the relevant disclaimer terms before submitting an application, and anyone who violates them bears all legal responsibility themselves.

Can any client apply to your pathway, or will you be closed to only your own internal clients? (eg: bizdev or self-referral)

Anyone who meets the requirements for applying for DataCap is welcome. In the future, we may introduce incentives to bring more new clients to store data on Filecoin. If permitted, we could also hold marketing events to encourage people in relevant sectors to move their data to Filecoin from other chains or traditional web2 providers.

How do you plan to track the rate at which DataCap is being distributed to your clients?

We will use open-source tooling, mainly the SA bot, to track each client's DataCap usage rate. In addition, allocator signing details collected from the Notary Registry or the Filecoin chain could in the future be analyzed and displayed in real time in a new UI. This information should be accessible to everyone, so that all allocators operate under public supervision, which increases transparency and trust.

Data Diligence

This section will cover the types of data that you expect to notarize.

As a reminder: The Filecoin Plus program defines quality data as all content that meets local regulatory requirements AND
• the data owner wants to see on the network, including private/encrypted data
• or is open and retrievable
• or demonstrates proof of concept or utility of the network, such as efforts to improve onboarding

As an operating entity in the Filecoin Community, you are required to follow all local & regional regulations relating to any data, digital and otherwise. This may include PII and data deletion requirements, as well as the storing, transmitting, or accessing of data.

Acknowledgement: Acknowledge

What type(s) of data would be applicable for your pathway?

Public Open Dataset (Research/Non-Profit), Public Open Commercial/Enterprise

How will you verify a client's data ownership? Will you use 3rd-party KYB (know your business) service to verify enterprise clients?

First, we will collect basic information from the GitHub application. We then recommend a KYB/KYC service such as Diro or Toggle; the client may choose any one of these 3rd-party providers. Once the client's identity is verified, we will pay special attention to its business scope and confirm whether the data content and type described are generated by its business operations or come from elsewhere. Second, all clients must agree to a privacy policy stating that the client bears all legal responsibility for any misappropriation of other people's data and that allocators are not jointly and severally liable. In addition, we will run a retrieval test on the dataset after the first tranche to further confirm that the stored dataset matches the application.

How will you ensure the data meets local & regional legal requirements?

We have extensive experience in the blockchain sector and follow compliance developments closely. We always do business in strict accordance with government rules and regulations. All staff are well educated and have above-average legal awareness. In addition, we have dedicated legal and compliance staff for consulting and dispute resolution. We will continue to strictly follow the guidelines of the Fil+ governance team, and we will not act on any ambiguous activity that may breach laws or regulations before consulting our legal staff. Special attention will be paid to sensitive datasets, including but not limited to government data, intellectual property, and private information and images. For each category, our legal and compliance colleagues will work out specific terms and conditions for compliance.

What types of data preparation will you support or require?

Clients can use Singularity to prepare datasets, and we can offer guidelines and tutorials to help them get started. We require thorough data preparation, including cleaning, standardization, de-identification, and anonymization, as well as data enrichment and integration. We support data segmentation and provide automated data cleaning tools, data de-identification solutions, and data integration platforms to ensure data quality, security, and regulatory compliance.

What tools or methodology will you use to sample and verify the data aligns with your pathway?

First, we will use the Fil+ governance team's tooling. The reports from the CID bot and retrieval bot will verify whether the stored data matches the dataset claimed in the application. Manual retrieval is required after the first tranche. If the data is not as described, the client can stop the deal immediately to reduce losses. The SP should be penalized as agreed between the client and the SP, and will be blacklisted from then on.

Data Distribution

This section covers deal-making and data distribution.

As a reminder, the Filecoin Plus program currently defines distributed onboarding as multiple physical locations AND multiple storage provider entities to serve client requirements.

Recommended Minimum: 3 locations, 4 to 5 storage providers, 5 copies

How many replicas will you require to meet programmatic requirements for distribution?

5+

What geographic or regional distribution will you require?

3+

How many Storage Provider owner/operators will you require to meet programmatic requirements for distribution?

5+
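
The three thresholds answered above (5+ replicas, 3+ regions, 5+ SP owner/operators) lend themselves to a mechanical check. The following is a minimal sketch under stated assumptions: the `Replica` record and the `distribution_gaps` function are hypothetical names, each record stands for one stored copy, and in practice the inputs would come from CID checker and retrieval bot reports rather than being typed in by hand.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Replica:
    """Hypothetical record for one stored copy of the dataset."""
    sp_id: str
    operator: str   # owner/operator entity behind the SP
    region: str


def distribution_gaps(
    replicas: Iterable[Replica],
    min_replicas: int = 5,
    min_regions: int = 3,
    min_operators: int = 5,
) -> List[str]:
    """Return a list of unmet requirements; an empty list means compliant."""
    copies = list(replicas)
    regions = {r.region for r in copies}
    operators = {r.operator for r in copies}

    gaps = []
    if len(copies) < min_replicas:
        gaps.append(f"replicas: {len(copies)} < {min_replicas}")
    if len(regions) < min_regions:
        gaps.append(f"regions: {len(regions)} < {min_regions}")
    if len(operators) < min_operators:
        gaps.append(f"operators: {len(operators)} < {min_operators}")
    return gaps
```

A plan with four copies, all held by two operators in two regions, would return three gaps, one per threshold.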

Do you require equal percentage distribution for your clients to their chosen SPs? Will you require preliminary SP distribution plans from the client before allocating any DataCap?

No. We will distribute DataCap to SPs based on their history, including reputation, amount of data sealed, location, and retrieval rate. The overall rule is: the higher the credit score, the more DataCap an SP will be granted. However, no single SP may take more than 20% of the whole deal. Before DataCap is allocated to SPs, we will use a template to collect the necessary information about each SP and, based on the results, rate its true sealing capability. The SP information includes: SP ID, location, organization, previous history, retrieval success rate, etc. Based on this information, the client will rate each SP's capability, respect the SP's wishes, and allocate a reasonable percentage of the deal to that SP:

• Grade A: 20%
• Grade B: 10%
• Grade C: 5%

The SP information template is attached here: https://docs.google.com/spreadsheets/d/1ht-iWcxzThR9W3iRYrynX7roTO99aYuR/edit?usp=sharing&ouid=102234962091100491283&rtpof=true&sd=true
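
To illustrate how the grades could be applied, here is a minimal sketch that treats each grade's percentage as the maximum share of a deal that an SP of that grade may hold. That reading is an assumption, as is every name in the code (`GRADE_MAX_PERCENT`, `over_cap`, and the shape of the plan dictionary); it is not an existing tool.

```python
from typing import Dict, List, Tuple

# Assumed reading: the percentage is a per-SP ceiling on the share of one deal.
GRADE_MAX_PERCENT = {"A": 20, "B": 10, "C": 5}


def over_cap(plan: Dict[str, Tuple[str, float]], deal_size_tib: float) -> List[str]:
    """Return the SP IDs whose planned share exceeds the ceiling for their grade.

    `plan` maps a hypothetical SP ID to (grade, planned TiB for that SP).
    """
    if deal_size_tib <= 0:
        raise ValueError("deal_size_tib must be positive")

    violations = []
    for sp_id, (grade, planned_tib) in plan.items():
        limit_percent = GRADE_MAX_PERCENT[grade]
        # Compare planned/deal against limit/100 without dividing.
        if planned_tib * 100 > deal_size_tib * limit_percent:
            violations.append(sp_id)
    return violations
```

With a 100TiB deal, a grade B SP planned for 15TiB would be flagged, while a grade A SP at exactly 20TiB would not.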

What tooling will you use to verify client deal-making distribution?

We will use https://datacapstats.io, the retrieval bot, and the CID checker to track allocations. The CID checker and retrieval bot compute the allocation statistics, and the A/C bot sets the bar based on the collected information. If the client meets all standards, the A/C bot will automatically allocate the subsequent DataCap to the client. If not, the A/C bot will send a warning explaining why the subsequent DataCap was not granted.

How will clients meet SP distribution requirements?

Clients are required to work with qualified SPs and have the right to choose their own. If a client's SP distribution plan does not meet expectations, or the client lacks the relevant resources, we will share a list of reputable SPs or help the client identify suitable ones. A high-quality SP must have a good history.

As an allocator, do you support clients that engage in deal-making with SPs utilizing a VPN?

In theory, VPN use should be banned, because a few SPs use VPNs to fake their location, which jeopardizes the core principles of diversity and decentralization. In practice, however, blockchain activities are on the sanction lists of several countries. In China, for example, people cannot access foreign websites, let alone take part in blockchain activities. We may use tooling such as Tracert to determine an SP's true location, to prevent cheating and ensure VPNs are used legitimately.

DataCap Allocation Strategy

In this section, you will explain your client DataCap allocation strategy.

Keep in mind the program principle of Limited Trust Over Time. Parties, such as clients, start with a limited amount of trust and power. Additional trust and power need to be earned over time through good-faith execution of their responsibilities and transparency of their actions.

Will you use standardized DataCap allocations to clients?

Yes, standardized

Allocation Tranche Schedule to clients:

For new clients or old clients with a history:
• First: 25TiB
• Second: 50TiB
• Third: 100TiB
• Fourth: 200TiB
• Max per client overall: 500TiB

For old clients with a good reputation and perfect history:
• First: 5% of total requested dataCap
• Second: 10% of total requested dataCap
• Third: 35% of total requested dataCap
• Fourth: 50% of total requested dataCap
• Max per client overall: 5PiB
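
As a worked illustration of the two schedules above, the following minimal sketch computes the four tranche sizes for a given request. The function name `tranche_schedule` and the `trusted` flag are hypothetical, and clipping each tranche so that the running total never exceeds the request or the per-client maximum is an assumption about how the caps interact with the schedule. Allocations themselves remain manually determined.

```python
from typing import List

TIB_PER_PIB = 1024

# Fixed schedule (TiB) for new clients or old clients with a history.
FIXED_TRANCHES_TIB = [25, 50, 100, 200]
FIXED_MAX_TIB = 500

# Percentage schedule for old clients with a good reputation and perfect history.
PERCENT_TRANCHES = [5, 10, 35, 50]
PERCENT_MAX_TIB = 5 * TIB_PER_PIB  # 5PiB


def tranche_schedule(total_requested_tib: float, trusted: bool) -> List[float]:
    """Return the four tranche sizes in TiB for one client."""
    if trusted:
        cap = min(total_requested_tib, PERCENT_MAX_TIB)
        planned = [total_requested_tib * p / 100 for p in PERCENT_TRANCHES]
    else:
        cap = min(total_requested_tib, FIXED_MAX_TIB)
        planned = list(FIXED_TRANCHES_TIB)

    schedule = []
    granted = 0.0
    for size in planned:
        # Assumption: never grant more than what is left under the cap.
        size = max(0.0, min(size, cap - granted))
        schedule.append(size)
        granted += size
    return schedule
```

For a trusted client requesting 2000TiB, this yields tranches of 100, 200, 700, and 1000TiB. A new client requesting 100TiB would receive 25, 50, and 25TiB, with nothing left for a fourth tranche.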

Will you use programmatic or software based allocations?

No, manually calculated & determined

What tooling will you use to construct messages and send allocations to clients?

The Notary Registry will be used to send messages and allocations to clients.

Describe the process for granting additional DataCap to previously verified clients.

The SA bot will be used together with the A/C bot. When less than 10% of the DataCap from the previous tranche remains, the SA bot will trigger the request for the next tranche. If the previous tranche meets the requirements, the A/C bot will allocate the DataCap directly, without manual signing.
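
The trigger rule above can be sketched as follows. This only illustrates the described logic and is not the SA bot's or A/C bot's actual code: the function name, its parameters, and the three returned labels are hypothetical, and holding a non-compliant client for manual review is an assumption about what happens when the previous tranche does not meet the requirements.

```python
REMAINING_THRESHOLD = 0.10  # next tranche is requested below 10% remaining


def next_tranche_action(
    allocated_tib: float,
    remaining_tib: float,
    previous_tranche_compliant: bool,
) -> str:
    """Decide what happens to a client's next tranche.

    Returns "wait", "auto_allocate", or "hold_for_review".
    """
    if allocated_tib <= 0:
        raise ValueError("allocated_tib must be positive")

    remaining_ratio = remaining_tib / allocated_tib
    if remaining_ratio >= REMAINING_THRESHOLD:
        return "wait"  # client still has enough DataCap from the last tranche
    if previous_tranche_compliant:
        return "auto_allocate"  # no manual signing needed
    # Assumption: a non-compliant tranche falls back to manual review.
    return "hold_for_review"
```

For example, a client with 2TiB left from a 25TiB tranche (8% remaining) and a compliant history would get `"auto_allocate"`.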

Tooling & Bookkeeping

This program relies on many software tools in order to function. The Filecoin Foundation and PL have invested in many different elements of this end-to-end process, and will continue to make those tools open-sourced. Our goal is to increase adoption, and we will balance customization with efficiency.

This section will cover the various UX/UI tools for your pathway. You should think high-level (GitHub repo architecture) as well as tactical (specific bots and API endpoints).

Describe in as much detail as possible the tools used for:
• client discoverability & applications
• due diligence & investigation
• bookkeeping
• on-chain message construction
• client deal-making behavior
• tracking overall allocator health
• dispute discussion & resolution
• community updates & comms

client discoverability & applications: Clients apply for DataCap through GitHub, as before.

due diligence & investigation: The client submits basic information in the GitHub application template. We verify the identity of individuals and organizations through a 3rd-party KYC provider. In addition, we will prepare a template of questions to further verify the client's compliance and distribution plan.

bookkeeping: We suggest that https://datacapstats.io/notaries be improved and opened to the public. At present, its information is neither comprehensive nor updated in real time. More fields should be added, such as the reason for signing an application, the application number, and the application link. Signing history should be shared among community members, and anyone who disagrees with a signing action should be able to submit a dispute proposal. Until the website is improved, we recommend an online Google form as an interim measure.

on-chain message construction: A Ledger is used to authenticate and construct messages.

client deal-making behavior: The SA bot, CID checker, retrieval bot, and A/C bot will be used together. The SA bot should warn the client to use DataCap on a reasonable schedule, to reduce DataCap abuse and waste. The CID checker and retrieval bot should report overall storage performance. Based on the bots' results, the A/C bot should act on the metrics, and messages should be sent warning the client to make adjustments in time.

tracking overall allocator health: A punishment and reward strategy should be created to supervise allocators' actions. For example, an allocator who follows the rules and has no disputes submitted against them would be awarded more DataCap, such as 5 PiB or more. Otherwise, the allocator would forfeit granted DataCap according to the level of non-compliance.

dispute discussion & resolution: A decentralized voting tool should be used to resolve any dispute or disagreement. Anyone who has a disagreement or something abnormal to report should submit a proposal that includes the issue description, proof, poll start and end times, actions to be taken, etc. All community members have the right to vote and settle the dispute. In this way, everyone can participate, and transparency and fairness increase.

community updates & comms: Slack is currently a primary channel because most people use it, but it is not responsive enough. Based on our experience in the blockchain space, we will create a Discord server for Filecoin for better communication. Discord is better organized and makes useful information easier to find.

Will you use open-source tooling from the Fil+ team?

Apart from the tools mentioned above, a Ledger will be used to sign applications. We have chosen the manual pathway for allocating DataCap, but we will look for ways to automate the allocation process as much as possible. These include a 3rd-party KYC provider, randomly assigned signing allocators, an improved allocator tracking UI on https://datacapstats.io/, automatic allocation of subsequent DataCap by the A/C bot, and a decentralized voting tool to handle disputes. These are the measures we can think of right now, and we will improve the process as we implement it.

Where will you keep your records for bookkeeping? How will you maintain transparency in your allocation decisions?

I will use an online Google form for our own bookkeeping. The form will include the following items: application link, application number, address, tranche number (to avoid signing too often), granted DataCap amount, the reason for signing the application, and a note (for unexpected circumstances). The form will be open to everyone. Anyone who questions why an application was signed can review the decision-making process. If more information is needed, we will provide our Slack or Discord handle as a contact.
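
The bookkeeping fields listed above could be captured in a simple structure. The following is a minimal sketch that writes such records as CSV, for example for export next to the Google form. The `AllocationRecord` class, its field names, the TiB unit, and the `to_csv` helper are all hypothetical, and the values in the usage note are placeholders.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable


@dataclass
class AllocationRecord:
    """One row of the allocation tracker, using the items listed above."""
    application_link: str
    application_number: int
    address: str
    tranche_number: int
    granted_datacap_tib: float
    reason: str
    note: str = ""


def to_csv(records: Iterable[AllocationRecord]) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(AllocationRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

For instance, `to_csv([AllocationRecord("https://example.invalid/issues/1", 1, "f1placeholder", 1, 25, "KYC passed")])` produces a header line followed by one data row.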

Risk Mitigation, Auditing, Compliance

This framework ensures the responsible allocation of DataCap by conducting regular audits, enforcing strict compliance checks, and requiring allocators to maintain transparency and engage with the community. This approach safeguards the ecosystem, deters misuse, and upholds the commitment to a fair and accountable storage marketplace.

In addition to setting their own rules, each notary allocator will be responsible for managing compliance within their own pathway. You will need to audit your own clients, manage interventions (such as removing DataCap from clients and keeping records), and respond to disputes.

Describe your proposed compliance check mechanisms for your own clients.

First, we will record our allocation decisions in the Google form tracker, which helps us track allocations clearly. Before signing the next allocation, we will check the form to avoid repeated signing and to catch any bad history, so that all decisions are based on sufficient proof. Regular check-ins: every two weeks, we will check all applications we signed that month, running the commands to view the CID report and retrieval report and confirm whether there is any non-compliant behavior. If the number of SPs, SP locations, replication rate, or retrieval rate are not as expected, we will leave a comment and suspend signing of the next tranche.

Describe your process for handling disputes. Highlight response times, transparency, and accountability mechanisms.

Currently, the dispute resolution tracker is not rigorous, and the process is not accepted by all community members. Disputes are not settled well because everyone holds a different opinion. To ensure fairness and transparency, a decentralized voting tool should be used for dispute resolution, and anyone interested has the right to vote. Based on the guidelines and principles of the allocation strategy, anyone who has a dispute with another party can submit a poll that includes the application number, dispute description, proof, and proposed solutions. Whatever the kind of dispute, we expect this method to settle it quickly and in a way community members can accept. Voting in this way reduces collusion and increases transparency.

Detail how you will announce updates to tooling, pathway guidelines, parameters, and process alterations.

It depends on the severity of the changes. Minor changes, such as tooling updates, parameter or schedule modifications, and software updates, will be announced at a regular allocator conference and shared in the Slack channel. For critical changes, including but not limited to updating the allocation process, adding a staking mechanism, or automating the due diligence process, we will draft a proposal on GitHub, collect feedback from the community over a set period, and then evaluate feasibility. In the first half of 2024, we will make announcements on Slack. We will also grow a Discord community and gradually shift from Slack to Discord.

How long will you allow the community to provide feedback before implementing changes?

We plan to allow one month for feedback before implementing changes. This gives the community enough time to discuss and respond. The basic rules should be pinned in the relevant channels. Members of the governance team or volunteer ambassadors will be assigned to specific channels as moderators. Bots will be introduced to manage chats in real time and reduce extreme speech. Frequently asked questions will be compiled into a list, shared in the moderators' channel, and updated regularly, for example weekly or every two weeks. The Filecoin community values different voices from different backgrounds.

Regarding security, how will you structure and secure the on-chain notary address? If you will utilize a multisig, how will it be structured? Who will have administrative & signatory rights?

The address will be generated by a Ledger, which is also used for signing. We are considering increasing the number of allocators required to sign one tranche to 3~4. If possible, auto-assigning allocators would also help reduce fraud and collusion.

Will you deploy smart contracts for program or policy procedures? If so, how will you track and fund them?

Not for now. If we later work out a feasible, fully automated allocation plan with no human intervention, smart contracts would be an ideal way to implement it.

Monetization

While the Filecoin Foundation and PL will continue to make investments into developing the program and open-sourcing tools, we are also striving to expand and encourage high levels of service and professionalism through these new Notary Allocator pathways. These pathways require increasingly complex tooling and auditing platforms, and we understand that Notaries (and the teams and organizations responsible) are making investments into building effective systems.

It is reasonable for teams building services in this marketplace to include monetization structures. Our primary guiding principles in this regard are transparency and equity. We require these monetization pathways to be clear, consistent, and auditable.

Outline your monetization models for the services you provide as a notary allocator pathway.

Apart from automating the allocation process as much as possible, a punishment and reward system is necessary. We do not yet have a detailed plan or formula for calculating the amount of collateral, and consensus should be reached before anything is implemented. Ideally, a staking and slashing collateral strategy would be prepared for allocators, clients, and SPs. Raising the cost of cheating is an effective way to reduce fraud.

Describe your organization's structure, such as the legal entity and other business & market ventures.

Storify was founded in California, USA. Its entity number is 202251417769. It is mainly engaged in the research and development of distributed storage software and is committed to improving storage efficiency and retrieval, as well as data compression, indexing, and query optimization technologies. Improving the security of distributed storage software is our other focus. Shizhi Gu has been engaged in distributed software research and development and hardware architecture for many years and has accumulated rich experience in the industry.

Where will accounting for fees be maintained?

If a staking and slashing mechanism is introduced, the accounting for fees can be maintained on chain, where it is clear and transparent.

If you've received DataCap allocation privileges before, please link to prior notary applications.

N/A

How are you connected to the Filecoin ecosystem? Describe your (or your organization's) Filecoin relationships, investments, or ownership.

We have participated in the Filecoin ecosystem since 2020. We ran a few small nodes at first, then took part in the Slingshot program, and have stored 20PiB of data on Filecoin. The SP IDs are f0870558 f01106668 f01315096 f01518369 f01889668 f02131801 f02131855 f02131881. We are very optimistic about Filecoin and will keep investing, growing, and expanding with the Filecoin ecosystem.

How are you estimating your client demand and pathway usage? Do you have existing clients and an onboarding funnel?

As an experienced SP, we have many friends and acquaintances with strong demand for data storage, whose data is generated on a weekly basis. Apart from this, we will work to attract more web2 giants to join Filecoin for storage.