filecoin-project / notary-governance

v5 Notary Allocator Application:METAVERSEDATAMINING #1031

Open METAVERSEDATAMINING opened 6 months ago

METAVERSEDATAMINING commented 6 months ago

v5 Notary Allocator Application

To apply to be an allocator, organizations will submit one application for each proposed pathway to DataCap. If you will be designing multiple specific pathways, you will need to submit multiple applications.

Please complete the following steps:

1. Fill out the information below and create a new GitHub Issue

  1. Notary Allocator Pathway Name (This can be your name, or the name of your pathway/program. For example "E-Fil+"): METAVERSEDATAMINING
  2. Organization Name: METAVERSEDATAMINING
  3. On-chain address for Allocator (Provide a NEW unique address. During ratification, you will need to initialize this address on-chain): f1uhdgxklhmone613xjikxrs6zciitf6o5aipapfa
  4. Country of Operation (Where your organization is legally based): Singapore
  5. Region of Operation (What region will you serve?): All regions
  6. Type of Allocator, diligence process (Automated/programmatic, Market-based, or Manual (human-in-the-loop at some phase)): Manual
  7. DataCap requested for allocator for 12 months of activity (This should be an estimate of overall expected activity. Estimate the total amount of DataCap you will be distributing to clients in 12 months, in TiB or PiB): 150P

2. Access allocator application (download to save answers)

Click the link below to access a Google Doc version of the allocator application. You can use it to save your answers if you are not yet ready to submit the full application in Step 3. https://docs.google.com/document/d/1-Ze8bo7ZlIJe8qX0YSFNPTka4CMprqoNB1D6V7WJJjo/copy

3. Submit allocation application

Click the link below to access the full allocator questionnaire and officially submit your answers: https://airtable.com/appvyE0VHcgpAkt4Z/shrQxaAIsD693e1ns

Note: Sections of your responses WILL BE posted back into the GitHub issue tracking your application. The final section (Additional Disclosures) will NOT be posted to GitHub, and will be maintained by the Filecoin Foundation. Application information for notaries not accepted and ratified in this round will be deleted.

Kevin-FF-USA commented 6 months ago

Hi @METAVERSEDATAMINING

Wanted to let you know this application has been received along with the Airtable detailed answers - the public answers will be posted in a thread below soon. If you have any questions - please let me know.

ghost commented 5 months ago

Basic Information

1. Notary Allocator Pathway Name: METAVERSEDATAMINING

2. Organization: METAVERSEDATAMINING

3. On Chain Address for Allocator: f1uhdgxklhmone613xjikxrs6zciitf6o5aipapfa

4. Country of Operation: Singapore

5. Region(s) of operation: South America, North America, Japan, Oceania, Europe, Greater China, Asia minus GCR, Africa

6. Type of Allocator: Manual

7. DataCap requested for allocator for 12 months of activity: 150P

8. Is your allocator providing a unique, new, or diverse pathway to DataCap? How does this allocator differentiate itself from other applicants, new or existing?: Our review process combines manual analysis and the use of tools to ensure comprehensive, accurate, and flexible reviews.
  1. Manual analysis: We will carefully review the materials and data submitted by the clients and conduct a comprehensive analysis and verification. This includes checking the completeness of documents, verifying the accuracy of key information, and ensuring that the provided information and documents meet the requirements. This step requires online tools (such as business qualification query websites), past experience, and professional knowledge.
  2. Open-source tools: To increase the accuracy and efficiency of the review process, we will use open-source tools to assist in the review. These tools can help us verify the authenticity of documents, detect potential fraudulent activities, identify possible risk factors, and more. However, we will treat the results of these tools as references and still perform manual confirmation and judgment. KYC Tools: filplus.storage, toggle, Fil+ registration form, official email confirmation, etc. Anti-fraud Tools: ip2location, seon.io. Data Retrieval: AC Bot, CID checker Bot.
  3. Communication and feedback: If any uncertainties or questions arise during the review process, we will proactively communicate with the client to request additional relevant information or seek further clarification. This helps clarify any doubts, reduce misunderstandings, and ensure the accuracy of the final review outcome and client satisfaction.

9. As a member in the Filecoin Community, I acknowledge that I must adhere to the Community Code of Conduct, as well other End User License Agreements for accessing various tools and services, such as GitHub and Slack.: Acknowledge

Client Diligence

10. Who are your target clients?: Enterprise Data Clients, Small-scale developers or data owners, Individuals learning about Filecoin

11. Describe in as much detail as possible how you will perform due diligence on clients. If you are proposing an automated pathway, what diligence mechanism will you use to determine client eligibility?: We will assess each application along three dimensions: the client, the data, and the retrieval process.
  1. Client: Client KYC and verification of client background, which includes but is not limited to personal information or organizational qualifications, business, reputation, etc.
  2. Data: We will verify the nature of the data (public or private), data ownership, size, compliance of samples, storage planning, transmission methods, and compliance of the collaborating service providers (SPs).
  3. Retrieval: We will use open-source tools to check the compliance of the client's distribution and storage. Additionally, we will download and sample the data to verify its consistency with the client's declarations.

12. Please specify how many questions you’ll ask, and provide a brief overview of the questions.: Questions list: https://1drv.ms/x/s!Aq7akAb4KAtOgSun7If3ODsgCSpw?e=CqyYvs
  Client: In this section, we will gather information about the client's background, request relevant proofs, and conduct KYC verification using tools like Toggle and the Fil+ registration form.
  Data: In this section, we will gather detailed information about the data content and dataset size, and request proof from the client to verify that the claimed data size matches the actual dataset. We will also discuss copyright and legal compliance of the data with the client and request necessary proofs if required. Finally, we will ask the client to provide a detailed storage plan, including dataset preparation, distribution, contact and planning with service providers (SPs), and DC distribution. This ensures that the client is well-prepared and has a planned DC distribution, and allows the reviewers to have a comprehensive understanding. Tools such as datacapstats.io, CID check bot, AC Bot, ip2location, and seon.io will be used to assist in the process.
  Retrieval: In this section, we need the client to have a detailed understanding of the storage and retrieval guidelines to ensure compliance with storage regulations. Tools such as CID check bot, AC Bot, ip2location, and seon.io will be used to assist.

13. Will you use a 3rd-party Know your client (KYC) service?: Yes, We use reputable 3rd-party system that ensures client eligibility, such as Toggle

14. Can any client apply to your pathway, or will you be closed to only your own internal clients? (eg: bizdev or self-referral):  Any client can apply

15. How do you plan to track the rate at which DataCap is being distributed to your clients?: Regularly verify relevant clients based on the tracking form(https://1drv.ms/x/s!Aq7akAb4KAtOgSkFJwhbehMnL6J1?e=1trVQr), use CID checker for verification, communicate with clients regarding different issues, and download data to verify if it matches the client's declaration.

Data Diligence

16. As an operating entity in the Filecoin Community, you are required to follow all local & regional regulations relating to any data, digital and otherwise. This may include PII and data deletion requirements, as well as the storing, transmit: Acknowledge

17. What type(s) of data would be applicable for your pathway?: Public Open Dataset (Research/Non-Profit), Public Open Commercial/Enterprise, Private Commercial/Enterprise, Private Non-Profit/Social Impact

18. How will you verify a client’s data ownership? Will you use 3rd-party KYB (know your business) service to verify enterprise clients?: Our approach depends on the dataset type.
  Datasets open for retrieval by anyone (public open research/non-profit, public open commercial/enterprise, and private non-profit/social impact):
  1. Data Source Citation: Request clients to provide source citations or reference links for public data. This ensures that the data corresponds to publicly accessible data sources, and clients can provide the origin of the data.
  2. Data Processing Records: Ask clients to provide detailed explanations of the data processing records or the data acquisition process. Clients can provide information about data acquisition, processing, cleaning, or organization to demonstrate that they have legally processed and transformed the data.
  3. Confirmation from Data Preparers: If clients claim to be the data preparers or have an affiliation with the preparers, they can be asked to provide relevant evidence such as contracts, authorization letters, or collaboration agreements with the preparers.
  4. Client Business Background Investigation: Conduct background checks to understand the client's business background and related activities. This can include the client's industry expertise, patents, or research achievements. The client's business background associated with the claimed data can provide additional evidence.
  Private Commercial/Enterprise Datasets (verification methods may need to be more cautious and privacy-focused, especially when dealing with PII or sensitive data):
  1. Data Ownership Contracts: Request clients to provide data ownership contracts or authorization documents to ensure that they have legal ownership or usage rights.
  2. Secure Access Controls: Understand and verify the client's security access control measures for the data. This may involve the use of keys, access permission management, encryption techniques, etc., to protect data security and privacy.
  3. Privacy Compliance Review: Ensure that clients comply with applicable privacy regulations, such as relevant laws in specific countries/regions. Request clients to provide compliance documents and evidence of privacy policies.

19. How will you ensure the data meets local & regional legal requirements?: 1.Understand the relevant laws and regulations of the countries, regions, and industries involved in storing the data. 2.Ensure understanding of the categorization of the data to be stored (e.g., sensitive data, personally identifiable information, confidential data, etc.). 3.Regularly monitor changes in laws and regulations to ensure compliance with local laws and promptly make necessary adjustments.

20. What types of data preparation will you support or require?: We support the following public open datasets, which are openly accessible to the public, allowing anyone to freely access, use, and share these datasets. We encourage more of such datasets to be stored on the Filecoin network, promoting data transparency, reproducibility, and unleashing their potential for innovation. --Public open datasets (research/non-profit) --Public open datasets (commercial/enterprise) --Private non-profit/social impact --Private commercial/enterprise

21. What tools or methodology will you use to sample and verify the data aligns with your pathway?: In the early stages, we used the official tools for retrieval, download, and data inspection, all of which were done manually. As the workload increased, we created simple scripts to automate the invocation of the official tools. The operator only needs to provide key information about the nodes, and the script will automate the batch retrieval, download, unpacking, and feedback of execution results using the official tools. Finally, manual inspection is performed on the unpacked files, taking into account the information provided during the LDN project application. The manual process can only be conducted on a sampling basis, thus serving as an auxiliary role.

Data Distribution

22. How many replicas will you require to meet programmatic requirements for distribution?: 5+

23. What geographic or regional distribution will you require?: 3+ regions, including unique geopolitical/language coverage, with each replica located in different physical locations, without or minimal use of VPN, and operated by different service provider owners.

24. How many Storage Provider owner/operators will you require to meet programmatic requirements for distribution?: 5+

25. Do you require equal percentage distribution for your clients to their chosen SPs? Will you require preliminary SP distribution plans from the client before allocating any DataCap?: Clients are requested to provide a detailed allocation plan following a standardized template. Before the initial allocation, it is necessary to disclose quantitative details about service providers (miners and percentage distribution) and qualitative details (Service Provider Know Your Business - SP KYB information). The subsequent review will assess the information disclosed during the initial phase.

26. What tooling will you use to verify client deal-making distribution?: Open-source tools provided to the community, such as datacapstats.io, AC Bot, and CID checker, enable automatic verification and reporting of client deal distribution before subsequent allocations.
  1. Use datacapstats.io and CID checker bot to gather information about DataCap allocation, replica storage, and CID duplicates.
  2. Verify the IP addresses of service providers using filfox to obtain preliminary information about the geographical distribution.
  3. Verify the compliance of retrieval ratios and validate the consistency of downloaded data.

27. How will clients meet SP distribution requirements?: Some support provided to clients to assist in SP discovery & deal making, such as SP contact forums or reputation info.

28. As an allocator, do you support clients that engage in deal-making with SPs utilizing a VPN?: Willing to support clients who work with SPs that utilize VPNs, but requires additional KYB checks and still enforces distribution, even if utilizing VPN.
  1. Clients should provide honest feedback on whether they use a VPN.
  2. Clients should choose SPs that offer stability and KYB checks, including but not limited to the following: reasons for using a VPN, and proof of purchase or lease of a genuine data center.
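
The minimums stated in answers 22 to 24 (5+ replicas, 3+ regions, 5+ SP owner/operators) can be expressed as a simple check on a client's proposed distribution plan. The following is a minimal sketch, not the applicant's actual tooling: the `Replica` record, its field names, and the region labels are hypothetical, and it assumes each replica sits on a distinct provider ID.

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds taken from answers 22-24 of the application ("5+", "3+ regions", "5+").
MIN_REPLICAS = 5
MIN_REGIONS = 3
MIN_SP_OWNERS = 5


@dataclass(frozen=True)
class Replica:
    """One planned replica. All field names are hypothetical."""
    provider_id: str  # SP miner ID declared by the client
    owner: str        # entity that owns/operates the SP
    region: str       # region label assigned during review


def check_plan(replicas: list[Replica]) -> list[str]:
    """Return a list of problems; an empty list means the plan meets the minimums."""
    problems = []
    # Count distinct providers so the same SP listed twice is not counted as two replicas.
    providers = {r.provider_id for r in replicas}
    if len(providers) < MIN_REPLICAS:
        problems.append(f"replicas: {len(providers)} < {MIN_REPLICAS}")
    regions = {r.region for r in replicas}
    if len(regions) < MIN_REGIONS:
        problems.append(f"regions: {len(regions)} < {MIN_REGIONS}")
    owners = {r.owner for r in replicas}
    if len(owners) < MIN_SP_OWNERS:
        problems.append(f"SP owners: {len(owners)} < {MIN_SP_OWNERS}")
    return problems
```

A check like this only validates what the client declares. The manual review described above (IP lookups, KYB documents, retrieval sampling) is still needed to confirm that the declared owners and regions are accurate.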

DataCap Allocation Strategy

29. Will you use standardized DataCap allocations to clients?: Yes, standardized

30. Allocation Tranche Schedule to clients:: Specific reference Allocation Strategy(https://1drv.ms/x/s!Aq7akAb4KAtOgSPmxNWnTP_hHhxU?e=wdwGLu)

31. Will you use programmatic or software based allocations?: Yes, standardized and software based

32. What tooling will you use to construct messages and send allocations to clients?: Open-sourced tools (UX/UI), such as Notary Registry

33. Describe the process for granting additional DataCap to previously verified clients.: Following our Allocation Strategy (https://1drv.ms/x/s!Aq7akAb4KAtOgSPmxNWnTP_hHhxU?e=wdwGLu) and the DataCap tranche size calculations (https://github.com/filecoin-project/filecoin-plus-large-datasets?tab=readme-ov-file#datacap-tranche-size-calculations), we will use tools such as the subsequent allocation (SA) bot, the CID checker bot, and AC Bot to examine each client's distribution and data retrieval before deciding on the next tranche.

34. Describe in as much detail as possible the tools used for: • client discoverability & applications • due diligence & investigation • bookkeeping • on-chain message construction • client deal-making behavior • tracking overall allocator health • disput:
  1. Client Discovery and Application: Use GitHub as a code version control and collaboration platform to manage code repositories and documents related to community operations, client discovery, and applications.
  2. Due Diligence and Investigation: Use various databases and open-source data analysis tools to organize, analyze, and evaluate the information submitted by clients for due diligence and investigation.
  3. Bookkeeping: Use a document management system to store and manage files and records related to bookkeeping, ensuring accuracy and traceability.
  4. On-chain Message Construction: Use the official tools provided by Filecoin, including Lotus and related APIs, to construct and send messages related to on-chain interactions.
  5. Client Transaction Behavior: Use data analysis tools to analyze and evaluate client distribution behavior to understand the compliance of their actions.
  6. Tracking Allocator's Overall Health: Use dashboards and analytics tools to track and monitor the overall health of the allocator, including application progress, transaction volume, and other relevant indicators.
  7. Dispute Discussion and Resolution: Use communication and collaboration tools such as Slack, Notion, Microsoft Teams, or similar platforms to facilitate cooperation and communication for dispute discussion and resolution.
  8. Community Updates and Communication: Publish community updates and notifications on a blogging platform for allocators and relevant stakeholders. Communicate with the community through official channels such as GitHub, Slack, and similar platforms.

Tools and Bookkeeping

35. Will you use open-source tooling from the Fil+ team?: Yes. We are currently using these tools: datacapstats.io, https://filplus.fil.org/#/, SA Bot, CID checker, Retrievability Bot, AC Bot, GitHub repo, Google spreadsheet

36. Where will you keep your records for bookkeeping? How will you maintain transparency in your allocation decisions?: We use Google Spreadsheet to track detailed information about allocations (https://1drv.ms/x/s!Aq7akAb4KAtOgSkFJwhbehMnL6J1?e=GARST5) and sync it with our GitHub application.

Risk Mitigation, Auditing, Compliance

37. Describe your proposed compliance check mechanisms for your own clients.: We have been using datacapstats.io, CID Checker/Retrievability Bot for auditing purposes. Initially, I will maintain a tolerant attitude towards all clients, but the audit requirements will increase with each allocation batch, requiring clients to strictly adhere to storage metrics.

38. Describe your process for handling disputes. Highlight response times, transparency, and accountability mechanisms.: Maintain a response time within 2 business days for community communication, proposal discussions, message replies, and dispute resolution.
  Internal dispute resolution process:
  1. Listen and understand: First, listen to the other party's perspectives and concerns, ensuring a thorough understanding of their position and the nature of the dispute. Give them enough time and space to express their opinions and grievances.
  2. Collect evidence and information: Gather and organize evidence, data, and information related to the dispute. Ensure accurate and reliable evidence for assessment and analysis during the resolution process.
  3. Communicate with the other party: Engage in active and constructive communication with the other party to seek consensus and solutions. Provide clear and objective explanations and reasons, clarifying your own position and actions.
  4. Seek third-party intervention: If both parties cannot reach a resolution, consider seeking assistance from a dispute resolution team or governance committee. This may involve organizing conference calls, community discussions, or submitting dispute trackers to facilitate communication and understanding.
  5. Document and follow up: Maintain detailed records and documentation throughout the dispute resolution process, including communication records, evidence, and decision outcomes.
  External dispute resolution process:
  1. Submit clear problem statements, relevant evidence, and supporting materials.
  2. Update feedback and submit additional supporting evidence on the dispute tracker.
  3. Engage in active and constructive communication with the other party to seek consensus and solutions.
  4. When necessary, request discussions in conference calls and make decisions through majority voting, consensus, or rulings by the dispute resolution team or committee.
  5. Maintain a positive attitude towards presenting evidence, impartiality, objectivity, transparency, and enforceable resolutions.

39. Detail how you will announce updates to tooling, pathway guidelines, parameters, and process alterations.: We will use a manual mode and leverage official open-source tools to reduce subjective judgment while retaining manual operations to increase flexibility.

40. How long will you allow the community to provide feedback before implementing changes?:
  1. Community feedback is crucial for achieving an open, transparent, and decentralized community structure. Before implementing any changes, I want to ensure that the community is given sufficient time to provide feedback, ensuring broad participation and diverse perspectives. The specific timeframe may vary depending on the complexity and scale of the project, but the key is to ensure that the community is given enough time to understand and evaluate the changes and provide meaningful feedback.
  2. We dedicate 6-8 hours per week to our work in the community, and we have always been proactive and timely in providing feedback for various community updates.

41. Regarding security, how will you structure and secure the on-chain notary address? If you will utilize a multisig, how will it be structured? Who will have administrative & signatory rights?: We have always had dedicated personnel to manage the ledger. If there are any personnel changes, we will update them in the application. 

42. Will you deploy smart contracts for program or policy procedures? If so, how will you track and fund them?: None

Monetization

43. Outline your monetization models for the services you provide as a notary allocator pathway.: None

44. Describe your organization's structure, such as the legal entity and other business & market ventures.: We are a Chinese team formed in 2003. We have worked in software development since the C++ era, mainly on large-scale application development covering areas such as disk drivers, network drivers, and large-scale P2P transmission. In 2020, we were invited by a well-known SP to participate in the Filecoin Slingshot competition. Since then, we have provided a range of underlying technical services in the Filecoin ecosystem, such as mining, data analysis, cluster optimization and reconstruction, and operations monitoring and alerting. So far, we have supported over 100 nodes with 1.4 EiB of power. Following policy changes in Greater China since 2021, we established our new company, MetaVerse Data Mining, in Singapore and moved our business to a more policy-friendly region. In addition to the services and products mentioned above, we plan to invest more in Web3.0, join more blockchain projects, and expand our research and services in blockchain security. After the launch of FVM, we will focus more on developing smart contracts, NFTs, and other applications on FVM. Official Website: https://www.mdmlab.io/ Filecoin Data Center: https://open.mdmlab.io/ Pledge Joint-mining platform: https://pledge.mdmlab.io

45. Where will accounting for fees be maintained?: None

Past Experience, Affiliations, Reputation

46. If you've received DataCap allocation privileges before, please link to prior notary applications.: https://github.com/filecoin-project/notary-governance/issues/709

47. How are you connected to the Filecoin ecosystem? Describe your (or your organization's) Filecoin relationships, investments, or ownership.: MetaVerse Data Mining was invited to join the Filecoin project in 2020. As a member of the Filecoin community, our team has been dedicated to community building and network development. These are our nodes: f01877862, f01846877, f01840776, f01846857. To contribute to the development of the Filecoin network, we have developed the following products: Filecoin Data Center: https://open.mdmlab.io/ Pledge Joint Mining Platform: https://pledge.mdmlab.io In 2022, we participated in the FIL+ project and became a V4 notary, further deepening our involvement in the FIL+ community. Our focus is to bring together and unite various roles in the ecosystem, including clients, service providers (SPs), funders, and equipment owners, with the aim of harnessing the power of the Filecoin network and our platform to unleash their maximum potential.

48. How are you estimating your client demand and pathway usage? Do you have existing clients and an onboarding funnel?: 23MF is our client. We referred them to the E-FIL+ project and assisted with their onboarding in September 2022. They have submitted applications #1121 and #1122, and they still have storage requirements. Regarding the client onboarding process: We will engage with the client through phone meetings, emails, or face-to-face discussions to understand their storage needs, data types, storage duration, and preferred service providers (SPs). We will provide them with onboarding guidelines (client-onboarding, E-Fil+, Allocation Strategy), ensuring that the client has a clear understanding of the entire process. Through ongoing communication, we will work to keep subsequent steps running smoothly.

galen-mcandrew commented 3 months ago

@METAVERSEDATAMINING

The provided address is invalid: f1uhdgxklhmone613xjikxrs6zciitf6o5aipapfa

You need to initialize the address by receiving some FIL and sending a message, so that the valid f1 address has a create time. Then you can add it to an f2 msig.

METAVERSEDATAMINING commented 3 months ago

@galen-mcandrew The f1 address in the application is incorrect. Please update it to f1uhdgxklhmone6l3xjikxrs6zciitf6o5aipapfa. I have initialized the address and also added it to an f2 msig.
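
The two addresses in this exchange differ by one character: the application has `613` where the corrected address has `6l3`. The part of an f1 address after the `f1` prefix is lowercase base32, and that alphabet contains no digit `1`, so a typo like this can be caught before submission. The sketch below is illustrative only. The helper names are hypothetical, and the checksum layout (a 4-byte BLAKE2b digest over the protocol byte plus a 20-byte payload) is an assumption based on the Filecoin address specification as I understand it.

```python
import base64
import binascii
import hashlib

# RFC 4648 base32 alphabet in lowercase; the digits 0, 1, 8 and 9 never appear.
BASE32_ALPHABET = set("abcdefghijklmnopqrstuvwxyz234567")


def invalid_chars(address: str) -> list[str]:
    """Characters after the 2-character prefix that base32 cannot contain."""
    return [c for c in address[2:] if c not in BASE32_ALPHABET]


def checksum(protocol: int, payload: bytes) -> bytes:
    """Assumed checksum: 4-byte BLAKE2b over protocol byte + payload."""
    return hashlib.blake2b(bytes([protocol]) + payload, digest_size=4).digest()


def encode_f1(payload: bytes) -> str:
    """Build an f1-style string from a 20-byte payload (used here for self-testing)."""
    raw = payload + checksum(1, payload)
    return "f1" + base64.b32encode(raw).decode().lower().rstrip("=")


def is_well_formed_f1(address: str) -> bool:
    """Syntax and checksum check only; says nothing about on-chain state."""
    if not address.startswith("f1") or invalid_chars(address):
        return False
    body = address[2:].upper()
    try:
        # b32decode needs padding to a multiple of 8 characters.
        raw = base64.b32decode(body + "=" * (-len(body) % 8))
    except binascii.Error:
        return False
    payload, check = raw[:-4], raw[-4:]
    return len(payload) == 20 and check == checksum(1, payload)


if __name__ == "__main__":
    print(invalid_chars("f1uhdgxklhmone613xjikxrs6zciitf6o5aipapfa"))  # ['1']
    print(invalid_chars("f1uhdgxklhmone6l3xjikxrs6zciitf6o5aipapfa"))  # []
```

Passing this check only means the string is well formed. As noted in the comment above, the address still has to be initialized on-chain by receiving FIL and sending a message.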

galen-mcandrew commented 3 months ago

Datacap Request for Allocator

Address

f2l2hzyxmr6wjoc7il4yxkq2vkgdvxwigsmc6apta

Datacap Allocated

5PiB

filplus-bot commented 3 months ago

The request has been signed by a new Root Key Holder

Message sent to Filecoin Network

bafy2bzacecsw2qu2nmtuphodyo6t75csfyrkn776bmwbudd45eztl5qirqxx4

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecsw2qu2nmtuphodyo6t75csfyrkn776bmwbudd45eztl5qirqxx4

METAVERSEDATAMINING commented 3 months ago

@galen-mcandrew I need to check my information allocation_bookkeeping again, but it hasn't been converted to JSON yet. Please refer to the following link : https://github.com/METAVERSEDATAMINING/Allocator-Pathway-MDM

METAVERSEDATAMINING commented 2 months ago

@galen-mcandrew @Kevin-FF-USA @willscott My allocator information is not listed at https://github.com/filecoin-project/Allocator-Registry/tree/main/Allocators. Please help confirm this. Additionally, here is my bookkeeping link: https://github.com/METAVERSEDATAMINING/Allocator-Pathway-MDM.

Kevin-FF-USA commented 2 months ago

Hi @METAVERSEDATAMINING ,

Thanks for the tag in. I'm also not seeing your JSON setup in the Allocator Registry.
Wanted to give you a heads up that I'm looking into this now

METAVERSEDATAMINING commented 1 month ago

Hi @Kevin-FF-USA I'm wondering if there's been any headway on this matter. Is there anything I can provide to expedite the resolution process? Please let me know.
