Data privacy enforced ML

huihzhao commented 4 months ago

Main Objectives/Goals:

To invite developers and researchers to contribute to the development and implementation of Zero-Knowledge Machine Learning (zkML) technologies for secure and privacy-preserving verification of inference and data provenance. zkML combines zero-knowledge proofs (ZKPs) with machine learning to allow verifiable and privacy-preserving ML computations. This is particularly crucial in scenarios where data sensitivity and privacy are paramount, such as medical diagnoses, financial analysis, and personal data processing. zkML ensures that ML models can be applied to sensitive data without exposing the data itself to third parties, while still allowing the results to be verified for correctness and integrity.

Challenge Description:

We seek proposals that address the following areas:

Development of zkML Frameworks:
- Creation of open-source zkML frameworks and libraries.
- Implementation of zkML techniques in existing ML platforms.
- Optimization of zkML algorithms for efficiency and scalability.
Applications in Sensitive Industries:
- Prototyping zkML solutions for healthcare, including secure medical diagnosis and personalized treatment recommendations.
- Developing zkML applications for financial fraud detection and secure financial transactions.
- Exploring zkML use cases in privacy-preserving data analytics.
Proof of Concept and Pilot Projects:
- Development of PoC projects demonstrating zkML applications, such as an on-chain verifiable ML trading bot, like RockyBot.
- Enhancing blockchain protocols with zkML, for example, Lyra Finance options protocol AMM with intelligent features or Astraly’s transparent AI-based reputation system.
- Proving complex ML-driven strategies on-chain, like Giza’s work with Yearn to verify yield strategies using zkML.
Research and Innovation:
- Fundamental research on zkML techniques, including new ZKP methods and their integration with ML models.
- Innovative applications of zkML in emerging fields such as Internet of Things (IoT) and smart cities.

Mirror-Tang commented 4 months ago

Interested in taking on this job!

huihzhao commented 4 months ago

@Mirror-Tang great to see it. Please post your proposal here; we will go through it with the team and provide feedback. Let's keep building!

LorenzoTomaz commented 3 months ago

Interested in taking this job! I'm part of AE Studio (https://ae.studio/), and our skunkworks division is building an open-source ZKML framework called Dohko, based on the Virgo protocol. Our goal is to provide a production-ready implementation that leverages the features of Virgo and Orion, e.g. linear-time prover complexity and friendliness to parallel prover implementations... As of today, we can easily handle small neural networks and more common ML algorithms, allowing proofs of computation for ONNX graphs, and we also provide extensions to generate proofs for NumPy computations, using just Python (with C++ bindings) and the dev-friendly zero-dependency library we are building for Dohko. On our roadmap, we plan to create cross-platform parallel implementations of these protocols so we can distribute proof generation among multiple machines, reducing the computation overhead and hardware requirements for proving large circuits. We also plan to implement a DSL for writing zk scripts, like Circom but using SOTA protocols like Virgo++ (with Orion) and Libra (with the multilinear KZG IOP). Our eventual goal is to build a proof marketplace connecting idle nodes with users requesting proofs for verifiable ML use cases. I'm the lead protocol developer for the team, and I hope we can engage in more discussions about pushing forward a collaboration.
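
For readers new to these protocols: Virgo and Libra both build on the GKR interactive proof, whose core subroutine is the sumcheck protocol. The sketch below is a toy, non-zero-knowledge sumcheck for a multilinear polynomial given by its table of values on {0,1}^n. It is not Dohko's code; the modulus `P` and the function names `fold`, `mle_eval`, and `sumcheck` are assumptions made for illustration, and the prover and verifier are run in one function for brevity.

```python
import random

P = 2**61 - 1  # assumed prime modulus, chosen only for illustration


def fold(table, r):
    """Fix the first variable of a multilinear table to the field value r."""
    half = len(table) // 2
    return [(table[i] + r * (table[half + i] - table[i])) % P for i in range(half)]


def mle_eval(table, point):
    """Evaluate the multilinear extension of `table` at `point`."""
    current = [v % P for v in table]
    for r in point:
        current = fold(current, r)
    return current[0]


def sumcheck(table, claimed_sum, rng=None):
    """Honest prover and verifier in one loop; True if the verifier accepts.

    `table` must have a power-of-two length; its first half holds the
    values where the first variable is 0, the second half where it is 1.
    """
    rng = rng or random.Random()
    claim = claimed_sum % P
    current = [v % P for v in table]
    point = []
    while len(current) > 1:
        half = len(current) // 2
        g0 = sum(current[:half]) % P  # prover message: g(0)
        g1 = sum(current[half:]) % P  # prover message: g(1)
        if (g0 + g1) % P != claim:    # verifier: round consistency check
            return False
        r = rng.randrange(P)          # verifier: random challenge
        claim = (g0 + r * (g1 - g0)) % P  # g(r), since g has degree 1
        current = fold(current, r)
        point.append(r)
    # Final check: one evaluation of the polynomial at the random point.
    return claim == mle_eval(table, point)
```

For example, `sumcheck([3, 1, 4, 1, 5, 9, 2, 6], 31)` accepts, while a claimed sum of 32 is rejected in the first round. In a real system the final evaluation would come from a polynomial commitment rather than from the full table.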

LorenzoTomaz commented 3 months ago

For the BNB Q3 hackathon, our goal is to implement a version of the Virgo++ and Libra protocols in Python that can generate proofs for arbitrary computations, using fixed-point arithmetic to represent the operations, and that can create proofs for basic ONNX ops such as Gemm and CNN. We are focusing on an educational version that can be expanded into a production version later. Since we are mostly focusing on Libra/Virgo for Dohko, our challenge will be to implement Virgo++/Libra from scratch, as well as on-chain validation.
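
To make the fixed-point idea concrete, here is a minimal sketch of a Gemm (`A @ B + C`) computed entirely on field elements. It is an illustration under stated assumptions, not the hackathon implementation: the number of fractional bits (`SCALE_BITS`), the modulus `P`, and all function names are hypothetical.

```python
SCALE_BITS = 8           # assumed number of fractional bits
SCALE = 1 << SCALE_BITS
P = 2**61 - 1            # assumed prime field modulus


def to_fixed(x):
    """Quantize a real number to a signed fixed-point integer."""
    return round(x * SCALE)


def to_field(v):
    """Embed a signed integer in the field; negatives wrap around P."""
    return v % P


def from_field(v):
    """Recover the signed integer, assuming its magnitude is below P / 2."""
    return v if v <= P // 2 else v - P


def gemm_field(a, b, c):
    """A @ B + C over the field, using only additions and multiplications."""
    k = len(b)
    return [
        [(sum(a[i][t] * b[t][j] for t in range(k)) + c[i][j]) % P
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def fixed_gemm(a, b, c):
    """Quantize float matrices, run Gemm in the field, and dequantize."""
    qa = [[to_field(to_fixed(x)) for x in row] for row in a]
    qb = [[to_field(to_fixed(x)) for x in row] for row in b]
    # Products carry scale SCALE**2, so the bias is lifted to the same scale.
    qc = [[to_field(to_fixed(x) * SCALE) for x in row] for row in c]
    out = gemm_field(qa, qb, qc)
    return [[from_field(v) / (SCALE * SCALE) for v in row] for row in out]
```

Only `gemm_field` would need to be expressed as an arithmetic circuit; quantization and dequantization happen outside the proof. Values that are not exact multiples of `1 / SCALE` pick up a rounding error, and a chain of layers would need a rescaling step after each product to keep the scale from growing.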

huihzhao commented 3 months ago

Polyverse AI - https://www.polyverse.network/ Polyverse AI is revolutionizing the field of artificial intelligence with its cutting-edge "Zero-Trust AI Data Engine" and AI Data Marketplace (ADM). Our unique platform integrates the latest in Web3 technology and advanced privacy protocols, including Fully Homomorphic Encryption (FHE), FHEML, and Trusted Execution Environments (TEE). These innovations enable secure and scalable AI data operations across various industries without compromising data privacy. By leveraging blockchain technology for data integrity and decentralized data management, Polyverse AI ensures that all data transactions are traceable and secure. Our platform is designed to empower AI and machine learning developers, tech enterprises, academic researchers, and public sector organizations to utilize vast amounts of data safely and efficiently.

How does it work?

Today, AI and large language models (LLMs) require vast volumes of training data, much like engines need fuel. However, currently 90% of global data remain untouched by AI/LLM, hindered by privacy and confidentiality concerns. At Polyverse AI, our mission is to unlock the transformative potential of AI data. We are pioneering the planet-scale "Zero-Trust AI Data Engine" and establishing a dynamic AI Data Marketplace (ADM), both empowered by the innovations of Web3 and privacy technologies such as Fully Homomorphic Encryption (FHE), FHEML, and Trusted Execution Environments (TEE). These technologies enable AI and LLMs to process data while fully encrypted, ensuring data confidentiality. Our groundbreaking data engine supports LLMs, AI agents, and applications across diverse sectors such as healthcare, finance, robotics, commerce, and real-world applications. We are proud to be supported by leading institutions and companies, including Kleiner Perkins, NVIDIA, Amazon Web Services, Columbia University, The New York Times, and Entrepreneur.

Together, we are merging the benefits of privacy-preserving confidentiality with decentralized storage solutions like IPFS and BNB Greenfield, revolutionizing how data is securely stored and accessed in the digital age.

LorenzoTomaz commented 3 months ago

Awesome project! But I was hoping for feedback regarding the BNB Q3 hackathon. Dora Hacks pointed to this issue as the starting point for entering the wishlist for the AI track. @huihzhao, can you provide more details about feedback and how to join the wishlist?

Mirror-Tang commented 2 months ago

I see that this is an open-source project. Where can I view your code repository?

LorenzoTomaz commented 2 months ago

Here's our codebase: https://github.com/agencyenterprise/zkgraph-bnb-hack. We still need to implement a lot of features before going into production, but we cover most of what is needed for a ZKML framework proof of concept (PoC). We implemented the Libra protocol from scratch during the hackathon and covered most of the features outlined in the base paper, but we focused on a framework that deals with public inputs and models with public weights only. It can prove the validity of ONNX graphs for ops such as GEMM, ReLU, and CNN.
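
ReLU is harder than GEMM for arithmetic circuits because it involves a comparison, which is not a polynomial operation. One common approach, sketched below, is for the prover to supply a sign bit and a bit decomposition of the magnitude as extra witness values, which are then checked with polynomial constraints. This is a generic technique and an assumption on my part, not necessarily what the linked repository does; `BITS` and the function names are hypothetical, and the checks are written over plain integers rather than a field.

```python
BITS = 16  # assumed bit width for magnitudes


def relu_witness(x):
    """Prover side: sign bit, magnitude bits, and output for a signed integer x."""
    s = 1 if x >= 0 else 0
    mag = abs(x)
    bits = [(mag >> i) & 1 for i in range(BITS)]
    return s, bits, (x if x >= 0 else 0)


def relu_constraints_hold(x, s, bits, y):
    """Check the polynomial constraints that tie y to ReLU(x)."""
    if s * (s - 1) != 0:                      # s is a bit
        return False
    if any(b * (b - 1) != 0 for b in bits):   # every magnitude bit is a bit
        return False
    mag = sum(b << i for i, b in enumerate(bits))
    if x != (2 * s - 1) * mag:                # x = +mag if s = 1, -mag if s = 0
        return False
    return y == s * x                         # y = x if s = 1, else 0
```

The bit constraints force the magnitude to be non-negative and bounded, so the sign bit cannot be chosen freely: for `x = -3`, claiming `s = 1` or `y = -3` both fail the checks.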

Mirror-Tang commented 2 months ago

From a hackathon perspective, it looks promising as a minimum viable model. From a funding perspective, here are some personal suggestions:

Just building the minimum model is not enough; you need to describe your grand final goal.
What benefits does it bring to the BNB Chain, and what advantages does BNB Chain have in running the ZKML model?
Milestones - set appropriate milestones based on your goals for tracking progress.

LorenzoTomaz commented 2 months ago

Hey @mirror-tang, here's a draft of our milestones for the 0k project. Let me know what you think:

Q3 2024: Launch 0k testnet with the 15 most used ONNX ops
- Implement a high-performance C++ version of the protocol

Q4 2024: Integrate with major AI frameworks (TensorFlow, PyTorch)
- Release SDK for easy integration
- Aim to have a version that can run popular models like BERT, YOLO, MobileNet, Whisper, and other classic models

Q1 2025: Roll out GPU support for proof generation
- Target 100x performance improvement over the CPU-only version

Q2 2025: Secure partnership with a major DePIN project
- Implement 0k in their infrastructure for a real-world use case

Q3 2025: Launch 0k mainnet on BNB Chain
- Deploy on-chain verifiers and a token-based reward system for provers
- Aim for 1000 TPM for classic and industry-standard models, such as YOLO

Q4 2025: Achieve cross-chain AI verification capabilities
- Partner with at least two other major blockchain networks

Q1 2026: Hit performance milestone of 10,000 TPM for all supported model types
- This should make us competitive with centralized ML inference platforms

Q2 2026: Launch 0k DAO for decentralized protocol governance

Our end goal with 0k is to create a robust, decentralized infrastructure for verifiable AI computations. We aim to make ML model execution and verification as trustless and transparent as current blockchain transactions but with the performance needed for real-world AI applications.

Key objectives:

Develop a scalable zkML framework that can handle complex ML models (beyond just MLPs and basic neural nets)
Achieve verification speeds comparable to or exceeding centralized ML platforms
Implement privacy-preserving features to enable secure computation of sensitive data
Build a decentralized network of prover nodes to distribute the computational load
Create developer-friendly tools and SDKs to foster adoption and integration in various DApps

By leveraging BNB Chain (and potentially other chains in the future), we're working towards a system where developers can easily deploy AI models that users can trust without needing to understand the underlying ML or ZK tech.

How 0k helps BNB Chain:

Brings novel ZKML use cases to the chain, diversifying its ecosystem and attracting AI/ML developers
Increases on-chain activity through frequent ML model verifications, utilizing BNB Chain's high throughput

How BNB Chain helps 0k:

Provides a scalable, low-cost infrastructure ideal for running frequent ZKML operations
Offers a supportive ecosystem with grants, hackathons, and an existing AI developer community

Ultimately, we want 0k to be the go-to solution for anyone building decentralized apps that require verifiable AI computations.

Mirror-Tang commented 2 months ago

I believe you should focus on the usability of your ZKML solution rather than solely emphasizing promotion. It's important to conduct a horizontal comparison with other solutions in the industry. What is your work? What sets it apart? Where does its innovation lie? You should provide specific details on the feasibility of the solution rather than making vague claims like "I'll be a hundred times faster than them and then become a god in three years."

In summary, you should discuss the technical aspects and back your claims with data and references to your previous work. Additionally, please fill out the application form here: https://forms.monday.com/forms/0469580c0e412266a888526a38b114a0?r=euc1

bnb-chain / community-contributions

Data privacy enforced ML #63