aragon / nest

A grants program to support the development of the ecosystem
https://aragon.org/project/grants
Creative Commons Zero v1.0 Universal

Data Marketplace App (XY) #197

Closed brresnic closed 4 years ago

brresnic commented 4 years ago

XY

About

Organizations increasingly rely on ML models, trained on data, to create value. Many organizations utilize, request, sell, create, transform, and arbitrage data on a massive scale today. And data-related business activity is likely to increase dramatically in the near future.

Aragon's existing and forthcoming apps provide a substantial portion of the infrastructure necessary to create effective "data marketplaces". Building off of this infrastructure to natively support two-sided markets for datasets is a rich and untapped opportunity.

I haven't fully developed this proposal. The solution space of possible data marketplace structures and use cases is huge. But I would be interested in gauging Nest's level of interest and thoughts.

Fundamentally, the basic idea of this proposal would be to create two Aragon apps which enable DAOs to:

  1. Request datasets, as well as to request and aggregate individual data records
  2. Sell encrypted datasets in a format that is easy for the average data scientist to consume

Project name

XY

Project description

PROJECT GOALS, framed as "User Need Statements":

A data scientist can

  1. Access sample data records from an Aragon DAO's for-sale datasets
  2. Pay an Aragon DAO for an access token, which gives them the right to train ML models on a certain number of encrypted records from a given dataset, for a particular length of time.
  3. Using a few lines of code at the top of a Python notebook, import that encrypted dataset and use it to train a TensorFlow model.
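
To make the access-token idea above concrete, here is a minimal sketch of what such a token might record and how a check against it could look. Every name here (`AccessToken`, `may_train`, the field names, the example dataset id and address) is hypothetical and not part of any existing Aragon app; encryption and payment are left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessToken:
    """Hypothetical record of what a data scientist paid for."""
    dataset_id: str
    holder: str            # address of the purchaser
    max_records: int       # number of encrypted records covered
    expires_at: datetime   # end of the access window


def may_train(token: AccessToken, dataset_id: str, records_requested: int,
              now: datetime) -> bool:
    """Return True if the token covers this request at time `now`."""
    return (
        token.dataset_id == dataset_id
        and 0 < records_requested <= token.max_records
        and now < token.expires_at
    )


# Example: a 30-day token covering up to 10,000 records of one dataset.
issued = datetime(2020, 1, 1, tzinfo=timezone.utc)
token = AccessToken("weather-v1", "0xabc", 10_000, issued + timedelta(days=30))
```

In a real system the check would presumably live on-chain or in the service that serves the encrypted records; the sketch only shows the three limits the goal names: which dataset, how many records, and for how long.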

An Aragon DAO can

  1. Monetize their datasets, without giving access to the underlying data records
  2. Request data of a particular format and quantity, and provide a bounty for that data. Specify whether that data can come from multiple providers. Specify whether the data needs to be unencrypted, or whether it can be furnished in the form of a token which provides access to another DAO's encrypted dataset. Customize and utilize a generic form UI / API, through which data providers can submit data. Plug custom scripts into a framework which performs a check on data quality/integrity, and then uploads data (or dataset access tokens) to a decentralized repository which the DAO controls.
  3. Submit a dispute to the Aragon court if a data provider has claimed a bounty based on data which doesn't meet predetermined quality thresholds.
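
As a rough illustration of the request-and-check flow described in the list above, the sketch below models a data request with a bounty and pluggable quality checks. The class `DataRequest`, its fields, and the example check `no_missing_labels` are all assumptions made for illustration; uploading to a decentralized repository and paying out the bounty are not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]
Check = Callable[[List[Record]], bool]


@dataclass
class DataRequest:
    """Hypothetical bounty request posted by a DAO."""
    required_fields: List[str]
    min_records: int
    bounty: int  # in the smallest token unit
    custom_checks: List[Check] = field(default_factory=list)

    def accepts(self, records: List[Record]) -> bool:
        """Run the built-in format/quantity checks, then any custom ones."""
        if len(records) < self.min_records:
            return False
        for record in records:
            if any(name not in record for name in self.required_fields):
                return False
        return all(check(records) for check in self.custom_checks)


def no_missing_labels(records: List[Record]) -> bool:
    """Example custom check: every record has a non-empty label."""
    return all(r.get("label") not in (None, "") for r in records)


request = DataRequest(["text", "label"], min_records=2, bounty=500,
                      custom_checks=[no_missing_labels])
good = [{"text": "a", "label": "x"}, {"text": "b", "label": "y"}]
bad = [{"text": "a", "label": ""}, {"text": "b", "label": "y"}]
```

A submission that fails `accepts` would simply never be eligible for the bounty; the Aragon Court dispute in item 3 would then cover cases where a check passed but the data still fell short of the agreed thresholds.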

SUPPORTING LITERATURE:

Machine learning with data privacy guarantees: https://arxiv.org/pdf/1810.08130.pdf https://arxiv.org/pdf/1812.02428.pdf https://github.com/tensorflow/privacy

Federated learning (enabling model training outside of access-token holder's environment): https://github.com/tensorflow/federated https://arxiv.org/pdf/1902.01046.pdf https://towardsdatascience.com/the-new-dawn-of-ai-federated-learning-8ccd9ed7fc3a

Potential decentralized compute for training models: https://sonm.com/solutions/machine-learning/

Potential architecture for training models with a centralized dependency (AWS Lambda): (Aragon Agent -> authorized address nonce -> queue of data/model pairs to load/execute) https://medium.com/@mike.p.moritz/running-tensorflow-on-aws-lambda-using-serverless-5acf20e00033 https://hack.aragon.org/docs/guides-use-agent https://silamoney.com/2019/07/08/using-aws-lambda-sqs-with-web3/
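
One possible reading of the "authorized address nonce -> queue of data/model pairs" step is sketched below: jobs are accepted only from known addresses, and each address must present the next nonce in sequence so a request cannot be replayed. This is an interpretation, not a specified design; `TrainingQueue` and its methods are hypothetical, and signature verification, the Aragon Agent call, and the Lambda worker are all omitted.

```python
from collections import deque
from typing import Deque, Dict, Optional, Set, Tuple

Job = Tuple[str, str]  # (dataset_id, model_id)


class TrainingQueue:
    """Hypothetical queue that accepts jobs only from authorized addresses."""

    def __init__(self, authorized: Set[str]) -> None:
        # Each authorized address starts at nonce 0.
        self._next_nonce: Dict[str, int] = {addr: 0 for addr in authorized}
        self._jobs: Deque[Job] = deque()

    def submit(self, sender: str, nonce: int, job: Job) -> bool:
        """Enqueue `job` if `sender` is authorized and `nonce` is the next expected one."""
        expected = self._next_nonce.get(sender)
        if expected is None or nonce != expected:
            return False  # unknown sender, replayed nonce, or out-of-order nonce
        self._next_nonce[sender] = expected + 1
        self._jobs.append(job)
        return True

    def next_job(self) -> Optional[Job]:
        """Pop the oldest pending job, or return None if the queue is empty."""
        return self._jobs.popleft() if self._jobs else None


queue = TrainingQueue({"0xagent"})
```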

Proposed architectures for decentralized data marketplaces: https://arxiv.org/pdf/1811.11462.pdf https://arxiv.org/pdf/1906.01799.pdf https://arxiv.org/pdf/1812.09966.pdf

Code repo and/or website URL (if any)

NA


Team

Name of the project lead

Benjamin Resnick

Email address of the project lead (whoever is filling out this application)

benjamin_resnick@alumni.brown.edu

List each other member that is/would be working on this project (LinkedIn/GitHub profiles are preferred but not required)

https://www.linkedin.com/in/benjamin-resnick-7b602b7b/ http://benjaminresnick.com/#intro https://github.com/brresnic

Is your team already present on the Aragon Forum? If Yes, what are your usernames?

no


Progress

How far along are you?

I haven't written a single line of code.


Idea Problem

Why did you pick this idea to work on? Do you have domain expertise in this area?

My first experience working on a data marketplace was at IBM, five years ago. Since then, I've been working on data-centric technologies, including a graph database-as-a-service, an augmented reality data visualization tool, and an AI-powered automated reading tutor for K-3rd grade students.

I'm very impressed with Aragon, and understand that there's a huge solution space of possible data marketplace structures, both within DAOs and traditional businesses, as well as an absolutely gigantic market opportunity.

I would love to start a conversation on whether there's a potential project here.


Market

What are the alternatives or competitors to this project?

https://numer.ai/ https://medium.com/catalyst-crypto/the-enigma-data-marketplace-is-live-e3ef01f6a6d4 https://encrypgen.com/ https://datum.org/


yeqbfgxjiq commented 4 years ago

Hey @brresnic thanks for the application.

Regarding "natively support two-sided markets for datasets": do you mean something like Ocean Protocol, OpenMined, or Erasure?

Regarding the feasibility of this proposal, one does not simply "make a marketplace." It's hard. While Aragon DAOs make token creation and exchange a lot more accessible to teams, creating a marketplace for data would require additional work:

All in all I don't feel like this is a good project for Nest, but happy to be convinced otherwise :)

Also, I appreciate you reaching out to start a project before you've started writing code! Most people never even bother to validate their ideas. Would be curious to hear more of your ideas, esp regarding the market opportunity. What's the key thing that DAOs enable that was not possible before?

brresnic commented 4 years ago

I substantially agree with your assessment. Thank you for the thoughtful response, and the competitive analysis. I had heard about some of these projects, but wasn’t up-to-date on their recent progress. Ocean Protocol's product and roadmap look particularly strong!

I do think there may be potential to build out a sample use case and corresponding blog post, demonstrating the value prop of using Aragon's governance and payroll capabilities in conjunction with a data marketplace (possibly with Ocean Protocol).

This sample use case would likely make the most sense in the second half of 2020. I think the value of using Aragon in conjunction with a data marketplace will crystallize once Aragon's "payroll" app is live. And Ocean Protocol’s upcoming work to "bring the data to the compute" would also make this much more compelling (when considered in conjunction with their differentiated dataset monetization capabilities).

The core value-add that, to my understanding, DAOs bring to the table (within the context of a data marketplace ecosystem) is the ability to easily couple a variety of governance and incentive structures with private datasets.

With regard to incentives, DAOs have the potential to make it dramatically easier for multiple parties to collaborate on the creation, curation, and monetization of datasets. For example, consider a hypothetical recently graduated PhD student who has created a useful proprietary dataset in his spare time. In the near future, this budding entrepreneur could spin up an Aragon DAO in minutes to manage the ownership rights to his data. He could provide fractional ownership of the dataset to a colleague, in exchange for that colleague cleaning and transforming the data. Then, with a platform like Ocean Protocol, they could monetize their dataset without giving away the underlying data records.
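
The arithmetic behind such a fractional-ownership arrangement is simple; a minimal sketch follows. The function `split_revenue` and the 80/20 split are illustrative assumptions only. In an Aragon DAO the shares would presumably be token balances, and the payout mechanics would depend on the apps installed.

```python
from typing import Dict


def split_revenue(amount: int, shares: Dict[str, int]) -> Dict[str, int]:
    """Split `amount` (smallest token unit) in proportion to `shares`."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    payouts = {holder: amount * s // total for holder, s in shares.items()}
    # Integer division can leave a few units unassigned;
    # this sketch gives them to the largest shareholder.
    remainder = amount - sum(payouts.values())
    largest = max(shares, key=shares.get)
    payouts[largest] += remainder
    return payouts


# Hypothetical split: the dataset's author keeps 80%, the colleague gets 20%.
payouts = split_revenue(1_000, {"author": 80, "colleague": 20})
```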

With regard to governance, large organizations (universities and Fortune 500 companies) have complex decision-making processes that determine who can use their data, and under what conditions. I've personally experienced the pain and complexity of navigating a large bureaucracy in order to access a proprietary dataset. This complexity is only compounded when a dataset is aggregated from multiple parties, each with distinct rights. I think DAOs, simply by digitizing the rights of stakeholders across multiple organizations, can provide a much more efficient mechanism for working through data governance processes and access requests. In my view, once DAOs become more widely accepted, they are poised to enable new forms of data-sharing agreements, which previously weren't feasible simply because of the associated legal cost.

I imagine that, eventually, it will prove ideal to programmatically couple the governance and incentive structures enabled through flexible DAOs with data management/monetization capabilities provided by platforms such as the Ocean Protocol (perhaps through Aragon Agent). But in the next 12 months, it seems like the use cases described above could be clearly demonstrated within the constraints of existing incentive structures and architectures, without writing any integration code.

yeqbfgxjiq commented 4 years ago

> build out a sample use case and corresponding blog post, demonstrating the value prop of using Aragon's governance and payroll capabilities in conjunction with a data marketplace (possibly with Ocean Protocol)

Very exciting! Software is eating the world and AI is eating software. Democratizing access to data and making it easier for researchers to collaborate is the bottleneck for AI democratization. Very much looking forward to hearing more of your thoughts on this. Also happy to collaborate or review a draft if you feel so inspired :)

> The core value-add that, to my understanding, DAOs bring to the table (within the context of a data marketplace ecosystem) is the ability to easily couple a variety of governance and incentive structures with private datasets.

Would this require an application to store and encrypt/decrypt data that Aragon DAOs hold? If so, might something like Espresso / Aragon Drive help? Then, would an application like Aragon Fundraising make it possible for DAOs to sell access to that data programmatically? Or maybe use the Token Request app for datasets that are only to be shared with a few teams on an as-needed basis?

> new forms of data-sharing agreements, which previously weren't feasible simply because of the associated legal cost.

Would this require an app that lets an Aragon DAO whitelist a set of addresses which can be interacted with?

> I imagine that, eventually, it will prove ideal to programmatically couple the governance and incentive structures enabled through flexible DAOs with data management/monetization capabilities provided by platforms such as the Ocean Protocol (perhaps through Aragon Agent). But in the next 12 months, it seems like the use cases described above could be clearly demonstrated within the constraints of existing incentive structures and architectures, without writing any integration code.

Totally! Numerai is one of the first successful dApps that actually used a token for something useful and got traction. This was due to the unique mechanisms provided and the fact that they solved a real problem for data scientists. I'm aware that the problem you're describing (costs and bureaucracy around data access) is real, but is Aragon the solution they are looking for? If it were easy to deploy an Aragon DAO to manage datasets (say in 30-60 minutes), do you know lots of other data scientists who might be into that?

yeqbfgxjiq commented 4 years ago

Closing this as it's not a good fit for the Nest program right now, but please feel free to reach out if you want to discuss the ideas more :)