Open Grant Proposal: DataUnion stack Filecoin integration
Name of Project: DataUnion
Proposal Category: app-dev
Proposer: Robin Lehmann - @w1kke
Do you agree to open source all work you do on behalf of this RFP and dual-license under MIT and APACHE2 licenses?: "Yes"
Project Description
DataUnion is building an ecosystem for data unions through a data-union-as-a-service approach. These data unions will produce machine-learning-ready data, and insights will be generated on top of it. Each data union represents a vertical for a data type or a specific use case, e.g. restaurant data, agricultural data, sensor data or computer vision data. Currently we store all the data of these data unions on centralised servers, and we want to change that. Data contributors must be able to stay in control of their data, yet the data has to remain available 24/7. If contributors hosted the data themselves, they would have to rent their own servers or leave their phones or computers online all the time. That makes no sense, because their personal cost of providing access would rise and drive up the price of their data considerably. We are therefore looking for a storage solution that is decentralised, permanently available and cost-efficient.
When we sell data, we copy it to a machine where an algorithm-training or insight application works with it. To let people preview and explore the data beforehand, we serve preview versions from hot storage.
This is why we want to create personal drives for data contributors on Filecoin. The solution built with this grant will create these personal drives and wallets and integrate them into the platform stack that we provide to the data unions. Users upload their data, previews are generated so that the data can be showcased and worked with, and the data is then stored in the user's own storage on Filecoin. It will be encrypted so that it cannot be stolen from Filecoin's storage system.
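The upload flow described above (create a preview for hot storage, then store the encrypted original in the contributor's own storage) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DataUnion implementation: `InMemoryStore`, `handle_upload`, `first_bytes_preview` and `placeholder_cipher` are hypothetical names, the stores are in-memory stand-ins rather than a Filecoin client, and the XOR placeholder is not real encryption.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a storage backend (hot storage or Filecoin)."""
    blobs: Dict[str, bytes] = field(default_factory=dict)

    def put(self, data: bytes) -> str:
        # Use the SHA-256 digest of the content as its lookup key.
        key = hashlib.sha256(data).hexdigest()
        self.blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]


@dataclass
class UploadRecord:
    preview_key: str   # where the preview lives in hot storage
    archive_key: str   # where the encrypted original lives


def handle_upload(
    data: bytes,
    make_preview: Callable[[bytes], bytes],
    encrypt: Callable[[bytes], bytes],
    hot_store: InMemoryStore,
    contributor_store: InMemoryStore,
) -> UploadRecord:
    """Store a preview in hot storage and the encrypted original separately."""
    preview_key = hot_store.put(make_preview(data))
    archive_key = contributor_store.put(encrypt(data))
    return UploadRecord(preview_key, archive_key)


def first_bytes_preview(data: bytes) -> bytes:
    # Assumption: a preview is just a small excerpt of the original.
    return data[:16]


def placeholder_cipher(data: bytes) -> bytes:
    # NOT real encryption. A real system would use an authenticated cipher
    # from a vetted library, keyed from the contributor's wallet.
    return bytes(b ^ 0x5A for b in data)


hot, drive = InMemoryStore(), InMemoryStore()
record = handle_upload(
    b"raw image bytes from a contributor",
    first_bytes_preview,
    placeholder_cipher,
    hot,
    drive,
)
```

Passing `make_preview` and `encrypt` in as functions keeps the flow independent of the concrete preview generator and cipher, which the proposal leaves open until the research milestone.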
Value
The long-term benefit is that the amount of data stored will grow with each data union that we establish. This makes now a great time to get the integration right, while we are still in the early stages with our 12 initial data unions. We will then be able to store all the data in a decentralised way, permanently, and ready to be sold at any time.
If we get the cryptographic part of the solution wrong, we risk that data could be leaked. This is one of the most important aspects that we are working out in this grant.
The main risk is that the combination of encrypting and decrypting with the user's wallet on one side and using the stored data for sales on the other will not work properly.
Deliverables
The final deliverable for this project will be a version of our backend that works with users' data in Filecoin storage while still being able to provide a preview of the data and to use it for data sales, mostly algorithm training and insight generation, with the user's authorization. We will create a PoC that demonstrates this and shows how it works.
Development Roadmap
Please break up your development work into a clear set of milestones. This section needs to be very detailed (will vary on the project, but aim for around 2 pages for this section).
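For each milestone, please describe:
The software functionality that we can expect after the completion of each milestone. This should be detailed enough that it can be used to ensure that the software meets the specification you outlined in the Deliverables.
How many people will be working on each milestone and their roles
The amount of funding required for each milestone
How much time this milestone will take to achieve (using real dates)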
Milestone 1 - Research and Concept phase (Robin Lehmann (Product Owner), Akshay Patel (Researcher), Okpo Ekpenyong (Researcher), Spruce team, Filecoin team)
In this milestone we will outline the technical architecture of the solution in detail. This will include the work packages needed to adapt our technology, as well as reaching out to and learning from other projects that are working in a similar direction, namely Glimmer and Opscientia. The result of this milestone will be a detailed specification and research report for the remaining solution.
Time and funding:
Robin Lehmann - 60 hours (hourly rate 100$ => 6000$)
Akshay Patel - 30 hours (hourly rate 40$ => 1200$)
Okpo Ekpenyong - 30 hours (hourly rate 40$ => 1200$)
Zohaib Khan - 20 hours (hourly rate 40$ => 800$)
Total cost: 9200$
Dates: 11/1/2021 - 12/15/2021
Milestone 2 - Filecoin setup & connection to DataUnion (Robin Lehmann (Product Owner), Akshay Patel (Developer), Okpo Ekpenyong (Developer), Sarah Key (Frontend Developer))
In this milestone we will set up the Filecoin storage that Sealstorage has offered us. They need time to finalise their setup, but by the end of this year we will be able to start using their storage. The work includes connecting the storage to our backend as well as to our mobile and web apps. The result will be a PoC in which data coming into our applications is stored on Filecoin. As the basis for this PoC we will use our image data union and its apps. This will be a functional version that can be tested. It will also be possible to consume the data from Filecoin in the DataUnion backend to train algorithms on it. We will have a suite of functional tests that verify that all of the functionality works properly. The connections between Filecoin and our backend have to be made for these different apps:
Mobile app
Web app
Backend preview and metadata service
Backend AI consumption service
Backend data sales service
Time and funding:
Robin Lehmann - 20 hours (hourly rate 100$ => 2000$)
Akshay Patel - 90 hours (hourly rate 40$ => 3600$)
Okpo Ekpenyong - 90 hours (hourly rate 40$ => 3600$)
Zohaib Khan - 70 hours (hourly rate 40$ => 2800$)
Sarah Kay - 72.5 hours (hourly rate 40$ => 2900$)
Total cost: 14900$
Dates: 12/16/2021 - 2/15/2022
Milestone 3 - Encryption, Access Authorization (Robin Lehmann (Product Owner), Akshay Patel (Developer), Okpo Ekpenyong (Developer), Spruce team)
In the final milestone of the grant we will add the data security and authorization layer on top of the solution from milestone two. Being able to use the technology developed by Spruce will accelerate this last step. The step will nevertheless include extensive testing to verify that the data cannot be stolen by anyone or used without authorization.
Time and funding:
Robin Lehmann - 15 hours (hourly rate 100$ => 1500$)
Akshay Patel - 40 hours (hourly rate 40$ => 1600$)
Okpo Ekpenyong - 40 hours (hourly rate 40$ => 1600$)
Zohaib Khan - 30 hours (hourly rate 40$ => 1200$)
Total cost: 5900$
Dates: 2/16/2022 - 3/31/2022
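To illustrate the kind of check that "used without authorization" implies, here is a minimal sketch of a contributor-issued access grant that the backend verifies before releasing data to a buyer. It is an assumption-laden illustration, not Spruce's technology or the planned design: `issue_grant` and `is_authorized` are hypothetical names, and an HMAC over a shared secret stands in for the wallet-based signatures the proposal refers to.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


def _grant_message(payload: dict) -> bytes:
    # Serialise deterministically so the same payload always yields the same tag.
    return json.dumps(payload, sort_keys=True).encode()


def issue_grant(contributor_key: bytes, dataset_id: str, buyer_id: str,
                expires_at: float) -> dict:
    """Contributor authorizes one buyer to use one dataset until expires_at."""
    payload = {"dataset": dataset_id, "buyer": buyer_id, "expires_at": expires_at}
    tag = hmac.new(contributor_key, _grant_message(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def is_authorized(contributor_key: bytes, grant: dict, dataset_id: str,
                  buyer_id: str, now: Optional[float] = None) -> bool:
    """Check that the grant is untampered, matches the request, and has not expired."""
    now = time.time() if now is None else now
    payload = grant.get("payload", {})
    expected = hmac.new(contributor_key, _grant_message(payload), hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(expected, grant.get("tag", ""))  # constant-time compare
        and payload.get("dataset") == dataset_id
        and payload.get("buyer") == buyer_id
        and now < payload.get("expires_at", 0)
    )
```

A test suite for this layer would include the negative cases as well: a grant replayed by a different buyer, a grant whose payload was altered, and a grant presented after it expired should all be rejected.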
Total Budget Requested
The budget requested across all milestones is 30,000 USD or the equivalent in FIL tokens. These funds will be used to pay for the development effort and operations required for the implementation, testing, and further maintenance of this feature.
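The requested total can be cross-checked against the per-milestone hours and hourly rates listed in the roadmap. The short script below simply recomputes the sums from those figures; the variable and function names are illustrative.

```python
# Hourly rates in USD, as listed in the "Time and funding" entries.
HOURLY_RATE_USD = {
    "Robin Lehmann": 100,
    "Akshay Patel": 40,
    "Okpo Ekpenyong": 40,
    "Zohaib Khan": 40,
    "Sarah Kay": 40,
}

# Hours per person for each milestone, as listed in the roadmap.
MILESTONE_HOURS = {
    "Milestone 1": {"Robin Lehmann": 60, "Akshay Patel": 30,
                    "Okpo Ekpenyong": 30, "Zohaib Khan": 20},
    "Milestone 2": {"Robin Lehmann": 20, "Akshay Patel": 90,
                    "Okpo Ekpenyong": 90, "Zohaib Khan": 70, "Sarah Kay": 72.5},
    "Milestone 3": {"Robin Lehmann": 15, "Akshay Patel": 40,
                    "Okpo Ekpenyong": 40, "Zohaib Khan": 30},
}


def milestone_cost(hours_by_person: dict) -> float:
    """Sum hours times hourly rate over everyone working on a milestone."""
    return sum(HOURLY_RATE_USD[name] * hours for name, hours in hours_by_person.items())


costs = {name: milestone_cost(hours) for name, hours in MILESTONE_HOURS.items()}
total = sum(costs.values())

for name, cost in costs.items():
    print(f"{name}: {cost:,.0f} USD")
print(f"Total: {total:,.0f} USD")
# Milestone 1: 9,200 USD
# Milestone 2: 14,900 USD
# Milestone 3: 5,900 USD
# Total: 30,000 USD
```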
Maintenance and Upgrade Plans
This solution will then be made available to other data unions that are using our technology stack. It will be available as a feature in our backend and our contributor-facing apps via our GitHub.
Team
Team Members
Team Member LinkedIn Profiles
Team Website
https://dataunion.app
Relevant Experience
During the last year we have built multiple products as a team, starting from nothing and with no funding. We are now going for our seed round in the Outlier Ventures Filecoin base camp, so we have proven that we can conquer new challenges. Our experience with Filecoin technology is still limited, but we are eager to learn and get into it. Additionally, we are well connected with several other grant recipients that have been working in a similar direction with their products, e.g. Glimmer (same cohort in the Outlier Ventures Filecoin base camp) and Opscientia (friends from the OceanDAO ecosystem).
Team code repositories
https://github.com/dataunion-app
Additional Information
We reached out to a project that is specialised in letting users control access to their data, Spruce, and they agreed to collaborate with us to help us make this proposal happen.