Do you agree to open source all work you do on behalf of this RFP and dual-license under MIT, APACHE2, or GPL licenses?: "Yes"
Project Description
LabDAO’s mission is to become an open, community-run network of wet & dry laboratories accelerating progress in the life sciences. As part of its mission to develop resources for scientists to come together and work online, LabDAO is interested in adopting decentralized scientific compute infrastructure for its scientists' work. Bacalhau’s mission is to enable open reproducible data pipelines to advance science and reproducibility.
This grant is intended to create resources for Bacalhau usage within LabDAO, explore Bacalhau’s utility for scientific compute, and build an integration with the lab-exchange for computational services.
For example, one expected result of this collaboration is that a computational biologist new to these tools could run bioinformatics pipelines at the nf-core standard from a standard laboratory laptop, with no need for environment management or package handling on the local machine.
Value
This integration will serve three primary purposes:
To give LabDAO scientists a low-cost, easily accessible, and transparent environment in which to quickly execute bioinformatics workloads related to their research.
To spread the word within the scientific community about ways to use Bacalhau for scientific compute.
To add to the useful data workloads on Filecoin and IPFS via data pipelines on Bacalhau.
Storage Utilization
Equibind - in-silico drug discovery: 200MB per receptor-ligand interaction 3D structure. Modeling receptor-ligand interactions for 100 compounds to predict cancer cell line drug sensitivity would result in 200MB per drug-protein pair x 10 drugs (the most reproducible compounds across NCI60, CTRP and GDSC) x 1000 high-variance proteins = 2TB of protein and ligand structure model outputs.
Satellite computer vision: 1GB per raw satellite image from Planet; one global base layer is ca. 130GB.
Plant and fungal genome assembly: highly variable sequencing needs, ranging from compact genomes (40MB for Amanita muscaria) to larger genomes of up to 10-100 x 10E9 base pairs.
Ocean metagenomics: 10GB average metagenome x 1 sample x 10 timepoints = 100GB of collected information.
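The two estimates above that are stated as explicit products can be reproduced with a short calculation. This is a minimal sketch that assumes decimal units (1 TB = 1,000,000 MB); the function names are illustrative and not part of any LabDAO or Bacalhau tooling.

```python
# Decimal units are an assumption here: 1 TB = 1000 GB = 1,000,000 MB.
MB_PER_TB = 1000 * 1000


def equibind_storage_mb(mb_per_pair: int, drugs: int, proteins: int) -> int:
    """Total output size when every drug is modeled against every protein."""
    return mb_per_pair * drugs * proteins


def metagenome_storage_gb(gb_per_metagenome: int, samples: int, timepoints: int) -> int:
    """Total size of a metagenomic time series."""
    return gb_per_metagenome * samples * timepoints


# Figures taken from the storage estimates above.
equibind_tb = equibind_storage_mb(200, drugs=10, proteins=1000) / MB_PER_TB
ocean_gb = metagenome_storage_gb(10, samples=1, timepoints=10)

print(f"Equibind outputs: {equibind_tb:g} TB")
print(f"Ocean metagenomics: {ocean_gb} GB")
```

Changing the number of drugs or proteins in the call shows how quickly the Equibind output volume scales with the size of the screen.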
Deliverables
Phase 1: two scientific workload examples
Phase 2: modified scientific workload to leverage data from IPFS
Phase 3: integration with Nextflow and IPFS based IO, including development of an IPFS submodule for Nextflow
Phase 4: private Bacalhau cluster setup guide for community members
Phase 5: integration of Bacalhau into the openlab clients (CLI and web-app) for bio-compute
Marketing and Awareness Building: blog articles, cover art, social media posts
Development Roadmap
Phase 1: perform small-scale scientific compute in a container pulled from ghcr (GitHub Container Registry) with a file mount (no IPFS IO needed)
Anticipated steps to implement:
Run the workflow in the container with a test dataset that is bundled within the container
Run with local data, outside of the container, through the Bacalhau CLI using the --local flag
Example workloads - at least two of these will be selected:
Equibind - in-silico drug discovery laboratory (internal), [GPU optional]
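The second Phase 1 step can be sketched as a small helper that assembles the command line for a local run. This is an illustration only: the `--local` flag is the one named in the roadmap, while the `bacalhau docker run` form, the position of the flag, the `--` separator, the image name, and the container command are all assumptions that should be checked against the help output of the installed CLI version.

```python
import shlex
from typing import List, Sequence


def build_local_run(image: str, command: Sequence[str], local: bool = True) -> List[str]:
    """Assemble the argv for a `bacalhau docker run` invocation.

    The --local flag is the one named in the roadmap; its placement and the
    `--` separator are assumptions, not a confirmed CLI contract.
    """
    argv = ["bacalhau", "docker", "run"]
    if local:
        argv.append("--local")
    argv.append(image)
    argv.append("--")  # separates CLI options from the container command
    argv.extend(command)
    return argv


# Hypothetical image and entrypoint, used only for illustration.
argv = build_local_run(
    "ghcr.io/example-org/example-workload:latest",
    ["python", "main.py", "--input", "/data/test_input.csv"],
)
print(shlex.join(argv))
```

Building the argument list separately from executing it keeps the command easy to inspect and log before it is handed to something like `subprocess.run`.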
Phase 2: running bio-compute containers with IO through IPFS
Phase 3: running bio-compute containers using Nextflow and IPFS based IO, including development of an IPFS submodule for Nextflow
Anticipated steps to implement:
Will require design effort to modify the Docker runtime integration for Bacalhau (potentially including consideration of DAG execution). Reference: Nextflow Docker container documentation.
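To illustrate what DAG execution means for a pipeline, the sketch below models a workflow as a dependency graph and derives an order in which each step runs only after its inputs are ready. The step names are hypothetical, and the example uses Python's standard `graphlib` module rather than anything from Nextflow or Bacalhau, so it shows the general idea and not either project's scheduler.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each step maps to the set of steps it depends on.
pipeline: Dict[str, Set[str]] = {
    "fetch_reads": set(),
    "quality_control": {"fetch_reads"},
    "align": {"quality_control"},
    "call_variants": {"align"},
    "report": {"quality_control", "call_variants"},
}


def execution_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return one valid order in which every step follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

In a distributed setting, steps with no dependency between them could be dispatched to different nodes at the same time; `TopologicalSorter` also offers `prepare()`, `get_ready()`, and `done()` for that style of scheduling.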
Phase 4: set-up of 1-3 Bacalhau clusters on LabDAO-controlled hardware, and publication of a guide for community members and academic centers to do the same
Notes:
Consider setting up a DNS record with multiple A records that point to the IP addresses of the cluster nodes; the Bacalhau CLI can then be pointed at that DNS name.
Production nodes use libp2p with GossipSub for communication, which allows for dynamic peering of new servers.
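The DNS note above can be sketched from the client side: a single hostname resolves to several A records, and a client rotates through them so that a retry reaches a different node. This is a minimal sketch using Python's standard `socket` module; the addresses are documentation-range placeholders, and how the Bacalhau CLI itself handles multiple records is not something this example claims to show.

```python
import socket
from typing import List


def resolve_a_records(hostname: str, port: int) -> List[str]:
    """Return the distinct IPv4 addresses that a hostname resolves to."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def pick_address(addresses: List[str], attempt: int) -> str:
    """Rotate through the addresses so that a retry reaches a different node."""
    if not addresses:
        raise ValueError("hostname resolved to no addresses")
    return addresses[attempt % len(addresses)]


# Placeholder addresses standing in for three cluster nodes.
nodes = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
for attempt in range(4):
    print(attempt, pick_address(nodes, attempt))
```

In practice the list would come from `resolve_a_records` called with the cluster's DNS name and API port, both of which depend on the deployment.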
Phase 5: integration of Bacalhau into the openlab clients (CLI and web-app) for bio-compute
Anticipated steps to implement:
Test integration of the LabDAO Bacalhau clusters with example pipelines
Integrate Bacalhau as a (default) runtime for job execution on the provider side of the client
Notes:
Focus on ease of use for scientists
Payments integration for both Bacalhau and LabDAO projects is anticipated to be 6+ months into the future and will be out of scope for this exercise.
Marketing and Awareness Building:
Editing of blog articles
Cover art design
Posting to social media
Total Budget Requested
Team size - 3 full-time team members
1 Machine Learning and HPC Engineer (Stanley Bishop)
1 Network Engineer and lab-exchange client maintainer (Richard Smith Unna)
1 Machine Learning Programmer (Kelvin Wallace)
Total Requested Budget:
(4.5 months x 3 full time contributors x 6000 USD/month) = $75,000 USD
Maintenance and Upgrade Plans
Maintenance and upgrades have not been included in the scope of this proposal. Please let us know if you would like us to plan for specific maintenance and/or upgrade scenarios.
Open Grant Proposal:
LabDAO + Bacalhau Core Integration
Name of Project: LabDAO + Bacalhau Core Integration
Full grant proposal writeup available here
Proposal Category: app-dev
Proposer: @wesfloyd
Estimated Effort by Phase:
Phase 1 (small-scale scientific compute in a container from ghcr with file mount): 2 weeks
Phase 2 (bio-compute containers with IO through IPFS): 2 weeks
Phase 3 (bio-compute containers using Nextflow and IPFS based IO, including an IPFS submodule for Nextflow): 4 weeks
Phase 4 (set-up of 1-3 Bacalhau clusters and a guide for community members and academic centers): 4 weeks
Phase 5 (integration of Bacalhau into the openlab clients): 4 weeks
Marketing and Awareness Building: 2 weeks
Total: 4.5 months
Monthly stipend per contributor: 6000 USD
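The effort and budget figures can be checked with a few lines of arithmetic. This sketch assumes four weeks per month, which is what makes the 18 weeks of phase estimates equal 4.5 months. Under that assumption the product of the stated factors comes to 81,000 USD, while the requested total is stated as 75,000 USD; the sketch only reproduces the arithmetic and does not attempt to decide which figure is intended.

```python
# Phase estimates in weeks, as listed in the effort breakdown.
weeks_by_phase = {
    "Phase 1": 2,
    "Phase 2": 2,
    "Phase 3": 4,
    "Phase 4": 4,
    "Phase 5": 4,
    "Marketing and Awareness Building": 2,
}

WEEKS_PER_MONTH = 4          # assumption: a month is counted as four weeks
CONTRIBUTORS = 3             # full-time contributors
STIPEND_USD_PER_MONTH = 6000

total_weeks = sum(weeks_by_phase.values())
total_months = total_weeks / WEEKS_PER_MONTH
computed_budget = total_months * CONTRIBUTORS * STIPEND_USD_PER_MONTH

print(f"Total effort: {total_weeks} weeks = {total_months} months")
print(f"Months x contributors x stipend = {computed_budget:,.0f} USD")
```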
Team
Team Members
Stanley Bishop (lead - data science lab)
Kelvin Wallace
Rik Smith Unna
Niklas Rindtorff (correspondence)
Team Website
https://www.labdao.xyz/
Relevant Experience
LabDAO is a decentralized research organization connecting researchers and scientists, both amateur and professional, into a community of open collaboration. Our core objective is to develop the infrastructure that enables scientists to work from anywhere and collaborate with anyone.
The team has a core role within LabDAO and previously developed and maintained containerised scientific applications in industry and academia (Stanley and Kelvin - chemoinformatics, Rik - plant genomics, Niklas - microscopy computer vision).
Team code repositories
https://github.com/labdao
https://github.com/openlab-apps/lab-equibind
https://github.com/NewAtlantis
https://github.com/filecoin-project/bacalhau
Additional Information
Contact: wesfloyd@protocol.ai, niklas@labdao.com
Google Doc Version here