filecoin-project / devgrants

👟 Apply for a Filecoin devgrant. Help build the Filecoin ecosystem!

Decentralized Storage for the Neuroscience Research Community #1676

Open charlalhoward opened 10 months ago

charlalhoward commented 10 months ago

Open Grant Proposal: Project Title

Project Name: NeuroStor

Proposal Category: Developer and data tooling

Individual or Entity Name: Spike Neuro LLC

Proposer: charlalhoward

(Optional) Filecoin ecosystem affiliations: None

(Optional) Technical Sponsor:

Do you agree to open source all work you do on behalf of this RFP under the MIT/Apache-2 dual-license?: Yes.

Project Summary

The field of neuroscience research is experiencing a surge in data volume that will lead to groundbreaking healthcare discoveries. This data growth is primarily driven by new high channel count neural probes, which provide access to unprecedented information about the brain and nervous system. However, this rapid data growth is straining researchers' existing file storage resources. Storage options for US-based neuroscience researchers are also being exhausted by recent federal requirements to make all federally funded research data publicly available. As individual datasets consistently exceed hundreds of GBs and push into TBs, researchers will need new options to meet their private data storage needs and public repository requirements.

The Filecoin ecosystem has excellent potential to provide a secure and cost-effective data archive solution for these researchers. However, the existing tools are not ideal for sharing data, either with collaborators or through a public archive. Specifically, the current Motion project does not include metadata tagging or autorenewal features, which this target market would expect. Additionally, a public archive will require user access controls so that data access meets federal requirements while preventing excessive downloading or abuse of the system.

Spike Neuro is a neuroscience development company well attuned to the needs of neuroscience researchers and the growing problems that data storage is imposing on the community. We have an excellent team of software engineers with expertise in database management and neuroscientists who understand the nuances of our target market. In this project our group will build custom APIs to integrate with Motion and the Filecoin ecosystem to enable a neuroscience focused data archive for public and private data storage. This development work will specifically focus on developing a system for autorenewal of storage contracts, metadata tagging with industry standardization, and user access controls to support data sharing. The success of this project will provide neuroscience researchers with a cost-effective solution for their data storage needs while bringing multiple petabytes of archive research data onto the Filecoin platform.

Impact

- What pain points does this project seek to address?

The growth of neuroscience data will soon become a limiting factor for researchers. For example, the Allen Institute, a nonprofit bioscience research institute leading the development of high channel count neural recording, recently released the largest neuroscience dataset of its kind, showing activity of around 300,000 neurons from 81 mice recorded using 6 Neuropixel high channel count probes. This dataset is highly valuable for the wealth of information it provides about the nervous system; however, the researchers have faced difficulties sharing the data due to its size. Other labs using Neuropixel probes have reported having data kicked off their university servers, making it incredibly difficult to work with their large datasets. This is a current and growing problem for neuroscience researchers. Data such as this is too valuable to be used by only a small group of researchers, but too large to be shared easily.

The storage capabilities Filecoin provides are an excellent solution to this problem researchers are facing, providing both strong security and low cost tailored to large data. However, these researchers are not data scientists, and they are looking for data storage and sharing solutions with accessibility more in line with traditional storage providers, such as Amazon or Dropbox. There is a wealth of archive research data ideal for the Filecoin network but UX/UI development along with underlying access management and autorenewal features are necessary to attract researchers to this less familiar platform.

- What are the benefits to getting this right? What are the risks of not getting this right?

Research data in general should be seen as low-hanging fruit for providing a consistent stream of data onboarding for the Filecoin network. Thus, creating a user-friendly and familiar means to upload this data is key to accessing this market. Neuroscience data is a particularly apt target: the global field is in need of new storage options, the US market (driven by NIH funding and regulations) faces new data sharing demands it is struggling to meet, and the field can provide a gateway to the broader market of clinical/healthcare data. These problems need to be addressed, and market demand will drive the eventual development of a neuroscience-focused data storage platform. Filecoin is ahead of the game on the storage side, and being first on the user access side will create an excellent market advantage. Failing to make the Filecoin network more accessible to the average user will result in researchers continuing with the status quo of non-specialized common data services such as Dropbox and Amazon, despite their shortcomings. Spike Neuro seeks to work with the Filecoin ecosystem to build tools that provide a specific solution to these data storage problems while considering ease of use and accessibility to support the continued growth of neuroscience research.

- What impact will this project have in a specific vertical, market, or ecosystem? What does success look like?

Success of this project will enable the onboarding not only of neuroscience research data but also of data from many other areas such as bioscience and healthcare. Indeed, the US federal public data sharing requirements are not exclusive to neuroscience but apply to all federally funded research. Tools developed to support adoption by neuroscience researchers can be adapted and marketed to other research fields facing similar data storage challenges. While we are focusing on neuroscience research due to Spike Neuro's expertise and the immediate need of this specific market, the development from this project will have impact far beyond a single field.

Outcomes

At the conclusion of this project, we will have a UX/UI tailored to academic neuroscience researchers. Our platform will provide our target users with a user experience that is familiar and accessible while offering specific features desired for neuroscience research, including a public archive. The user interface will include a landing page, sign-on page, subscription page allowing users to subscribe to different usage plans with autorenewal options, and a user dashboard enabling users to utilize the service. The platform will include user management features including single sign-on integration and implementation of role-based access control to facilitate different user roles and permissions. A tiered user role structure is crucial for supporting an academic lab environment. Filecoin integration will support file transmission and data handling. Our E-commerce features will provide a payment gateway and support plan management for various subscriptions. The backend services will have open-source APIs for frontend-backend interaction and potentially support additional third-party integrations. We will employ a NoSQL database for session management and other data handling. We will use a microservice architecture to facilitate scalability and maintainability.
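As an illustration only, the tiered role structure for an academic lab described above could be sketched as a simple role-to-permission mapping. The role names and permissions below are hypothetical placeholders; the proposal defers the actual role and permission definitions to Milestone 1.

```python
from enum import Enum, auto


class Permission(Enum):
    READ = auto()
    UPLOAD = auto()
    MANAGE_MEMBERS = auto()
    MANAGE_BILLING = auto()


# Hypothetical lab roles (assumptions, not taken from the proposal).
# Each tier adds permissions on top of the tier below it.
ROLE_PERMISSIONS = {
    "public_viewer": {Permission.READ},
    "lab_member": {Permission.READ, Permission.UPLOAD},
    "lab_manager": {
        Permission.READ,
        Permission.UPLOAD,
        Permission.MANAGE_MEMBERS,
    },
    "principal_investigator": set(Permission),  # every permission
}


def is_allowed(role: str, permission: Permission) -> bool:
    """Return True if the role grants the permission; unknown roles get nothing."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```

In this sketch an unrecognized role is denied by default, which is the safer failure mode for a public archive.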

Specific Deliverables include:

• Codebase: A fully documented codebase for the application and related services.
• Infrastructure Scripts: Terraform scripts for infrastructure setup and deployment.
• Documentation: Comprehensive documentation covering setup, deployment, user guides, and developer guides.

Adoption, Reach, and Growth Strategies

Our initial target market for this project is academic and private research institutions who conduct electrophysiology neuroscience research. There are an estimated 20,000 neuroscience electrophysiology labs in the US with an additional 50,000 globally. We are currently engaged with this market through the development and sale of our existing electrophysiology product line, including high channel count neural probes. We have also lined up multiple key opinion leaders in this field as beta testers, who will advocate for broader use in this community if successful. We are experienced in marketing to this group and boast a large lead list for future marketing campaigns. We also have relationships with multiple international distributors of electrophysiology research equipment to support our reach into the global market.

We have a sales team of experienced neuroscientists in place for our current product line to support initial onboarding of early customers. As we grow, we intend to build a dedicated sales and support team for this product. We also intend to pursue National Institutes of Health (NIH) funding to support later-stage platform development. Funding through the NIH would position us as a key repository for NIH-funded research data. We have received positive feedback from the NIH on this project; however, funding from this proposal would provide the proof of concept for the platform that we need to secure those greater resources.

Development Roadmap

Milestone 1: Exploration and Requirement Gathering

This milestone will establish the final design plan for the development work. Specifically, we will confirm that the planned architecture of our platform aligns with existing Filecoin tools so that we can form an effective development plan.

● Engineering Team: Solutions Architect, Business Analyst, Technical Lead
● Budget: $7,000
● Dates: 12/1/2023 – 12/31/2023 (3-5 weeks)

Key Activities/Deliverables:
● Define user roles, permissions, and requirements for RBAC.
● Formulate detailed requirements for Filecoin integration.
● Identify potential risks and formulate mitigation strategies.
● Secure fundamental compliance and security requirement details.
● Define software architecture.

Milestone 2: Infrastructure Setup and Base Development (web3)

This key development milestone will set up the microservices that will support the neuroscience data storage platform, including user access, role management, contract autorenewal, and data standardization. The integration of the microservices will be considered within the design of the UI/UX to create a familiar and accessible user experience.

● Engineering Team: DevOps Engineer, Back-end Developer (focused development expertise including Web3.js, IPFS, and Firebase along with Python, Django/Flask, and Kubernetes expertise), Front-end Developer, QA Engineer, and UI/UX Designer
● Budget Estimate: $27,000
● Dates: 1/1/2024 – 2/15/2024 (5-6 weeks)

Key Activities/Deliverables:
● Create base Kubernetes deployment using an open-source stack.
● Develop initial microservices (authentication, user management, and subscription, using Django/Flask along with web3 tools to support smart contracts for roles and permissions).
● Set up a local development environment with the necessary tooling and documentation.
● Set up a CI/CD pipeline and integrate it with the development workflow.
● Initial design of the User Interface following responsive guidelines.
● Design initial UI/UX and user flow.
● Secure and set up the database with an initial schema (Postgres or MySQL).
● Utilize a NoSQL database like MongoDB for session management.

Milestone 3: Infrastructure as Code (IaC) Setup

We will develop and confirm the functionality of the platform's infrastructure with IaC to provide consistent automation of key processes and configurations. This will provide a uniform user experience across the microservices of our platform.

● Engineering Team: DevOps Engineer, Cloud Engineer, Security Engineer
● Budget: $10,000
● Dates: 2/16/2024 – 3/5/2024 (3 weeks)

Key Activities/Deliverables:
● Develop and apply Terraform scripts for AWS infrastructure.
● Ensure security and compliance using AWS and Terraform best practices.
● Validate infrastructure setup, deploying initial applications and services.

Milestone 4: Core Development and Filecoin Integration

In this milestone we will integrate our microservices and UX/UI with Filecoin and confirm the functionality of connectors to enable data storage and retrieval. If needed, we will revisit development efforts from Milestone 2 to address any integration issues. Completion of this milestone will provide proof of concept for the proposed platform and allow us to initiate beta testing and deployment.

● Engineering Team: Full-stack Developer, Integration Engineer, and QA Engineer
● Budget Estimate: $20,000
● Dates: 3/6/2024 – 4/19/2024 (5-6 weeks)

Key Activities/Deliverables:
● Develop e-commerce functionality with a placeholder for the payment gateway.
● Establish Filecoin connectors and validate with a minimal viable product (data storage and retrieval).
● Implement core functionalities and integrate with Filecoin, providing automated storage deal renewals, data transmission, and storage.
● Implement role-based access control (RBAC).
● Execute bi-weekly sprints to incrementally develop and test functionality.
● Conduct functional testing.
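To make the automated storage deal renewal activity concrete, here is a minimal sketch of the selection step a renewal service might run on a schedule. It is an assumption-laden illustration: the `StorageDeal` record, its fields, and the 30-day renewal window are hypothetical, and the sketch does not call Motion or any Filecoin API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StorageDeal:
    """Hypothetical record of a storage deal tracked by the platform."""
    deal_id: str
    expires_at: datetime
    auto_renew: bool


def deals_due_for_renewal(deals, now, window=timedelta(days=30)):
    """Return active deals with autorenewal enabled that expire within the window.

    Deals that have already expired are skipped here; they would need
    separate handling (for example, re-storing the data).
    """
    cutoff = now + window
    return [
        deal
        for deal in deals
        if deal.auto_renew and now < deal.expires_at <= cutoff
    ]


if __name__ == "__main__":
    now = datetime(2024, 4, 1, tzinfo=timezone.utc)
    deals = [
        StorageDeal("deal-a", now + timedelta(days=10), auto_renew=True),
        StorageDeal("deal-b", now + timedelta(days=90), auto_renew=True),
        StorageDeal("deal-c", now + timedelta(days=5), auto_renew=False),
    ]
    for deal in deals_due_for_renewal(deals, now):
        print(deal.deal_id)
```

Keeping the selection logic separate from the code that actually submits a renewal would let it be tested without network access.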

Milestone 5: SSO Integration, Testing, and Deployment Preparation

This milestone will leverage feedback from internal and external testers to confirm the functionality of the platform and that the design requirements established in Milestone 1 have been met.

● Engineering Team: Full-stack Developers, Security Engineer, and QA Engineers
● Budget Estimate: $18,000
● Dates: 4/20/2024 – 5/24/2024 (5 weeks)

Key Activities/Deliverables:
● Integrate SSO with popular providers (Google, LinkedIn) and test the authentication flow.
● Perform thorough testing, including functional, security, and performance testing.
● Prepare for deployment, ensuring all microservices are stable and secure.
● Finalize UI/UX, ensuring compatibility with Chrome, Safari, and Firefox on both mobile and desktop.

Milestone 6: Launch, Monitor, and Optimize

In this final milestone we will launch the platform with a small number of initial users to evaluate function at scale and expand our collection of user feedback for continued development. Beyond this milestone we will pursue NIH funding to continue to add new features for the neuroscience community as well as explore opportunities to present the platform to additional research fields.

● Engineering Team: DevOps, Full-stack Developers, and Support Engineers
● Budget Estimate: $10,000
● Dates: 5/25/2024 – 6/16/2024 (2-3 weeks)

Key Activities/Deliverables:
● Deploy the solution to the production environment.
● Monitor system performance and user interactions, and troubleshoot issues.
● Collect user feedback and identify areas for improvement.
● Optimize system performance and user experience based on real-time data.
● Execute extensive testing: functional, security, and performance.
● Rectify discovered issues and optimize the solution.

Total Budget Requested

$92,000
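As a quick consistency check, the per-milestone budgets listed in the Development Roadmap can be summed to confirm the requested total. The dictionary below simply restates the figures from the roadmap; the variable names are illustrative.

```python
# Budget figures as stated for each milestone in the Development Roadmap (USD).
MILESTONE_BUDGETS_USD = {
    "Milestone 1: Exploration and Requirement Gathering": 7_000,
    "Milestone 2: Infrastructure Setup and Base Development": 27_000,
    "Milestone 3: Infrastructure as Code (IaC) Setup": 10_000,
    "Milestone 4: Core Development and Filecoin Integration": 20_000,
    "Milestone 5: SSO Integration, Testing, and Deployment Preparation": 18_000,
    "Milestone 6: Launch, Monitor, and Optimize": 10_000,
}

total_usd = sum(MILESTONE_BUDGETS_USD.values())
print(f"Total requested: ${total_usd:,}")  # prints "Total requested: $92,000"
```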

Maintenance and Upgrade Plans

Long-term, we plan to continue adding new platform features tailored to the neuroscience research community. Building our platform with a microservice architecture will allow us to add new tools and services without disrupting the existing framework. Specifically, we plan to add features to support data analysis and file standardization. We will work with Filecoin or explore other storage options for these features, which require "hot" data access. We also intend to add security features that meet HIPAA requirements to enable the storage of sensitive clinical and health information. We will also expand our platform to meet the needs of additional research fields, first focusing on other areas needing tools for the new NIH data sharing requirements.

Team

Team Members

Team Member LinkedIn Profiles

Team Website

https://spikeneuro.com/

Relevant Experience

Our team brings together software experts in database management, security, and cloud architecture along with experienced neuroscientists who understand the specific needs of our target market.

Oscar Mora is a DevSecOps expert and software team lead. He has led and executed impactful projects to drive organizational excellence over 17 years of dedicated service at globally acclaimed corporations. He brings specific expertise in cybersecurity, DevSecOps, SRE, and modern technologies like Kubernetes, as well as vulnerability remediation and controls/compliance. He also brings profound knowledge of major cloud providers (AWS, GCP, Azure) and on-premise solutions, and he excels in team leadership and Agile methodologies.

Scotty Rodriguez is our Senior DBA/Backend Engineer. He has over 17 years of experience in software development, data, SRE, and operations. He is skilled in DevOps/Cloud technologies and tools such as Amazon Web Services, Infrastructure as Code (Terraform), and CI/CD pipelines (TeamCity, Jenkins, Octopus Deploy). He specializes in problem resolution and leading operational improvements through agile methodologies (Scrum, Kanban). He has worked on software development teams focused on providing high-quality software with modern development practices such as TDD (Test-Driven Development) with IoC (Inversion of Control). Additionally, he has worked on data teams and has extensive knowledge of relational databases (SQL) and non-relational databases (MongoDB, Amazon DynamoDB).

Daniel Villanea is our Lead DevOps Engineer. Having worked over 15 years in IT, Daniel has engineered robust solutions for Fortune 500 company projects, leading complex Kubernetes deployments and API ecosystems with expertise in technologies like Python, Docker, Terraform, Helm and AWS Cloud Services. He is a certified AWS Solutions Architect and SAFe practitioner. Past roles have included Principal DevOps Engineer and lead SRE, providing a solid foundation ensuring projects not only innovate but also adhere to stringent operational and compliance standards.

Jeison Altamirano is our Senior Network & Cloud Engineer with a proven track record in both arenas. His toolkit is filled with the most powerful network tools like Cisco, Juniper, and Arista, ensuring our infrastructure is fast, secure, and scalable. In the cloud, he has substantial experience with AWS, Azure, GCP, Docker, Kubernetes, Terraform, and Ansible to craft seamless and efficient cloud ecosystems.

Jose Reyes is a Senior Cloud Architect. His skill set includes DevOps, Kubernetes, AWS, Azure, Terraform, Kafka, Puppet, Ansible and robust monitoring solutions. Jose is a seasoned professional in designing and optimizing cloud infrastructure for seamless operations.

Dr. Charla Howard, PhD, is the overall project lead on this endeavor. She brings excellent experience in neuroscience research and business development and will support coordination efforts between the software team and our neuroscience team members and beta testers. She is experienced in managing large and diverse projects bringing together experts from multiple fields to develop useful technology. As Chief Clinical Officer at Spike Neuro she also has a strong understanding of the long-term objectives of this work and compliance requirements that will ultimately need to be met for human clinical and medical data.

Dr. Rebecca Gerth, PhD, is our team neuroscience data expert. She has years of experience as a researcher working with large electrophysiology datasets. Having spent multiple years in neuroscience research equipment sales, she is also familiar with the diverse needs of other researchers in the field. She will lead efforts in defining criteria and testing.

Additional Information

How did you learn about the Open Grants Program? Clara Tsao

Please provide the best email address for discussing the grant agreement and general next steps. charla@spikeneuro.com

Please include any additional information that you think would be useful in helping us to evaluate your proposal. Spike Neuro is a neuroscience research tool development company. We have multiple "data producing" products and are well versed in the data needs of the neuroscience community. This data archive service would complement our existing product line, and we have an active customer base for marketing.

charlalhoward commented 10 months ago

Additional team member LinkedIn profiles

ErinOCon commented 9 months ago

Hi @charlalhoward, thank you for your proposal. Unfortunately, we will not be moving forward with a grant at this time. Wishing you all the best as you continue to build!

If you have any questions for our team, we can be reached at grants@fil.org.