Microservice for handling RO-Crates

ELIXIR Crate DB

License Python 3.11 Development Status GitHub contributors Ruff

Introduction

Various research artifacts are produced daily around the world, but managing them is a tedious task for researchers. Researchers prefer to spend more time on new discoveries rather than finding efficient ways to store, retrieve, or share their data. This project aims to alleviate these challenges by providing a robust backend system for managing RO-Crates—standardized packages for research data and metadata.

User Story

The end user (research publisher/scientist) will be able to manage any of the RO-Crates developed for personal or public research. The main functions include CRUD operations on RO-Crates and publishing them to Zenodo. This project can be summarized as an "Efficient RO-Crate Management System". Future enhancements include developing a frontend component to further simplify these operations.

Project Goals & Milestones

Goals

Implementation Details

Abstract

The project focuses on developing a backend system for managing RO-Crates via API endpoints. The system provides functionalities for storing, retrieving, publishing, and unpublishing RO-Crates.
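The four operations named above can be pictured with a small in-memory stand-in. This is only a sketch under stated assumptions: the class `InMemoryCrateService`, the record type `CrateRecord`, and all method names are hypothetical and are not taken from the project's actual API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CrateRecord:
    """Hypothetical record for one stored RO-Crate."""
    crate_id: str
    payload: bytes            # zipped RO-Crate contents
    published: bool = False   # whether the crate has been published


@dataclass
class InMemoryCrateService:
    """Illustrative stand-in for the backend; not the project's real API."""
    records: dict[str, CrateRecord] = field(default_factory=dict)

    def store(self, payload: bytes) -> str:
        # Assign a fresh identifier and keep the zipped crate in memory.
        crate_id = uuid.uuid4().hex
        self.records[crate_id] = CrateRecord(crate_id, payload)
        return crate_id

    def retrieve(self, crate_id: str) -> bytes:
        # Raises KeyError for an unknown identifier.
        return self.records[crate_id].payload

    def publish(self, crate_id: str) -> None:
        self.records[crate_id].published = True

    def unpublish(self, crate_id: str) -> None:
        self.records[crate_id].published = False

    def is_published(self, crate_id: str) -> bool:
        return self.records[crate_id].published
```

In a real deployment each method would presumably sit behind an API endpoint and delegate to persistent storage; the in-memory dictionary only keeps the example self-contained.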

Flow

Architecture

Three main components will be used:

  1. MinIO: Serves as an object storage system for storing zipped RO-Crates.
  2. MongoDB: Stores metadata and zipped RO-Crates.
  3. RO-Crate Microservice: Central package interacting with other components.
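As a rough illustration of how the microservice might sit between the two stores, the sketch below reads the `ro-crate-metadata.json` file (the metadata file name defined by the RO-Crate specification) out of a zipped crate and hands the archive and the metadata to separate stores. The two plain dictionaries stand in for MinIO and MongoDB, and the function names `extract_metadata`, `ingest_crate`, and `build_example_crate` are hypothetical. The split shown here (archive in the object store, metadata in the document store) and the assumption that the metadata file sits at the archive root are illustrative choices, not confirmed project behavior.

```python
import io
import json
import zipfile

METADATA_FILE = "ro-crate-metadata.json"  # name defined by the RO-Crate spec


def extract_metadata(zip_bytes: bytes) -> dict:
    """Read the JSON-LD metadata file from a zipped crate.

    Assumes the metadata file is at the archive root; raises KeyError if absent.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(METADATA_FILE) as handle:
            return json.load(handle)


def ingest_crate(crate_id: str, zip_bytes: bytes,
                 object_store: dict, metadata_store: dict) -> None:
    """Store the archive and its metadata in two separate (stand-in) stores."""
    metadata = extract_metadata(zip_bytes)   # fail early on invalid crates
    object_store[crate_id] = zip_bytes       # stand-in for MinIO
    metadata_store[crate_id] = metadata      # stand-in for MongoDB


def build_example_crate() -> bytes:
    """Create a tiny zipped crate in memory for demonstration purposes."""
    metadata = {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {"@id": METADATA_FILE, "@type": "CreativeWork",
             "about": {"@id": "./"}},
            {"@id": "./", "@type": "Dataset", "name": "Example crate"},
        ],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(METADATA_FILE, json.dumps(metadata))
    return buffer.getvalue()
```

Extracting the metadata before writing anything means an archive without a metadata file is rejected before either store is touched.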

Why MinIO?

MinIO was chosen for its high performance, scalability, cloud-native compatibility, security features, and cost-effectiveness. It scales horizontally and offers built-in encryption, which makes it a good fit for storing zipped RO-Crates.

Technical Details

Technologies

Challenges

  1. Complexity of RO-Crate Structure: Designing efficient storage and retrieval mechanisms.
  2. Integration with External Services: Handling authentication, authorization, and error handling.
  3. Optimizing Performance: Ensuring efficient uploads, downloads, and search operations.
  4. Security and Data Integrity: Protecting sensitive data and ensuring integrity during transmission and storage.
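One common way to approach the data-integrity challenge is to record a checksum when a crate is uploaded and compare it on download. The sketch below uses SHA-256 from the Python standard library; the function names `sha256_hex` and `verify_download` are hypothetical, and nothing here is confirmed to be how the project handles integrity.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_download(data: bytes, expected_digest: str) -> bool:
    """Check that downloaded bytes match the digest recorded at upload time."""
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_hex(data), expected_digest)


# Usage: record the digest at upload, verify it after download.
uploaded = b"zipped RO-Crate bytes"
recorded_digest = sha256_hex(uploaded)
intact = verify_download(uploaded, recorded_digest)
tampered = verify_download(uploaded + b"!", recorded_digest)
```

The digest could be kept alongside the crate's metadata, so that a mismatch after transfer or storage is detected before the crate is served.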

Future Prospects

Future development will focus on thorough implementation of CRUD operations, emphasizing simplicity and robustness. Stretch goals include advanced operations and developing a frontend component for enhanced user interaction and accessibility. This project aims to evolve into a comprehensive solution for managing RO-Crates efficiently and effectively.

Contributing

This project is a community effort and lives off your contributions, be it in the form of bug reports, feature requests, discussions, ideas, fixes, or other code changes. Please read these guidelines if you want to contribute. And please mind the code of conduct for all interactions with the community.

Versioning

The project follows the semantic versioning scheme. The service is currently in the alpha stage, so the API may change or break without notice.

License

This project is covered by the Apache License 2.0, a copy of which is shipped with this repository.

Contact

Crate-DB is part of ELIXIR Cloud & AAI, a multinational effort at establishing and implementing FAIR data sharing and promoting reproducible data analyses and responsible data handling in the life sciences.

If you have suggestions for this app or find an issue with it, please use the issue tracker. If you would like to reach out to us for anything else, you can join our Slack board, start a thread in our Q&A forum, or send us an email.
