bihealth / rodeos

RODEOS: Raw Omics Data accEss and Organisation System
MIT License

development / support ongoing? #2

Closed cmeesters closed 2 years ago

cmeesters commented 2 years ago

Hi,

My attention was drawn to your project yesterday, and it seems really promising with regard to our needs. However, we could not help noticing that there is only a single commit, made 17 months ago, and nothing since.

Is this an ongoing project? Is there any institutional commitment for long-term support?

Best regards, Christian Meesters

holtgrewe commented 2 years ago

Dear Christian,

thank you for your interest in rodeos. Please note that this repository mainly hosts the documentation for the "big picture". I trust that you have already found the actual implementation, rodeos-ingest: https://github.com/bihealth/rodeos-ingest

There you will see that the latest commit is more recent, and that the Issues list has even more recent updates.

We are currently working on integrating this with our system infrastructure. Most of our integration work focuses on the connected systems (sequencers, demultiplexing, HPC clusters with their compute and file systems, the storage used by iRODS, and the installation of iRODS and irods-capability-automated-ingest). Of course, this is not reflected in our GitHub repositories but rather in our internal Ansible playbooks for installing the systems.

We will present RODEOS at the iRODS User Group Meeting 2022 (Clemens, who took over the work on RODEOS, will present remotely, however), and the recording will be put online on YouTube by the conference organizers.

I'll reproduce the contents of the abstract below.

It is our aim to develop the RODEOS system as high-quality software for our own use case first. As an academic group, we believe that it is in the general interest to share our results as open source software on GitHub, and we also believe in "public development first". We are planning to publish the RODEOS system once we have covered at least a second use case, such as mass spectrometry. At that stage we will also make the software installation easier and more reproducible, e.g., by providing a Docker Compose file as we did for SODAR and VarFish (see below).

Right now there are a lot of "moving parts", and having such a "no assembly required" solution will make it easier for users to get started, adjust it, and then integrate it with their own systems. Also, as this is an academic project that we do to solve our own problems, we can only offer help and support on a "best effort/community" basis.

Does this answer your question? I'd be happy to answer more questions also directly by email (first.last name@bih-charite.de).

Cheers, Manuel

Omics data are generated by high-throughput biochemical assays that simultaneously quantify and/or characterize molecules of the same type in biological samples. In biomedical research, omics data acquisition is often performed in specialized technology units referred to as core facilities. Using complex (and often expensive) devices such as sequencers and mass spectrometers, these units produce a wealth of different high-volume datasets that need to be organized, stored, quality-checked, pre-processed or transformed, and eventually delivered to clients, archived, or deleted.

To streamline and automate the data management and handling processes while supporting the diversity of projects and clients present in the research organization, we introduce RODEOS (Raw Omics Data accEss and Organization System). The system is based on iRODS and rodeos-ingest, a custom event handler that extends the iRODS automated ingest framework. The automatic ingest enables easier control of data throughout its life cycle, from generation to delivery and deletion, by unlocking iRODS' advantages such as data discovery, connecting workflows based on the rule engine, and secure collaboration.

To enrich metadata beyond simple file attributes, rodeos-ingest extracts additional technology-specific parameters from files generated by the omics units' devices when processing samples. We provide examples for widely used Illumina sequencers and demonstrate how the extracted metadata could be used to support demultiplexing and data QC workflows. Furthermore, we integrated Metalnx as an additional user interface to RODEOS. This allows the wet-lab staff to easily add further iRODS metadata, e.g. for choosing data delivery paths, and also empowers clients to view their data and track progress. This reduces the complexity of operations for everyone involved, especially in cross-institutional settings when coupled to (possibly multiple) Active Directory services for user authentication.
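
As a rough illustration of the kind of metadata extraction described above (a minimal sketch, not the actual rodeos-ingest code): Illumina sequencers write a RunInfo.xml file into each run folder, and its fields can be turned into key-value pairs. The function name extract_run_metadata, the "illumina::" key prefix, and the sample XML content are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down RunInfo.xml content for illustration only.
SAMPLE_RUN_INFO = """<RunInfo>
  <Run Id="220101_M00001_0001_000000000-ABCDE" Number="1">
    <Flowcell>000000000-ABCDE</Flowcell>
    <Instrument>M00001</Instrument>
    <Date>220101</Date>
    <Reads>
      <Read Number="1" NumCycles="151" IsIndexedRead="N" />
      <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="151" IsIndexedRead="N" />
    </Reads>
  </Run>
</RunInfo>
"""


def extract_run_metadata(xml_text, prefix="illumina::"):
    """Return a flat dict of run-level metadata parsed from RunInfo.xml text.

    The key prefix is an assumption for this sketch, not a RODEOS convention.
    """
    root = ET.fromstring(xml_text)
    run = root.find("Run")
    if run is None:
        raise ValueError("no <Run> element found in RunInfo.xml")

    meta = {
        prefix + "run_id": run.get("Id", ""),
        prefix + "run_number": run.get("Number", ""),
    }
    for tag in ("Flowcell", "Instrument", "Date"):
        meta[prefix + tag.lower()] = run.findtext(tag, default="")

    # Summarise the read layout, e.g. "151T8B151T"
    # (T = template read, B = index/barcode read).
    parts = []
    for read in run.iter("Read"):
        kind = "B" if read.get("IsIndexedRead") == "Y" else "T"
        parts.append(read.get("NumCycles", "?") + kind)
    meta[prefix + "read_layout"] = "".join(parts)
    return meta


if __name__ == "__main__":
    for key, value in sorted(extract_run_metadata(SAMPLE_RUN_INFO).items()):
        print(key, "=", value)
```

Key-value pairs like these could then be attached to the ingested run folder as iRODS metadata, where a demultiplexing or QC workflow might look them up.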

RODEOS is in active use at the integrated sequencing unit of the Max Delbrück Center for Molecular Medicine (basic research) and the Berlin Institute of Health at Charité (university hospital). Additional rodeos-ingest modules are planned to support more facilities and technologies, e.g. mass spectrometry for metabolomics or proteomics.

Rodeos-ingest is MIT-licensed and available at https://github.com/bihealth/rodeos-ingest

cmeesters commented 2 years ago

Hoi Manuel,

I am afraid I had not noticed the second rodeos page. Also, during ISC it became apparent that I will not be able to join you, or rather Clemens, in Leuven at the iRODS UGM. Still, we are very keen to see the video and evaluate your software!

Thanks for the very detailed and friendly reply!

Cheers, Christian

holtgrewe commented 2 years ago

Thanks for the kind words. I'm closing this ticket now. Feel free to send us an email or open another ticket if you have any questions.