Bioconductor / BioC2020

BioC2020: Where Software and Biology Connect
http://bioc2020.bioconductor.org/

BOF - Reproducible environments for integrated computational workflows #106

Open kevinrue opened 4 years ago

kevinrue commented 4 years ago

Hello,

I'm happy to propose a BoF on the following topic:

Reproducible environments for integrated computational workflows

Input/topic: A full biological analysis workflow often requires numerous software tools deployed at individual analysis steps, some of which may have conflicting software version requirements or be written in different programming languages. In the spirit of an open discussion, we would like to gather experiences and suggestions on current solutions and best practices around the use of software environments (e.g., Conda, renv, Docker, Singularity) in combination with workflow managers (e.g., Snakemake, Nextflow, cgat-core), with a specific focus on workflows that integrate tasks involving multiple programming languages in addition to R (e.g., Python, Java, Shell).
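One practice relevant to this discussion is recording which interpreter and tool versions a multi-language step actually sees at run time, so that environment drift between runs or cluster nodes can be detected. The snippet below is a minimal sketch, assuming a pipeline step driven from Python; the `VERSION_COMMANDS` mapping and the function names are hypothetical illustrations, not part of Conda, renv, or any workflow manager.

```python
import json
import platform
import shutil
import subprocess
import sys
from typing import Dict, List, Optional

# Hypothetical mapping from tool name to a command that prints its version.
# These commands are assumptions; adjust them to the tools your pipeline uses.
VERSION_COMMANDS: Dict[str, List[str]] = {
    "R": ["R", "--version"],
    "java": ["java", "-version"],
    "bash": ["bash", "--version"],
}


def tool_version(cmd: List[str]) -> Optional[str]:
    """Return the first non-empty line of a tool's version output, or None if unavailable."""
    if shutil.which(cmd[0]) is None:
        return None  # tool is not on PATH in the current environment
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools (e.g. java) print their version on stderr rather than stdout.
    output = result.stdout or result.stderr
    for line in output.splitlines():
        if line.strip():
            return line.strip()
    return None


def environment_snapshot() -> Dict[str, Optional[str]]:
    """Collect interpreter and tool versions visible to the current process."""
    snapshot: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name, cmd in VERSION_COMMANDS.items():
        snapshot[name] = tool_version(cmd)
    return snapshot


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be saved alongside step outputs.
    json.dump(environment_snapshot(), sys.stdout, indent=2)
    print()
```

Writing this JSON next to each step's outputs complements, rather than replaces, a declared environment (Conda file, renv lockfile, or container image): the declaration states what should be installed, while the snapshot records what was actually found.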

For example, challenges and considerations faced when designing and using software environments for multi-lingual pipelines on institutional high-performance computing clusters (HPCs) include:

Specifically, we are keen to discuss the pros and cons of individual software environment frameworks in relation to the context in which they are intended to be used. For instance, the motivation and design choices behind each software environment framework influence its capacity to support individual programming languages. Reciprocally, individual workflow managers strive to support multiple software environment frameworks, giving users a range of choices that may lead to a paradox of choice and confusion about best practices in their respective computing environment(s).

Ideally, this could develop into a community-driven review of existing frameworks for both software management and workflow management, informed by individual experiences and the combined expectations of a broad range of users. In particular, this effort could complement the recent preprint "Streamlining Data-Intensive Biology With Workflow Systems", which focused on a broader set of best practices for the design of streamlined computational workflows.

Output: While the conversation will be kept in a very open format to enable participation by attendees from diverse backgrounds and academic levels, we would like to document and structure the output of this BoF as a collaborative manuscript reviewing existing frameworks and best practices for designing and managing reproducible software environments for computational workflows in scientific research.

Kevin (@kevinrue), Charlotte (@Charlie-George)

kevinrue commented 4 years ago

Some related reading: https://doi.org/10.1038/s41592-020-0886-9

csoneson commented 4 years ago

👋 @kevinrue - your BioC2020 BoF session has been scheduled for Fri, July 31, 12-12:55pm EDT. A Zoom meeting has been created and will be accessible from within the conference platform (please make sure that you are registered for the conference in order to get access to the platform). There will also be a person from the organization team present during your session to help in case of any technical issues. Don't hesitate to let us know if you have questions or need additional support for your BoF. Thanks for contributing to the conference!