Bioconductor / BioC2018

BioC 2018: Where Software and Biology Connect

SIG: Statistical Analysis and Comprehension of the Human Cell Atlas in R/Bioconductor #5

Closed stephaniehicks closed 3 years ago

stephaniehicks commented 6 years ago

Introduction of yourself: Bioconductor developers involved in the Chan Zuckerberg Initiative (CZI) to develop collaborative computational tools for the Human Cell Atlas (HCA).

Should it be held during Developer Day? Preferably, yes.

Desired outputs:

  1. Want to provide an update to the BioC community on our ongoing project with the CZI-HCA
  2. Want to have dedicated time for BioC developers involved in this project to meet at the Bioconductor 2018 conference to discuss progress and work on packages
  3. Want to get feedback from larger BioC community on the project and discuss ideas for long(er)-term funding for this work

Description of the topic: International projects generating large amounts of single-cell data, such as the Human Cell Atlas (HCA), have created strong demand among researchers for fast, scalable, and efficient infrastructure and tools to analyze billions of single cells and extract knowledge from them. This demand led the Chan Zuckerberg Initiative (CZI) to issue a call for funding applications to develop collaborative computational tools to access, analyze, and understand data from the HCA. The Bioconductor community submitted a joint proposal in August 2017 titled "Statistical Analysis and Comprehension of the Human Cell Atlas in R/Bioconductor", and we were recently awarded one year of funding to (1) provide a coherent programmatic interface to the HCA and (2) enable scalable, interactive statistical analysis of large single-cell data. This birds-of-a-feather session will summarize what was done in the past year and what we plan to do in the next year. Our project aims are:

  1. Enable HCA Data Coordination Platform (DCP) access through R / Bioconductor.
  2. Develop standard representations of large single-cell data in semantically rich R / Bioconductor objects using established design principles.
  3. Develop scalable data preprocessing and normalization pipelines that account for systematic bias and unwanted variability.
  4. Implement fast and efficient algorithms scalable to billions of cells.
  5. Facilitate finding and working with HCA data through ontology bindings.

A description of the principal investigators and their roles in the project is provided here:

Finally, in the birds-of-a-feather session we will discuss and highlight existing and proposed Bioconductor software for single-cell data analysis that serves the aims of this project. For example, we have developed a unified representation for single-cell data, the SingleCellExperiment S4 class, which extends the popular SummarizedExperiment class. In the past year, this class has been adopted by many popular Bioconductor single-cell packages (e.g. scater, MAST, scDD, scPipe, scran, splatter, zinbwave, DropletUtils, clusterExperiment, SC3, destiny, and BASiCS), improving interoperability between packages.

To make tools and analyses scale to millions or even billions of cells, we have proposed Bioconductor infrastructure and efficient data representations for large single-cell data. This infrastructure is built primarily on out-of-memory computation with Bioconductor packages such as HDF5Array (an HDF5-based on-disk array representation), DelayedArray (lazy manipulation of arrays for efficient interactive analysis), rhdf5client (access to remote array data via HDF Server or HDF Cloud), and BiocParallel (a standard interface for parallel processing across the Bioconductor ecosystem).
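As a concrete illustration of how these pieces fit together, here is a minimal sketch (not the project's own pipeline), assuming the Bioconductor packages SingleCellExperiment and HDF5Array (which loads DelayedArray) are installed. A toy count matrix is written to an on-disk HDF5 file, wrapped in a SingleCellExperiment, and normalized with delayed operations. The matrix size, the HDF5 dataset name `"counts"`, and the simple library-size normalization are illustrative choices only.

```r
# Assumes Bioconductor packages SingleCellExperiment and HDF5Array
# (which loads DelayedArray) are installed.
suppressPackageStartupMessages({
  library(SingleCellExperiment)
  library(HDF5Array)
})

set.seed(1)
# Toy counts matrix: 100 genes x 20 cells (illustrative data only)
counts_mat <- matrix(rpois(2000, lambda = 5), nrow = 100, ncol = 20,
                     dimnames = list(paste0("gene", 1:100),
                                     paste0("cell", 1:20)))

# Write the counts to an HDF5 file; the returned HDF5Matrix is a
# DelayedMatrix backed by the file rather than by an in-memory matrix
h5file <- tempfile(fileext = ".h5")
disk_counts <- writeHDF5Array(counts_mat, filepath = h5file, name = "counts")

# Wrap the on-disk matrix in a SingleCellExperiment
sce <- SingleCellExperiment(assays = list(counts = disk_counts))

# colSums() on a DelayedMatrix is computed block by block
lib_sizes <- colSums(counts(sce))

# Library-size scaling and log1p() are recorded as delayed operations;
# values are only computed when they are actually needed
normed <- t(t(counts(sce)) / lib_sizes) * 1e4
logcounts(sce) <- log1p(normed)

sce
```

Because both assays are DelayedMatrix objects, functions that accept DelayedArray inputs can process them block by block instead of loading the full matrix into memory.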

miaozhun commented 6 years ago

Hi Stephanie, thank you for the excellent proposal. I'm Zhun Miao from Tsinghua University in China, and I'm very interested in participating in the group. Thank you! See you then!

stephaniehicks commented 3 years ago

Closing this issue.