anvilproject / AnVIL-JIRA


How to use Gen3, Terra, Dockstore, Galaxy, Bioconductor... in AnVIL #802

Open kozbo opened 3 years ago

kozbo commented 3 years ago

Create a section for each environment that helps a user transition to using that tool or environment in AnVIL.

┆Issue is synchronized with this Jira Story ┆Issue Type: Story ┆Sprint: Backlog ┆containerName: AnVIL ┆Issue Number: ANVIL-829

kozbo commented 3 years ago

Background on each of the environments can be found in the ECC report (https://docs.google.com/document/d/12a25adxMR6UwJn0nFaquocmR12vAlMErQgzGJkgGjWk/edit?usp=sharing).

Gen3: Gen3 is an open source, cloud-based software platform for developing data commons and for managing, analyzing, harmonizing, and sharing large datasets. It accelerates and democratizes the process of scientific discovery, especially over large or complex datasets. The AnVIL instance of Gen3 ....

Terra: Terra (https://anvil.terra.bio/) is a cloud-native platform where biomedical researchers can access data, run analysis tools, and collaborate. Workspaces are the building blocks of Terra: each one is a dedicated space where you and your collaborators can access and organize the same data and tools and run analyses together (Figure 5). Each workspace comes with a Google Cloud bucket where workflow outputs and notebook files are stored by default. Workspaces also provide data tables for storing and maintaining structured data, much like a spreadsheet. Because table cells can hold links to the data's actual location in the cloud, a data table can connect data files to the tools in the workspace. Finally, within a workspace, users can launch batch analysis jobs or start one of several interactive computing environments, including Jupyter Notebooks, R/Bioconductor, or Galaxy. AnVIL users of Terra can ...
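To make the data-table idea concrete, here is a minimal sketch of how a table row links an entity to files in a bucket. It assumes the tab-separated upload format Terra accepts for data tables, where the first column header has the form `entity:<table>_id`. The functions `load_data_table` and `cloud_links`, the column names, and the `gs://example-bucket/...` paths are all hypothetical and for illustration only; this is not part of any Terra API.

```python
import csv
import io

# Hypothetical data table in Terra's TSV upload style; the gs:// paths are placeholders.
EXAMPLE_TSV = (
    "entity:sample_id\tparticipant\tcram_path\n"
    "S1\tP1\tgs://example-bucket/S1.cram\n"
    "S2\tP2\tgs://example-bucket/S2.cram\n"
)


def load_data_table(tsv_text):
    """Parse a Terra-style TSV into (table_name, rows keyed by entity ID)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_column = reader.fieldnames[0]  # e.g. "entity:sample_id"
    if not (id_column.startswith("entity:") and id_column.endswith("_id")):
        raise ValueError(f"unexpected ID column header: {id_column!r}")
    table_name = id_column[len("entity:"):-len("_id")]  # "sample"
    rows = {row[id_column]: row for row in reader}
    return table_name, rows


def cloud_links(rows):
    """For each entity, collect the cell values that point at Google Cloud Storage objects."""
    return {
        entity_id: [value for value in row.values() if value and value.startswith("gs://")]
        for entity_id, row in rows.items()
    }


if __name__ == "__main__":
    name, rows = load_data_table(EXAMPLE_TSV)
    print(name)
    print(cloud_links(rows))
```

The table stores only references (the `gs://` links), not the files themselves. That is why the same data can be handed to a batch workflow or an interactive environment without being copied into the table.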

Dockstore: The Dockstore (https://dockstore.org) concept is simple: provide a place where users can share tools encapsulated in Docker and described with the Common Workflow Language (CWL), the Workflow Description Language (WDL), or as Galaxy Workflows (GW) (Figure 6). This lets scientists share analytical tools in a form that is machine readable and runnable in a variety of environments. Although Dockstore is focused on serving researchers in the biosciences, anyone can use the combination of Docker + CWL/WDL/GW to describe the tools and services in their Docker images in a standardized, machine-readable way. Dockstore users can leverage the AnVIL environment by...
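Machine readability is what lets another platform locate a shared workflow programmatically. The sketch below only builds request URLs and makes no network calls. It assumes Dockstore's GA4GH Tool Registry Service (TRS) v2 endpoint is at `https://dockstore.org/api/ga4gh/trs/v2` and follows the TRS v2 path layout. It also assumes TRS IDs of the form `#workflow/github.com/<org>/<repo>`. The example ID, the version name `main`, and the helper names are hypothetical, so check the layout against Dockstore's API documentation before relying on it.

```python
from urllib.parse import quote

# Assumed base URL of Dockstore's GA4GH TRS v2 API.
TRS_BASE = "https://dockstore.org/api/ga4gh/trs/v2"

# Descriptor types matching the languages named above (CWL, WDL, Galaxy Workflows).
DESCRIPTOR_TYPES = {"CWL", "WDL", "GALAXY"}


def _segment(value):
    """Percent-encode a value so it forms a single URL path segment ('#' and '/' included)."""
    return quote(value, safe="")


def trs_versions_url(trs_id):
    """URL listing the published versions of a tool or workflow."""
    return f"{TRS_BASE}/tools/{_segment(trs_id)}/versions"


def trs_descriptor_url(trs_id, version, descriptor_type):
    """URL for the primary CWL/WDL/Galaxy descriptor of one version."""
    kind = descriptor_type.upper()
    if kind not in DESCRIPTOR_TYPES:
        raise ValueError(f"unsupported descriptor type: {descriptor_type!r}")
    return (
        f"{TRS_BASE}/tools/{_segment(trs_id)}/versions/"
        f"{_segment(version)}/{kind}/descriptor"
    )


if __name__ == "__main__":
    example_id = "#workflow/github.com/example-org/example-repo"  # hypothetical ID
    print(trs_versions_url(example_id))
    print(trs_descriptor_url(example_id, "main", "wdl"))
```

Encoding the whole ID as one segment matters because a raw `#` would be read as a URL fragment and each `/` as a path separator. A standard HTTP client can then fetch these URLs.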

Galaxy: Galaxy (http://usegalaxy.org) is an open, web-based platform for accessible, reproducible, and transparent genomic science. It provides features for executing scientific workflows, integrating data, and persisting data and analyses. A major aim of Galaxy is to make computational biology accessible to research scientists who do not have computer programming or systems administration experience. Although it was initially developed for genomics research, it now supports many forms of biomedical research. More than 5,000 analysis tools are available within Galaxy, including tools for gene expression, genome assembly, proteomics, epigenomics, transcriptomics, and many other disciplines in the life sciences.

Bioconductor: Bioconductor (https://bioconductor.org/) is a free, open source, and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology. New computational and statistical methods for interpreting biological data are continuously being developed, many of them by members of the Bioconductor community, and the Bioconductor project serves as a software repository for a wide range of statistical tools written in the R programming language. Building on R's rich statistical and graphical features, the project has curated more than 1,900 software packages, 3,200 exemplary experiments, and 50,000 model organism annotation resources for use in genomic data analysis. Using these packages requires only an understanding of the R language. As a result, R/Bioconductor packages, which include state-of-the-art statistical inference tools tailored to problems arising in genomics, are widely used by biologists, who benefit significantly from being able to explore and analyze both public and privately developed datasets. Many R/Bioconductor applications can be presented to users in a way that does not require advanced programming expertise, e.g., as ‘shiny’ applications with graphical interfaces.