Sage-Bionetworks / rare-disease-workflows

Workflows for processing rare disease data on Synapse.

Rare Disease Workflows

This repo contains a list of workflows designed to accelerate the interpretation of rare-disease data, currently applied to data within the NF-OSI.

The goal of this repository is to facilitate harmonization across rare-disease datasets that are deposited on Synapse. These workflows are centered around various stages of analysis, described below.


Data Harmonization workflows

We have built a number of workflows that re-process raw data uploaded to Synapse to create a single data repository. All harmonized datasets will be collated in a single table that should be updated every time a new dataset is processed.
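
A minimal sketch of the collation idea, assuming each processed dataset is available as a list of row dictionaries. The function name `collate_datasets`, the `dataset` column, and the example column names are hypothetical and are not taken from this repository; the actual workflows store their results in Synapse tables.

```python
from typing import Dict, List

Row = Dict[str, str]


def collate_datasets(master: List[Row], new_rows: List[Row], dataset_name: str) -> List[Row]:
    """Return a new table with `new_rows` appended to `master`.

    Each appended row is tagged with its dataset of origin, and every row is
    padded so that all rows share the same set of columns.
    """
    # Tag the incoming rows without modifying the caller's dictionaries.
    tagged = [{**row, "dataset": dataset_name} for row in new_rows]
    combined = [dict(row) for row in master] + tagged

    # Union of column names, in first-seen order.
    columns: List[str] = []
    for row in combined:
        for key in row:
            if key not in columns:
                columns.append(key)

    # Fill missing values with an empty string so the table is rectangular.
    return [{col: row.get(col, "") for col in columns} for row in combined]


if __name__ == "__main__":
    # Illustrative values only.
    master = [{"specimen": "s1", "gene": "NF1", "tpm": "12.5", "dataset": "study_a"}]
    new = [{"specimen": "s2", "gene": "NF1", "tpm": "3.0", "tumor_type": "pNF"}]
    for row in collate_datasets(master, new, "study_b"):
        print(row)
```

Padding missing columns keeps the combined table rectangular, which makes it easier to load into a fixed-schema table whenever a newly processed dataset is added.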

| Data Type | Description | Location | Destinations | Status |
| --- | --- | --- | --- | --- |
| RNA-Seq | This workflow runs Salmon alignment from FASTQ files to populate both a public and a private Synapse table that store all NF-related gene expression data. | rna-seq-workflow/ | JHU Biobank RNA-Seq | Complete |
| Exome/WGS-Seq | This workflow currently runs vcf2maf on uploaded VCF files and stores them on Synapse. | gene-variant-workflow | JHU Biobank Exome Seq | Files are currently processed; needs to be updated to store data on Synapse. Currently being uploaded to run the DeepVariant caller. |
| Somatic variant caller | This workflow should take the raw data and call somatic variants. | somatic-variant-caller/ | TBD | Currently adding Synapse pieces to the Kids-First variant caller |
| Drug-Sensitivity Data | This workflow takes drug-sensitivity data and combines it into a single file. | drug-screening-workflow | Table TBD | Still requires table update |

As more data types are added we will continue to add more workflows.
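
As an illustration of the RNA-Seq workflow described above, the sketch below shows one way Salmon output could be reshaped into rows suitable for a gene expression table. It relies on the standard `quant.sf` columns written by Salmon (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`); the function name `quant_to_rows`, the output column names, and the example values are assumptions made for illustration and do not come from the workflow in this repository.

```python
import csv
import io
from typing import Dict, List


def quant_to_rows(quant_text: str, specimen_id: str) -> List[Dict[str, object]]:
    """Convert the text of one Salmon quant.sf file into long-format rows."""
    # quant.sf is tab-delimited with a single header line.
    reader = csv.DictReader(io.StringIO(quant_text), delimiter="\t")
    rows: List[Dict[str, object]] = []
    for record in reader:
        rows.append(
            {
                "specimen": specimen_id,        # hypothetical column name
                "transcript": record["Name"],
                "tpm": float(record["TPM"]),
                "num_reads": float(record["NumReads"]),
            }
        )
    return rows


if __name__ == "__main__":
    # Illustrative quant.sf content, not real data.
    example = (
        "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
        "tx1\t1500\t1320.5\t10.0\t42.0\n"
        "tx2\t900\t720.0\t0.0\t0.0\n"
    )
    for row in quant_to_rows(example, "sample_1"):
        print(row)
```

Keeping one row per specimen and transcript means results from many samples can be appended to the same table without changing its columns.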

Research projects/workflows

Building upon these standardized datasets are workflows that take the data generated by the data harmonization steps above to identify new scientific hypotheses.

| Project name | Description | Data Type | Results Tables | Status |
| --- | --- | --- | --- | --- |
| DTEN | Drug-target expression network takes gene expression data and identifies relevant drug targets. | RNA-Seq | DTEN Nodes, [DTEN Terms]() | Currently requires an update to analyze network results. |
| CDOM | Combinatorial domain mutation analysis | Exome/WGS-Seq | TBD | Under development |
| [Latent variable analysis]() | We are comparing various latent variable approaches to characterize gene expression data. | RNA-Seq | | Under development |
| [Gene mutation feature selection]() | This project evaluates the ability of specific genes to predict tumor type. | | | Under development |
| [Drug sensitivity analysis]() | This project looks to identify differences in drug sensitivity across cell lines representing different tumor types. | | | Under development |

Results extraction workflows

This is a placeholder for tools we develop to transfer the data into a more visualizable format, such as Shiny apps or ETL tools.