PGScatalog / pgsc_calc

The Polygenic Score Catalog Calculator is a Nextflow pipeline for polygenic score calculation
https://pgsc-calc.readthedocs.io/en/latest/
Apache License 2.0

The Polygenic Score Catalog Calculator (pgsc_calc)

Introduction

pgsc_calc is a bioinformatics best-practice analysis pipeline for calculating polygenic [risk] scores on samples with imputed genotypes using existing scoring files from the Polygenic Score (PGS) Catalog and/or user-defined PGS/PRS.
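For orientation, a polygenic score is conventionally defined as a weighted sum of effect-allele dosages. The formula below is the textbook definition, not a description of the pipeline's internals, and the symbols are introduced here for illustration only: $x_{ij}$ is the dosage (between 0 and 2) of the effect allele of variant $j$ in sample $i$, $\beta_j$ is the effect weight listed for that variant in the scoring file, and $M$ is the number of variants in the scoring file that could be matched to the target genotypes.

```latex
\mathrm{PGS}_i = \sum_{j=1}^{M} \beta_j \, x_{ij}
```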

Pipeline summary

The workflow performs the following steps:

And optionally:

See documentation for a list of planned features under development.

PGS applications and libraries

pgsc_calc uses applications and libraries internally developed at the PGS Catalog, which can do helpful things like:

If you want to write Python code to work with PGS, check out the pygscatalog repository to learn more.

If you want a simpler way of working with PGS, ignore this section and continue below to learn more about pgsc_calc.

Quick start

  1. Install Nextflow (>=23.10.0)

  2. Install Docker or Singularity (v3.8.3 minimum) (please only use Conda as a last resort)

  3. Download the pipeline and test it on a minimal dataset with a single command:

    nextflow run pgscatalog/pgsc_calc -profile test,<docker/singularity/conda>

  4. Start running your own analysis!

    nextflow run pgscatalog/pgsc_calc -profile <docker/singularity/conda> --input samplesheet.csv --pgs_id PGS001229

See getting started for more details.
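Before step 3, it can help to confirm that the installed Nextflow meets the minimum version from step 1. The following is a minimal sketch, not part of pgsc_calc: the function names (`version_ge`, `installed_nextflow`, `check_nextflow`) are hypothetical, the comparison assumes a `sort` that supports `-V` (GNU coreutils), and the version is assumed to appear as the first `x.y.z` string in the output of `nextflow -version`.

```shell
#!/usr/bin/env bash
# Minimum Nextflow version stated in the quick start.
MIN_NEXTFLOW="23.10.0"

# version_ge A B: succeeds when version A is greater than or equal to B.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the first x.y.z string reported by `nextflow -version`.
# Prints nothing if nextflow is not on PATH.
installed_nextflow() {
    nextflow -version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Report whether the installed Nextflow satisfies MIN_NEXTFLOW.
check_nextflow() {
    local found
    found="$(installed_nextflow)"
    if [ -z "$found" ]; then
        echo "nextflow not found on PATH" >&2
        return 1
    fi
    if ! version_ge "$found" "$MIN_NEXTFLOW"; then
        echo "nextflow $found is older than $MIN_NEXTFLOW" >&2
        return 1
    fi
    echo "nextflow $found OK"
}
```

After sourcing these functions, one possible use is `check_nextflow && nextflow run pgscatalog/pgsc_calc -profile test,docker`, which runs the test profile only if the version check passes.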

Documentation

Full documentation is available on Read the Docs: https://pgsc-calc.readthedocs.io/en/latest/

Credits

pgscatalog/pgsc_calc is developed as part of the PGS Catalog project, a collaboration between the University of Cambridge’s Department of Public Health and Primary Care (Michael Inouye, Samuel Lambert) and the European Bioinformatics Institute (Helen Parkinson, Laura Harris).

The pipeline seeks to provide a standardized workflow for PGS calculation and ancestry inference. It is implemented in Nextflow and derived from an existing set of tools/scripts developed by the Inouye lab (Rodrigo Canovas, Scott Ritchie, Jingqin Wu) and PGS Catalog teams (Samuel Lambert, Laurent Gil).

The adaptation of the codebase, the Nextflow implementation, and the PGS Catalog features were written by Benjamin Wingfield, Samuel Lambert, and Laurent Gil, with additional input from Aoife McMahon (EBI). Development of new features, testing, and code review are ongoing and involve Inouye lab members (Rodrigo Canovas, Scott Ritchie) and others. If you use the tool, we ask you to cite our paper describing the software and the updated PGS Catalog resource:

This pipeline is distributed under an Apache License and uses code and infrastructure developed and maintained by the nf-core community (Ewels et al. Nature Biotech (2020) doi:10.1038/s41587-020-0439-x), reused here under the MIT license.

Additional references of open-source tools and data used in this pipeline are described in CITATIONS.md.

This work has received funding from EMBL-EBI core funds, the Baker Institute, the University of Cambridge, Health Data Research UK (HDRUK), and the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101016775 INTERVENE.