DyogenIBENS / FINSURF

FINSURF is a tool designed to analyse lists of sequence variants in the human genome.
https://www.finsurf.bio.ens.psl.eu/


Introduction

FINSURF (Functional Identification of Non-coding Sequences Using Random Forests) is a tool designed to analyse lists of sequence variants in the human genome.

It assigns a score to each variant, reflecting its functional importance and therefore its likelihood of disrupting the physiology of its carrier. FINSURF scores Single Nucleotide Variants (SNVs), insertions and deletions. Among SNVs, transitions and transversions are treated separately. Insertions are characterised by a score given to each base flanking the insertion point. Deletions are characterised by a score at every deleted base. FINSURF can optionally use a list of known or suspected disease genes to restrict results to variants overlapping cis-regulatory elements linked to these genes.
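The variant categories described above can be illustrated with a short sketch. This is not FINSURF's own code: the function name `classify_variant` is hypothetical, and the sketch assumes simple VCF-style REF/ALT alleles in which an insertion has a longer ALT and a deletion has a longer REF. A transition swaps two purines (A, G) or two pyrimidines (C, T); any other single-base change is a transversion.

```python
# Illustrative sketch only; FINSURF's internal classification may differ.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_variant(ref: str, alt: str) -> str:
    """Return 'transition', 'transversion', 'insertion', 'deletion' or 'other'."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) == 1 and len(alt) == 1:
        # Single-base change: only unambiguous, distinct bases are classified.
        if ref == alt or ref not in BASES or alt not in BASES:
            return "other"
        same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
        return "transition" if same_class else "transversion"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    # Same-length multi-base substitutions are outside the categories above.
    return "other"
```

For example, `classify_variant("A", "G")` gives a transition, while `classify_variant("A", "C")` gives a transversion.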

For a variant of interest, users can generate a graphical representation of "feature contributions", showing the relative contributions of genomic, functional or evolutionary information to its score.

FINSURF is implemented as Python 3 scripts.

License

This code may be freely distributed and modified under the terms of the GNU General Public License version 3 (GPL v3) and the CeCILL licence version 2 of the CNRS. These licences are contained in the files:

  1. LICENSE-GPL.txt (or on www.gnu.org)
  2. LICENCE-CeCILL.txt (or on www.cecill.info)

Copyright for this code is held by the Dyogen (DYnamic and Organisation of GENomes) team of the Institut de Biologie de l'Ecole Normale Supérieure (IBENS), 46 rue d'Ulm, Paris, and by the individual authors.

Contact

Email finsurf {at} bio {dot} ens {dot} psl {dot} eu

If you use FINSURF, please cite:

Classification of non-coding variants with high pathogenic impact. Lambert Moyon, Camille Berthelot, Alexandra Louis, Nga Thi Thuy Nguyen, Hugues Roest Crollius. PLoS Genet. 2022 Apr 29;18(4):e1010191. doi: 10.1371/journal.pgen.1010191.

Quick start

Below is a quick start guide to using FINSURF.

Table of contents

Installation

Installing conda

The Miniconda3 package management system manages all FINSURF dependencies, including python packages and other software.

To install Miniconda3:

Installing FINSURF

Usage

Setting up your working environment for FINSURF

Before any FINSURF run, you should:

Running FINSURF on example data

Before using FINSURF on your data, we recommend running a test with our example data to ensure that installation was successful and to get familiar with the pipeline, inputs and outputs.

Example 1: Simple FINSURF run

To run FINSURF on example data:

python scripts/finsurf.py -i static/data/samples/variant.vcf -s static/data/scores_all_chroms_1e-4.tsv.gz -g static/data/FINSURF_REGULATORY_REGIONS_GENES.bed.gz -ig static/data/samples/gene.txt

The following output should be generated: res/result_*.txt.
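To check that a run produced its output, a small helper can look for the newest file matching `res/result_*.txt`. This is a minimal sketch: the function name `latest_result` is hypothetical, and it assumes only the file naming pattern stated above, not any particular file content.

```python
from pathlib import Path
from typing import Optional


def latest_result(res_dir: str = "res") -> Optional[Path]:
    """Return the most recently modified result_*.txt in res_dir, or None."""
    candidates = sorted(
        Path(res_dir).glob("result_*.txt"),
        key=lambda path: path.stat().st_mtime,  # oldest first, newest last
    )
    return candidates[-1] if candidates else None
```

Calling `latest_result()` from the repository root after the run returns the path to inspect, or `None` if no result file was written.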

To run FINSURF on the 49 variants from Genomizer:

python scripts/finsurf.py -i static/data/samples/Genomizer_49_var.vcf -s static/data/scores_all_chroms_1e-4.tsv.gz -g static/data/FINSURF_REGULATORY_REGIONS_GENES.bed.gz -ig static/data/samples/Genomizer_49_var_GENES.tsv
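Both runs above share the same score and regulatory-region files and differ only in the input VCF and gene list. When scripting several runs, the command line can be assembled in Python, as in the sketch below. The helper name `finsurf_command` is hypothetical; the flags (`-i`, `-s`, `-g`, `-ig`) and default paths are copied from the commands above, and nothing else about the command-line interface is assumed.

```python
import sys
from typing import List

# Shared inputs, copied from the example commands above.
SCORES = "static/data/scores_all_chroms_1e-4.tsv.gz"
REGULATORY_REGIONS = "static/data/FINSURF_REGULATORY_REGIONS_GENES.bed.gz"


def finsurf_command(vcf: str, gene_list: str) -> List[str]:
    """Build the argument list for one finsurf.py run (does not execute it)."""
    return [
        sys.executable, "scripts/finsurf.py",  # interpreter of the active environment
        "-i", vcf,
        "-s", SCORES,
        "-g", REGULATORY_REGIONS,
        "-ig", gene_list,
    ]
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root would launch the run; building the list separately makes it easy to print or log the exact command first.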

To plot the contributions for one specific variant:

python scripts/plot_contribution.py --variant "chr1:12005" --vartype "transition" --rename_cols_table static/data/FINSURF_model_objects/rename_columns_model.tsv --numFeat_path static/data/NUM_FEATURES.tsv.gz --scaled_numFeat_path static/data/SCALED_NUM_FEATURES.tsv.gz --featCont_transition_path static/data/FULL_FC_transition.tsv.gz --featCont_transversion_path static/data/FULL_FC_transversion.tsv.gz

To plot the contributions for one specific variant from the Genomizer dataset:

python scripts/plot_contribution.py --variant "chr8:21988220" --vartype "transition" --rename_cols_table static/data/FINSURF_model_objects/rename_columns_model.tsv --numFeat_path static/data/NUM_FEATURES.tsv.gz --scaled_numFeat_path static/data/SCALED_NUM_FEATURES.tsv.gz --featCont_transition_path static/data/FULL_FC_transition.tsv.gz --featCont_transversion_path static/data/FULL_FC_transversion.tsv.gz

The script should generate an HTML file in the res directory.
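The two plotting commands differ only in the `--variant` value, so a small helper can build the command for any variant of interest. This is a sketch under stated assumptions: the helper name `plot_command` is hypothetical, the `chrN:position` check is inferred from the two examples above, and the helper accepts only the two `--vartype` values for which feature-contribution tables are passed above (the script itself may accept others).

```python
import re
import sys
from typing import List

DATA = "static/data"
# Assumed format, inferred from "chr1:12005" and "chr8:21988220".
VARIANT_PATTERN = re.compile(r"^chr[0-9A-Za-z_]+:[1-9][0-9]*$")


def plot_command(variant: str, vartype: str) -> List[str]:
    """Build the argument list for plot_contribution.py (does not execute it)."""
    if not VARIANT_PATTERN.match(variant):
        raise ValueError(f"expected 'chrN:position', got {variant!r}")
    if vartype not in ("transition", "transversion"):
        raise ValueError(f"unsupported vartype in this sketch: {vartype!r}")
    return [
        sys.executable, "scripts/plot_contribution.py",
        "--variant", variant,
        "--vartype", vartype,
        "--rename_cols_table", f"{DATA}/FINSURF_model_objects/rename_columns_model.tsv",
        "--numFeat_path", f"{DATA}/NUM_FEATURES.tsv.gz",
        "--scaled_numFeat_path", f"{DATA}/SCALED_NUM_FEATURES.tsv.gz",
        "--featCont_transition_path", f"{DATA}/FULL_FC_transition.tsv.gz",
        "--featCont_transversion_path", f"{DATA}/FULL_FC_transversion.tsv.gz",
    ]
```

As with the scoring step, the list can be handed to `subprocess.run(cmd, check=True)` from the repository root.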