Breakpoints via assembly - Identifies breaks and attempts to assemble rearrangements in whole genome sequencing data.
GNU Affero General Public License v3.0

BRASS

Breakpoints via assembly

BRASS analyses one or more related BAM files of paired-end sequencing to determine potential rearrangement breakpoints.

There are several stages, the main components being:

  1. Collect read-pairs where both ends map but are NOT marked as properly-paired.
  2. Perform grouping based on mapped locations
  3. Filter
  4. Run assembly
  5. Annotate with GRASS
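
The selection rule in step 1 can be expressed in terms of SAM flag bits. The following is a minimal sketch, not BRASS's own implementation. It assumes the standard SAM flag values (0x1 paired, 0x2 properly paired, 0x4 read unmapped, 0x8 mate unmapped), and the function name is_candidate_pair is hypothetical.

```shell
#!/bin/sh
# is_candidate_pair FLAG
# Succeeds if the SAM flag describes a read whose pair is fully mapped
# but not flagged as properly paired.
# (Illustrative helper; this name is not part of BRASS.)
is_candidate_pair() {
  flag=$1
  # 0x1 must be set: the read is part of a pair
  [ $((flag & 1)) -ne 0 ] || return 1
  # 0x2 must be clear: the pair is NOT marked as properly paired
  [ $((flag & 2)) -eq 0 ] || return 1
  # 0x4 must be clear: the read itself is mapped
  [ $((flag & 4)) -eq 0 ] || return 1
  # 0x8 must be clear: the mate is mapped
  [ $((flag & 8)) -eq 0 ] || return 1
  return 0
}

for flag in 99 97 73; do
  if is_candidate_pair "$flag"; then
    echo "$flag: candidate"
  else
    echo "$flag: skipped"
  fi
done
```

Flag 99 is skipped because it is properly paired, 97 is kept, and 73 is skipped because its mate is unmapped. With samtools the same bit test can be written as `samtools view -f 1 -F 14`, since 14 is the sum of 0x2, 0x4 and 0x8.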

Quick installation

./setup.sh path_to_install_to

Skipping all external dependencies

If you want to install only the core of BRASS (C and Perl wrappers) and use existing versions of tools from your path, run:

./setup.sh path_to_install_to 1

Skipping exonerate install

A central install of exonerate 2.2.0 via a package manager is adequate. To skip only the exonerate install, run:

./setup.sh path_to_install_to 2

Pre-requisites

Perl packages:

Each of these has its own dependencies.

R packages

A large number of R packages are required to run BRASS. To facilitate the install process there is a script Rsupport/libInstall.R that can be run to build these for you. See this file for the list of packages.

Alternatively you can run:

cd Rsupport
./setupR.sh path_to_install_to

Appending 1 to the command will request a complete local build of R (3.1.3).

Other tools that need to be in path

Tools installed by setup.sh

Please use setup.sh to install these dependencies. Setting the environment variable CGP_PERLLIBS allows you to append to PERL5LIB during install. Without this, all dependencies are installed into the target area. setup.sh will not use PERL5LIB directly.

Please be aware that this expects basic C compilation libraries and tools to be available.
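
As an illustration of the CGP_PERLLIBS setting described above, the sketch below exports the variable before calling setup.sh. Both paths are hypothetical placeholders, and the layout of your existing Perl library tree may differ.

```shell
#!/bin/sh
# Hypothetical location of an existing Perl library tree; adjust to your system.
CGP_PERLLIBS="$HOME/cgp/lib/perl5"
export CGP_PERLLIBS

# Run from the top of the BRASS source directory.
# "$HOME/cgp" stands in for path_to_install_to.
if [ -x ./setup.sh ]; then
  ./setup.sh "$HOME/cgp"
else
  echo "setup.sh not found; run this from the BRASS source directory" >&2
fi
```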

Running BRASS

This package includes a reference implementation which handles all of the linking together of steps.

Please see the -h and -m options of brass.pl for full usage information.

It can be run in a couple of ways:

  1. Fire and forget
    • Execute on a single host with multiple cores (or 1 if that's all you have)
    • Some efficiency overhead as some steps aren't parallel
  2. Farm style
    • Requires 2 extra parameters in the initial command
    • See -help for further details

Input

Initial mapping

BRASS was written primarily to work with BWA-mapped data. You are likely to get the most useful output from BWA-MEM.

Library quality

Please be aware that paired-end libraries where properly-paired reads are heavily overlapped are unlikely to produce good results.
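
To gauge how much the reads of a properly-paired fragment overlap, compare twice the read length with the fragment length. This is a rough sketch under stated assumptions: the insert size is taken to be the full fragment length (outer distance between the pair's ends) and to be at least one read length. The function name read_overlap is hypothetical, and no threshold for "heavily overlapped" is implied.

```shell
#!/bin/sh
# read_overlap READ_LENGTH FRAGMENT_LENGTH
# Prints the number of bases covered by both reads of a pair.
# Assumes FRAGMENT_LENGTH is the outer distance and is >= READ_LENGTH.
# (Illustrative helper; this name is not part of BRASS.)
read_overlap() {
  overlap=$((2 * $1 - $2))
  if [ "$overlap" -lt 0 ]; then
    overlap=0
  fi
  echo "$overlap"
}

read_overlap 150 500   # prints 0: the reads do not meet
read_overlap 150 200   # prints 100: two thirds of each read is shared
```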

Additional mapping information

BRASS requires accurate information regarding the insert size distribution and expects to find a *.bam.bas file co-located with each *.bam file. These can be generated by the bam_stats program included in the PCAP-core project. If you use bwa_mem.pl to map your data (same repository) then this file is generated automatically for you.
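
Before starting a run, it can help to confirm that every input has its co-located statistics file. The check below is a minimal sketch: check_bas and the BAM file names are hypothetical, and it only tests that a non-empty <name>.bam.bas exists, not that its contents are valid.

```shell
#!/bin/sh
# check_bas BAM...
# Returns non-zero if any BAM lacks a non-empty, co-located <name>.bam.bas file.
# (Illustrative helper; this name is not part of BRASS or PCAP-core.)
check_bas() {
  status=0
  for bam in "$@"; do
    if [ ! -s "${bam}.bas" ]; then
      echo "missing or empty: ${bam}.bas" >&2
      status=1
    fi
  done
  return "$status"
}

# Hypothetical input names; replace with your own BAM files.
check_bas tumour.bam normal.bam ||
  echo "generate the .bas files before running BRASS" >&2
```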

Docker, Singularity and Dockstore

There are pre-built images containing this codebase on quay.io.

The Docker images are known to work correctly after import into a Singularity image.

LICENCE

Copyright (c) 2014-2019 Genome Research Ltd.

Author: CASM/Cancer IT <cgphelp@sanger.ac.uk>

This file is part of BRASS.

BRASS is free software: you can redistribute it and/or modify it under
the terms of the GNU Affero General Public License as published by the Free
Software Foundation; either version 3 of the License, or (at your option) any
later version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more
details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.

1. The usage of a range of years within a copyright statement contained within
this distribution should be interpreted as being equivalent to a list of years
including the first and last year specified and all consecutive years between
them. For example, a copyright statement that reads ‘Copyright (c) 2005, 2007-
2009, 2011-2012’ should be interpreted as being identical to a statement that
reads ‘Copyright (c) 2005, 2007, 2008, 2009, 2011, 2012’ and a copyright
statement that reads ‘Copyright (c) 2005-2012’ should be interpreted as being
identical to a statement that reads ‘Copyright (c) 2005, 2006, 2007, 2008,
2009, 2010, 2011, 2012’.