Dedicated caller for DUX4 rearrangements from whole genome sequencing data.
Pelops is based on a method first described in:
Ryan, S.L., Peden, J.F., Kingsbury, Z. et al. Whole genome sequencing provides comprehensive genetic testing in childhood B-cell acute lymphoblastic leukaemia. Leukemia 37, 518–528 (2023). https://doi.org/10.1038/s41375-022-01806-8
Pelops itself is described and validated in:
Grobecker, P., Mijuskovic, M., et al. Pelops: A dedicated caller for DUX4 rearrangements from short-read whole genome sequencing data. In preparation (2024)
You can install the latest stable release of Pelops using pip:
pip install ilmn-pelops --upgrade
Note: pip/PyPI will prefer stable versions (e.g. 0.7.0) over subsequent beta releases (e.g. 0.8.0b1). If you need to install a beta version for testing, please first uninstall your current version of Pelops:
pip uninstall ilmn-pelops
Pelops is a tool with a command line interface (CLI). Discover its usage with:
pelops --help
To call DUX4-rearrangements from a BAM/CRAM file, use the dux4r subcommand. To see all available options, run:
pelops dux4r --help
The input to Pelops is a short-read whole-genome sequencing BAM or CRAM file from a tumour sample, aligned to the GRCh38 reference genome. The BAM/CRAM file needs to be indexed. Pelops was tested on alignments by DRAGEN (version 4.0.3), bwa (version 0.7.17), and Isaac (version SAAC01325.18.01.29).
To increase specificity when calling non-IGH DUX4-rearrangements, we recommend using a systematic noise BED file. This file contains genomic regions that will be ignored by Pelops when identifying candidate regions involved in a DUX4-rearrangement. Since such regions can be specific to the read alignment tool, reference genome, sequencing protocol, and cancer type analysed, we recommend creating a separate systematic noise BED file for each project. One way to obtain these genomic regions would be to run Pelops on a panel of normal samples, which are guaranteed to have no DUX4-rearrangements, and generate a list of false-positive calls.
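The panel-of-normals idea above could be sketched as follows. This is a minimal illustration, not part of Pelops: the helper names (`unnamed_regions`, `merge_regions`, `to_bed`) are hypothetical, and the sketch assumes that every UNNAMED region set B in the Pelops JSON output carries a `regions` list with `chrom`, `start`, and `end` keys, as documented for the pre-defined region sets. It also assumes that every non-IGH call in a normal sample is a false positive, and it copies coordinates unchanged, so check whether they need adjusting to the BED convention before use.

```python
from typing import Iterable

Region = tuple[str, int, int]


def unnamed_regions(result: dict) -> list[Region]:
    """Collect the regions of every UNNAMED region set B in one parsed Pelops JSON."""
    regions = []
    for rearrangement in result.get("rearrangements", []):
        partner = rearrangement.get("B", {})
        if partner.get("name") != "UNNAMED":
            continue  # skip the pre-defined IGH rearrangements
        for region in partner.get("regions", []):
            regions.append((region["chrom"], region["start"], region["end"]))
    return regions


def merge_regions(regions: Iterable[Region]) -> list[Region]:
    """Sort regions and merge those that overlap on the same chromosome."""
    merged: list[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            previous = merged[-1]
            merged[-1] = (chrom, previous[1], max(previous[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def to_bed(regions: Iterable[Region]) -> str:
    """Format regions as tab-separated BED lines (coordinates copied as-is)."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in regions)
```

In practice, each normal sample's JSON would be loaded with `json.load`, the regions from all samples concatenated, merged, and written to a single BED file per project.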
Pelops outputs results in a JSON file, and optionally exports supporting reads in SAM files.
The top level of the JSON contains information about Pelops (assumed genome reference, version, name, and CLI command) and about the input file (the number of unique and mapped reads, which can be supplied by the user). It also contains the list of rearrangements investigated by Pelops.
{
"reference": "GRCh38",
"unique_mapped_reads": 1000000000,
"rearrangements": [...],
"program_name": "pelops",
"version": "0.5.0",
"cli_command": "pelops dux4r --total-number-reads 1000000000 --export . test.bam"
}
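As a small illustration of how the top-level fields might be consumed downstream, the sketch below parses a JSON document with the fields shown above using Python's standard `json` module. The function name `describe_run` is hypothetical, and the elided `rearrangements` list is replaced by an empty list so that the example parses.

```python
import json

# Mirrors the example above; the elided rearrangement list is left empty here.
EXAMPLE = """
{
  "reference": "GRCh38",
  "unique_mapped_reads": 1000000000,
  "rearrangements": [],
  "program_name": "pelops",
  "version": "0.5.0",
  "cli_command": "pelops dux4r --total-number-reads 1000000000 --export . test.bam"
}
"""


def describe_run(result: dict) -> str:
    """Summarise the run-level metadata of a parsed Pelops JSON in one line."""
    return (
        f"{result['program_name']} {result['version']} on {result['reference']}: "
        f"{result['unique_mapped_reads']} unique mapped reads, "
        f"{len(result['rearrangements'])} rearrangements"
    )


if __name__ == "__main__":
    print(describe_run(json.loads(EXAMPLE)))
```

For a real output file, `json.loads(EXAMPLE)` would be replaced by `json.load` on the opened file.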
Each rearrangement consists of a unique ID, genomic region sets "A" and "B", and the evidence for the rearrangement between these two region sets.
For the command pelops dux4r, ID 01 always corresponds to rearrangements between the core DUX4 regions and IGH, while ID 02 corresponds to rearrangements of the extended DUX4 regions with IGH.
IDs 03 and beyond are potential rearrangements of the core DUX4 region with other genomic regions (marked as UNNAMED); there can be a variable number of them.
{
"rearrangements": [
{
"id": "01",
"A": {"name": "CoreDUX4"...},
"B": {"name": "IGH"...},
"evidence": {...}
},
{
"id": "02",
"A": {"name": "ExtendedDUX4"...},
"B": {"name": "IGH"...},
"evidence": {...}
},
{
"id": "03",
"A": {"name": "CoreDUX4"...},
"B": {"name": "UNNAMED"...},
"evidence": {...}
}
]
}
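One way to work with this layout is to separate the two fixed IGH rearrangements from the variable number of UNNAMED candidates. The sketch below is an illustration only: the function name `split_by_partner` is hypothetical, and it relies solely on the `id` and `B.name` fields shown above.

```python
def split_by_partner(result: dict) -> tuple[list[dict], list[dict]]:
    """Split rearrangements into IGH partners and all other (UNNAMED) partners."""
    igh, other = [], []
    for rearrangement in result.get("rearrangements", []):
        if rearrangement["B"]["name"] == "IGH":
            igh.append(rearrangement)
        else:
            other.append(rearrangement)
    return igh, other


if __name__ == "__main__":
    example = {
        "rearrangements": [
            {"id": "01", "A": {"name": "CoreDUX4"}, "B": {"name": "IGH"}},
            {"id": "02", "A": {"name": "ExtendedDUX4"}, "B": {"name": "IGH"}},
            {"id": "03", "A": {"name": "CoreDUX4"}, "B": {"name": "UNNAMED"}},
        ]
    }
    igh, other = split_by_partner(example)
    print([r["id"] for r in igh], [r["id"] for r in other])
```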
A and B document the exact set of genomic regions used for each rearrangement.
For example, the core DUX4 region is shown below.
While IGH, CoreDUX4 and ExtendedDUX4 are pre-defined, each UNNAMED region will be different.
{
"name": "CoreDUX4",
"regions": [
{
"chrom": "chr4",
"start": 190020407,
"end": 190023665
},
{
"chrom": "chr4",
"start": 190066935,
"end": 190093279
},
{
"chrom": "chr4",
"start": 190172774,
"end": 190176845
},
{
"chrom": "chr10",
"start": 133663429,
"end": 133685936
},
{
"chrom": "chr10",
"start": 133739606,
"end": 133762125
}
]
}
The evidence for each rearrangement consists of the number of split and paired reads between region sets A and B, and the spanning read pairs per billion (SRPB). The SRPB is calculated as $$\text{SRPB} = 10^9 \frac{\text{paired reads} + \text{split reads}}{\text{total unique and mapped reads}}.$$
{
"paired_reads": 15,
"split_reads": 4,
"SRPB": 19.0
}
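The SRPB formula can be checked against the example values with a few lines of Python. This is a sketch of the arithmetic only, not the Pelops implementation; the function name `srpb` is hypothetical, and the read total of 1,000,000,000 is taken from the `unique_mapped_reads` field of the top-level example.

```python
def srpb(paired_reads: int, split_reads: int, unique_mapped_reads: int) -> float:
    """Spanning read pairs per billion unique and mapped reads."""
    if unique_mapped_reads <= 0:
        raise ValueError("unique_mapped_reads must be positive")
    return 1e9 * (paired_reads + split_reads) / unique_mapped_reads


if __name__ == "__main__":
    # 15 paired reads and 4 split reads out of 1e9 unique mapped reads
    print(srpb(15, 4, 1_000_000_000))  # 19.0
```

Because the value is normalised by the read total, the same 19 supporting reads in a sample with half as many unique mapped reads would give an SRPB of 38.0.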
Optionally, a SAM file can be exported for each rearrangement, containing all paired and split reads with their mates.
The naming convention is <id>_<name_A>-<name_B>.sam, where <id>, <name_A>, and <name_B> correspond to the ID and the names of genomic region sets A and B, respectively, as documented in the JSON.
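To locate the exported SAM file for a given rearrangement, the naming convention can be reproduced from the JSON entry. The helper below is a hypothetical sketch (`sam_filename` is not a Pelops function) that simply applies the pattern described above.

```python
def sam_filename(rearrangement: dict) -> str:
    """Build <id>_<name_A>-<name_B>.sam from one rearrangement entry."""
    return (
        f"{rearrangement['id']}_"
        f"{rearrangement['A']['name']}-{rearrangement['B']['name']}.sam"
    )


if __name__ == "__main__":
    entry = {"id": "01", "A": {"name": "CoreDUX4"}, "B": {"name": "IGH"}}
    print(sam_filename(entry))  # 01_CoreDUX4-IGH.sam
```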
We are not accepting pull requests into this repository at this time, as the licence currently does not allow modifications by third parties. For any bug report / recommendation / feature request, please open an issue.
See Authors.