A share of all cell-free DNA fragments isolated from maternal plasma during pregnancy is fetal-derived. This share is referred to as the 'fetal fraction' and represents an important estimate during routine noninvasive prenatal testing (NIPT). Its most essential role is informing geneticists whether an assay is conclusive: if the fetal fraction is insufficient (a limit often debated to be 4%), claims on fetal aneuploidies cannot be made accurately. Several techniques exist to deduce this figure, but most of them require additional experimental procedures, which impede routine execution. Therefore, we set out to develop PREFACE, a software tool that accurately predicts fetal fraction based solely on shallow-depth whole-genome sequencing data, which is the foundation of a default NIPT assay. In contrast to previous efforts, PREFACE enables user-friendly model training with a limited amount of retrospective data, which eliminates between-laboratory bias. For sets of roughly 1100 male NIPT samples, a cross-validated correlation of 0.9 was noted between predictions and fetal fractions according to Y chromosomal read counts (FFY). Our approach enables training with both male and unlabeled female fetuses: using our complete cohort (nfemale=2468, nmale=2723), the correlation metric reached 0.94. In addition, PREFACE provides the fetal fraction based on the copy number state of chromosome X (FFX). The presented statistics indirectly predict mixed multiple pregnancies, the source of observed events, and sex chromosomal aneuploidies. All details can be found in our corresponding paper.
Each sample (whether used for training or for prediction) should be passed to PREFACE in the format shown below. During benchmarking, copy number normalization was performed by WisecondorX using a bin size of 100 kb (other bin sizes might work equally well). PREFACE is not limited to any particular copy number alteration software, but the default output of WisecondorX is directly interpretable by PREFACE.
./examples/infile.bed
For training, PREFACE requires a config file.
./examples/config.txt
FF labels of female fetuses are ignored unless the `--femprop` flag is given (see below).
Rscript PREFACE.R train --config path/to/config.txt --outdir path/to/dir/ [optional arguments]
| Optional argument | Function |
|---|---|
| `--nfeat x` | Number of principal components to use during modeling. (default: x=50) |
| `--hidden x` | Number of hidden layers used in the neural network. Use with caution. (default: x=2) |
| `--cpus x` | Number of requested threads for multiprocessing. (default: x=1) |
| `--femprop` | When using FFY as FF (recommended), FF labels for female fetuses are irrelevant and are ignored in the supervised learning phase (default). If this behavior is not desired, use this flag, which requires that the given FFs for female fetuses are proportional to their actual FF. |
| `--olm` | The neural network might not converge, or an ordinary linear model might be a better option for your kind of data or sample size. In these cases, use this flag. |
| `--noskewcorrect` | This flag ensures that the best fit is generated for most (instead of all) of the data. Mostly not recommended. |
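The training call and its optional arguments can be wrapped in a small shell helper. The following is a minimal sketch under stated assumptions: the function names `run` and `preface_train`, the `DRY_RUN` switch, the output directory naming, and the `--nfeat` values 25, 50, and 100 are all hypothetical choices made for illustration and are not part of PREFACE. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with PREFACE.
# DRY_RUN=1 (the default in this sketch) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# preface_train CONFIG OUTDIR [optional PREFACE arguments...]
preface_train() {
    config="$1"
    outdir="$2"
    shift 2
    run Rscript PREFACE.R train --config "$config" --outdir "$outdir" "$@"
}

# Try a few --nfeat values (arbitrary picks around the documented default of 50),
# sending each model to its own placeholder output directory.
for n in 25 50 100; do
    preface_train path/to/config.txt "path/to/dir_nfeat_$n/" --nfeat "$n" --cpus 4
done
```

Setting `DRY_RUN=0` would execute the printed commands instead. Whether the output directories must exist beforehand is not stated here, so creating them first may be the safer choice.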
Rscript PREFACE.R predict --infile path/to/infile.bed --model path/to/model.RData [optional arguments]
| Optional argument | Function |
|---|---|
| `--json x` | Predictions are written to stdout. Use this flag for JSON format. Optionally provide 'x' to generate .json file x. |
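When many samples need a prediction, the call above can be repeated per input file. The sketch below assumes a directory of `.bed` input files and writes one JSON file per sample through `--json`. The names `run`, `preface_predict_all`, and `DRY_RUN`, as well as the directory layout, are hypothetical and only serve this example. By default it prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with PREFACE.
# DRY_RUN=1 (the default in this sketch) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# preface_predict_all MODEL INDIR OUTDIR
# Calls the predict mode once per *.bed file found in INDIR.
preface_predict_all() {
    model="$1"
    indir="$2"
    outdir="$3"
    for bed in "$indir"/*.bed; do
        # Skip the unexpanded pattern when INDIR holds no .bed files.
        [ -e "$bed" ] || continue
        sample="$(basename "$bed" .bed)"
        run Rscript PREFACE.R predict --infile "$bed" --model "$model" \
            --json "$outdir/$sample.json"
    done
}

# Example call with placeholder paths:
# preface_predict_all path/to/model.RData path/to/infiles path/to/results
```

Naming each JSON file after its input file keeps results traceable to samples; any other naming scheme would work just as well.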
`--nfeat`: the number of principal components used during modeling. Two parts should be seen in the proportion of variance across the principal components (indexed in order of importance): a 'non-random' phase and a 'random' phase. A suitable `--nfeat` captures the 'random' phase (as shown in the example at `./examples/overall_performance.png`). Capturing too much of the 'non-random' phase could lead to convergence problems during modeling. If the result at the chosen `--nfeat` is unsatisfactory, re-run with a different number of features.

Other versions are of course expected to work equally well. To install the required packages within R, use:
install.packages(c('data.table', 'glmnet', 'neuralnet', 'foreach', 'doParallel', 'MASS', 'irlba'))