CenterForMedicalGeneticsGhent / PREFACE

PREFACE -- PREdict FetAl ComponEnt
GNU General Public License v3.0

Background

A share of the cell-free DNA fragments isolated from maternal plasma during pregnancy is fetal-derived. This share is referred to as the 'fetal fraction' and is an important estimate in routine noninvasive prenatal testing (NIPT). Its most essential role is to inform geneticists whether an assay is conclusive: if the fetal fraction is insufficient (a limit of 4% is often cited, though debated), claims about fetal aneuploidies cannot be made accurately. Several techniques exist to deduce this figure, but the vast majority require additional experimental procedures, which impede routine execution. Therefore, we set out to develop PREFACE, a tool that accurately predicts the fetal fraction based solely on shallow-depth whole-genome sequencing data, which is the foundation of a default NIPT assay. In contrast to previous efforts, PREFACE enables user-friendly model training with a limited amount of retrospective data, which eliminates between-laboratory bias. For sets of roughly 1100 male NIPT samples, a cross-validated correlation of 0.9 was noted between predictions and fetal fractions derived from Y chromosomal read counts (FFY). Our approach enables training with both male and unlabeled female fetuses: using our complete cohort (n_female = 2468, n_male = 2723), the correlation reached 0.94. In addition, PREFACE provides the fetal fraction based on the copy number state of chromosome X (FFX). The presented statistics indirectly predict mixed multiple pregnancies, the source of observed events and sex chromosomal aneuploidies. All details can be found in our corresponding paper.

Manual

Required files

Copy number alteration .bed files

Each sample (whether it is used for training or for predicting) should be passed to PREFACE in the format shown below. During benchmarking, copy number normalization was performed by WisecondorX using a bin size of 100 kb (other bin sizes might work equally well). PREFACE is not limited to any particular copy number alteration software, but the default output of WisecondorX is directly interpretable by PREFACE.
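Since the model works on principal components computed across samples, it seems reasonable to confirm that two copy number files describe the same bins before using them together. The sketch below is only a sanity check under stated assumptions: files are tab-separated and their first three columns hold chromosome, start and end, as in the general BED convention. `same_bins` is a hypothetical helper and not part of PREFACE, and the made-up demo files illustrate the check only; they are not a specification of PREFACE's input format.

```shell
# same_bins FILE_A FILE_B
# Succeeds (exit 0) when both files list identical loci, in identical order,
# in their first three tab-separated columns (assumed: chr, start, end).
same_bins() {
  a=$(cut -f1-3 "$1" | cksum)
  b=$(cut -f1-3 "$2" | cksum)
  [ "$a" = "$b" ]
}

# Tiny demonstration on throwaway files; header and values are invented.
tmp=$(mktemp -d)
printf 'chr\tstart\tend\tratio\n1\t1\t100000\t0.01\n' > "$tmp/a.bed"
printf 'chr\tstart\tend\tratio\n1\t1\t100000\t-0.02\n' > "$tmp/b.bed"
if same_bins "$tmp/a.bed" "$tmp/b.bed"; then
  echo "bins match"
else
  echo "bins differ"
fi
rm -rf "$tmp"
```

Only the locus columns are compared, so files with different ratio values but identical bins still pass.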

PREFACE's config.txt

For training, PREFACE requires a config file.

Model training


Rscript PREFACE.R train --config path/to/config.txt --outdir path/to/dir/ [optional arguments]

Optional argument    Function
--nfeat x    Number of principal components to use during modeling. (default: x=50)
--hidden x    Number of hidden layers used in the neural network. Use with caution. (default: x=2)
--cpus x    Number of threads requested for multiprocessing. (default: x=1)
--femprop    When using FFY as FF (recommended), FF labels for female fetuses are irrelevant and are ignored in the supervised learning phase (default). If this behavior is not desired, use this flag, which requires that the given FFs for female fetuses are proportional to their actual FF.
--olm    The neural network might not converge, or an ordinary linear model might suit your kind of data or sample size better. In these cases, use this flag.
--noskewcorrect    This flag ensures that the best fit is generated for most (instead of all) of the data. Generally not recommended.
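As an illustration of how the optional arguments combine with the required ones, the following sketch prints a training command without running it. `train_cmd` is a hypothetical helper, the paths are placeholders, and the executable is spelled `Rscript`, which is how R's command-line front end is named on case-sensitive systems.

```shell
# train_cmd CONFIG OUTDIR [CPUS]
# Prints (does not run) a training command. --nfeat and --hidden are
# written out at their documented defaults purely for illustration.
train_cmd() {
  cpus="${3:-1}"
  echo "Rscript PREFACE.R train --config $1 --outdir $2 --nfeat 50 --hidden 2 --cpus $cpus"
}

# Placeholder paths; replace them with real ones before use.
train_cmd path/to/config.txt path/to/dir/ 4
```

Printing the command first makes it easy to inspect the arguments; piping the output to `sh` would execute it.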

Predicting


Rscript PREFACE.R predict --infile path/to/infile.bed --model path/to/model.RData [optional arguments]

Optional argument    Function
--json x    Predictions are written to stdout by default. Use this flag for JSON format; optionally provide 'x' to write the output to .json file x.
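When several samples need predictions, one command per file is required, since `predict` takes a single `--infile`. The sketch below prints such commands for every `.bed` file in a directory, writing each result to a `.json` file next to its input. `predict_cmds` is a hypothetical helper, the paths are placeholders, and nothing is executed.

```shell
# predict_cmds MODEL DIR
# Prints one predict command per .bed file found in DIR (dry run).
predict_cmds() {
  model="$1"
  dir="$2"
  for bed in "$dir"/*.bed; do
    [ -e "$bed" ] || continue   # skip when the glob matches nothing
    echo "Rscript PREFACE.R predict --infile $bed --model $model --json ${bed%.bed}.json"
  done
}

# Placeholder paths; prints nothing if the directory holds no .bed files.
predict_cmds path/to/model.RData path/to/samples
```

File names containing spaces would need extra quoting before the printed commands could be run as-is.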

Model optimization

Required R packages

Other versions are of course expected to work equally well. To install within R use:


install.packages(c('data.table', 'glmnet', 'neuralnet', 'foreach', 'doParallel', 'MASS', 'irlba'))