ExpressionAnalysis / CNV_Radar

CNV Rapid Aberration Detection And Reporting
Other
10 stars 7 forks source link

Error: cannot use more inner knots than unique 'x' values #6

Open bugds opened 4 years ago

bugds commented 4 years ago

Hello!

I tried to perform CNV analysis with CNV_Radar on custom hematological panel; for processing I used GATK 3.8 MarkDuplicates, UnifiedGenotyper and realignment tools, latest Picard, bam2roi.r, and CNV_Radar_create_control.r.

When running CNV_Radar.r I get the following output:

$ Rscript ~/git-clone/CNV_Radar/CNV_Radar.r -C /home/bugds/bugds/MDS/CNV_Radar/normals/CNV_control.RData -S $file -R _roiSummary.txt -O $file.tsv —vcf '.bam.vcf' —filtCommon FALSE
Loading required package: getopt
Error in smooth.spline(x = f$Pos[f$Chr == chr], y = 6 * abs(f$AF[f$Chr == :
cannot use more inner knots than unique 'x' values
Calls: predict -> smooth.spline
Execution halted

I figured that my problem comes from a small amount of SNPs found by UnifiedGenotyper per chromosome (<20, the panel is relatively short), and nknots parameter should be >=20 (CNV_Radar.r, line 168: params$cknots <- pmax(20, round(sum(f$Chr==chr)/params$smooth))).

I got rid of the error by manually changing this line: (CNV_Radar.r, line 170): preds$afv <- predict(smooth.spline(x = f$Pos[ f$Chr==chr], y = 6*abs(f$AF[ f$Chr==chr] - 0.5)^3, nknots = params$cknots, w = f$DP[ f$Chr==chr]), x = preds$Beg)$y

to this: preds$afv <- predict(smooth.spline(x = f$Pos[ f$Chr==chr], y = 6*abs(f$AF[ f$Chr==chr] - 0.5)^3, w = f$DP[ f$Chr==chr]), x = preds$Beg)$y

Can I trust the results? Is this the right thing to do when my data prevents the standard analysis?

Thank you!

jkstrat commented 3 years ago

Hello bugds,

Thanks for pointing this out.

Caveat: We built CNV Radar to address CNV calling on large targeted panels rather than small targeted panels. The validation testing that we performed was on larger targeted panels (whole exome) so I don't have any data to say how well the software performs on really small panels. If you have any samples with known truth that would be something worth looking into further. With that caveat, I can't think of any technical reason why CNV Radar wouldn't work on a small panel, but I can't make any claims about accuracy.

The idea behind setting the knots was to fit the data without overfitting the data. Using the exome data we found that 20 knots provided a reason estimate of the observed values. However for a smaller panel that may not be appropriate (as you saw). When you remove the nknots argument you are having smooth.spline choose the number of knots to use when fitting the spline. The Rdocumentation for smooth.spline states

If FALSE (default), a subset of x[] is used, specifically x[j] where the nknots indices are evenly spaced in 1:n.

So I would encourage you to plot the data points and the modeled smoothed spline. Look to see what the fit is. If it looks like a series of speed bumps then it's probably not appropriate but if it looks like a reasonable approximation then you can proceed.

In the future, instead of just removing the argument you will need to lower the minimum number of knots as below.

v1.1.0 code (line 168 of CNV_Radar.r)

params$cknots <- pmax(<smaller # knots>, round(sum(f$Chr==chr)/params$smooth))).

v1.2.0 code (line 348 of CNV_Radar.r)

iterative_fit$cknots <- pmax(<smaller # knots>, round(sum(f$CHROM==chr)/model_params$smooth)) 

This parameter is probably a good candidate to move into the config file. Perhaps once we know how CNV Radar performs on small panels by evaluating samples with known truth we can address this further.

Regards, Jeran

DarwinsFinch commented 3 years ago

Hello,

I had the same issue as described above. Changing line 348 of CNV_Radar.r to a smaller number of knots (8) resulted in a successful run of the script. Unfortunately, the obtained jpegs (attached) do not show any called CNVs. Further, some of the Chromosomes are missing in the obtained tsv (below), while for the present Chromosmes there is only one entry each.

Chr Start Stop log2FC Qscore ObservedDepth ExpectedDepth Zscore HetVar IsCNV Length IsLOH_Only 1 69040 249212612 -0.488 0.003 1.009 0.743 NA 0.008 FALSE 249143573 FALSE 2 41557 242815476 -0.54 -0.006 0.947 0.755 NA 0.006 FALSE 242773920 FALSE 3 361409 197765588 -0.512 0 0.988 0.749 NA 0.091 FALSE 197404180 TRUE 5 92296 180795276 -0.548 0 0.921 0.755 NA 0.012 FALSE 180702981 FALSE 6 292489 170893719 -0.512 0.001 1.033 0.753 NA 0.014 FALSE 170601231 FALSE 7 193149 158937513 -0.478 0.005 1.051 0.743 NA 0.009 FALSE 158744365 FALSE 8 116035 146279593 -0.486 0.068 1.161 0.756 NA 0.053 FALSE 146163559 TRUE 9 14756 141016501 -0.464 0.01 1.047 0.738 NA 0.065 FALSE 141001746 TRUE 11 193049 134257603 -0.482 -0.005 0.942 0.734 NA 0.022 FALSE 134064555 FALSE 12 175998 133810992 -0.516 -0.001 0.943 0.746 NA 0.008 FALSE 133634995 FALSE 14 19377543 105996181 -0.449 0.115 1.193 0.745 NA 0.037 FALSE 86618639 FALSE 17 5960 81052370 -0.472 -0.002 0.899 0.724 NA 0.01 FALSE 81046411 FALSE 19 110628 59082806 -0.416 0 0.922 0.708 NA 0.027 FALSE 58972179 FALSE 20 68300 62905003 -0.429 0.035 1.093 0.727 NA 0.021 FALSE 62836704 FALSE

Do you have any idea what could be the problem?

AK05-20019-Vulva-Tumor_S3_realigned_sorted_ann iter3_Genome_lfc_vaf

Also, I used exon data as well, so I was wondering why the "cannot use more inner knots than unique 'x' values" is a problem in the first place, since my panel is rather large.

Best regards, Lunas