bmansfeld / QTLseqr

QTLseqr is an R package for QTL mapping using NGS Bulk Segregant Analysis
64 stars 42 forks source link

Request for Help: no points with non-zero weight #24

Closed WeiliiEyedWizard closed 4 years ago

WeiliiEyedWizard commented 5 years ago

I am having some problems running QTLseqR on a data set.

For both #Run QTLseq analysis and #Run Gprime analysis I am getting the following error

Run G' analysis

df_filt <- runGprimeAnalysis( SNPset = df_filt, windowSize = 1e6, outlierFilter = "deltaSNP") Counting SNPs in each window... Calculating tricube smoothed delta SNP index... Calculating G and G' statistics... Using deltaSNP-index to filter outlier regions with a threshold of 0.1 Estimating the mode of a trimmed G prime set using the 'modeest' package... Calculating p-values... There were 50 or more warnings (use warnings() to see the first 50)

Run QTLseq analysis

df_filt <- runQTLseqAnalysis( SNPset = df_filt, windowSize = 1e6, popStruc = "F2:3", bulkSize = c(25, 25), replications = 10000, intervals = c(95, 99) ) Counting SNPs in each window... Calculating tricube smoothed delta SNP index... Returning the following two sided confidence intervals: 95, 99 Variable 'depth' not defined, using min and max depth from data: 40-191 Assuming bulks selected from RIL population, with 2525 individuals per bulk. Simulating 10000 SNPs with reads at each depth: 40-191 Keeping SNPs with >= 0.3 SNP-index in both simulated bulks Joining, by = "tricubeDP" There were 50 or more warnings (use warnings() to see the first 50)

warnings gives me

warnings() Warning messages: 1: In lfproc(x, y, weights = weights, cens = cens, base = base, ... : procv: no points with non-zero weight

repeated 50 times.

When running the same script with the Yang dataset from the vignette example everything works fine, however once I try on my data it does not seem to work.

Comparing the df_filt variables from the yang data set and my own visually lead to me concluding there was no differences is format.

Here is my df_filt

head(df_filt) CHROM POS REF ALT AD_REF.LOW AD_ALT.LOW DP.LOW GQ.LOW PL.LOW 1 Chr1 154894 T C 19 57 76 99 2418,0,594 2 Chr1 676832 A G 23 42 65 99 1690,0,872 3 Chr1 676847 T G 23 43 66 99 1711,0,866 4 Chr1 676883 A G 23 43 66 99 1736,0,944 5 Chr1 676884 T G 23 43 66 99 1736,0,944 6 Chr1 868399 A G 23 48 71 99 1915,0,1126 SNPindex.LOW AD_REF.HIGH AD_ALT.HIGH DP.HIGH GQ.HIGH PL.HIGH 1 0.7500000 21 60 81 99 2485,0,740 2 0.6461538 19 40 59 99 1461,0,654 3 0.6515152 15 40 55 99 1480,0,490 4 0.6515152 21 40 61 99 1616,0,886 5 0.6515152 21 40 61 99 1616,0,886 6 0.6760563 15 78 93 99 3167,0,703 SNPindex.HIGH REF_FRQ deltaSNP nSNPs tricubeDeltaSNP G 1 0.7407407 0.2547771 -0.009259259 1 0.04474525 0.017710620 2 0.6779661 0.3387097 0.031812256 8 0.07020005 0.139879722 3 0.7272727 0.3140496 0.075757576 8 0.07020079 0.803952922 4 0.6557377 0.3464567 0.004222553 8 0.07020254 0.002496501 5 0.6557377 0.3464567 0.004222553 8 0.07020259 0.002496501 6 0.8387097 0.2317073 0.162653339 8 0.05748516 5.948665346 Gprime pvalue negLog10Pval qvalue minDP tricubeDP CI_95 CI_99 1 1.156295 0.4455630 0.3510909 0.7744778 76 64 -0.328125 -0.40625 2 2.068951 0.2106250 0.6764902 0.6470770 59 64 -0.328125 -0.40625 3 2.068978 0.2106208 0.6764988 0.6470770 55 64 -0.328125 -0.40625 4 2.069040 0.2106107 0.6765196 0.6470770 61 64 -0.328125 -0.40625 5 2.069042 0.2106104 0.6765202 0.6470770 61 64 -0.328125 -0.40625 6 2.214282 0.1888550 0.7238714 0.6470770 71 64 -0.328125 -0.40625

as well as the R script I am using, which is modified from your documentations example.

load the package

library("QTLseqr")

Set sample and file names

HighBulk <- "egusi_bulk" LowBulk <- "normal_bulk" file <- "normal_egusi_SNPs_forR2.table"

Choose which chromosomes will be included in the analysis (i.e. exclude smaller contigs)

Chroms <- paste0(rep("Chr", 11), 1:11)

Import SNP data from file

df <- importFromGATK( file = file, highBulk = HighBulk, lowBulk = LowBulk, chromList = Chroms )

Filter SNPs based on some criteria

df_filt <- filterSNPs( SNPset = df, refAlleleFreq = 0.20, minTotalDepth = 100, maxTotalDepth = 400, minSampleDepth = 40, minGQ = 99 )

Run G' analysis

df_filt <- runGprimeAnalysis( SNPset = df_filt, windowSize = 1e6, outlierFilter = "deltaSNP")

Run QTLseq analysis

df_filt <- runQTLseqAnalysis( SNPset = df_filt, windowSize = 1e6, popStruc = "F2:3", bulkSize = c(25, 25), replications = 10000, intervals = c(95, 99) )

Plot

plotQTLStats(SNPset = df_filt, var = "Gprime", plotThreshold = TRUE, q = 0.01) plotQTLStats(SNPset = df_filt, var = "deltaSNP", plotIntervals = TRUE)

export summary CSV

getQTLTable(SNPset = df_filt, alpha = 0.01, export = TRUE, fileName = "egusi_QTL.csv")

Any help you could give would be much appreciated. I have not been able to find any answers online.

Thanks!

bmansfeld commented 5 years ago

Hi, Thanks for the detailed question. Note that this isn't an error but a warning. The warnings come from the locfit function which is in charge of the smoothing of the data (behind the scenes of both the runQTLseq and runGprime functions). What the warning is telling you, albeit not in a human readable way, is that there are windows with not enough data within them so it can't calculate a value for those windows. there are 2-3 ways to go about this: 1) Ignore it if the figures look ok to you. 2) Find the SNPs with the least number of neighbors in a window (reported by the nSNPs column). 3) Increase the window size to make sure that you don't get any such windows.

Hope this helps, Ben