Closed AdrijaK closed 4 years ago
Dear Adrija,
Thanks for reaching out and for providing this code example.
While the code did actually run for me, I should point out that the 5 distance bins (for short-range interactions, medium-range interactions, long-range interactions, etc., see also ?bin_interactions) created here are rather large and, past a certain distance, don't contain much information in the way of readcount (N) variation.
For example, the 4th and 5th bin contain extremely long-range putative interactions (>~10Mbp) in which N is essentially always 1 (max. 3 across 349213 interactions in each). You can check this by inspecting BI$bins
and BI$interactions
. The lack of readcount variation here makes it hard to confidently predict how factors such as distance or fragment length influence N in a null or background noise model, which peaky attempts to construct here with model_bin()
.
There is a simple solution, which is to limit the range of interactions being analyzed to a range where you have considerable variation in N, for example up to 5 Mbp. This is achieved by passing max_dist=5e6
to bin_interactions, like so: BI = bin_interactions(interactions, fragments, bins=5, max_dist=5e6)
. The rest of the code can remain unmodified and should run.
Please do let me know if this resolves your issue and, if so, whether you require any additional help with your analysis or QC after setting this distance cut-off.
Best wishes,
Chris
Dear Chris,
thank you so much for the help - it now works "sometimes" - when I rerun exactly the same command twice, it works second time but not the first time, which is weird:
BI = bin_interactions(interactions, fragments, bins=5, , max_dist=5e6)
04-05-2020 11:38:12 Calculating fragment characteristics... 04-05-2020 11:38:12 Adding fragment characteristics for baits... 04-05-2020 11:38:12 Adding fragment characteristics for preys... 04-05-2020 11:38:13 Calculating interaction distances... 04-05-2020 11:38:13 Calculating total trans-chromosomal read counts for each bait... 04-05-2020 11:38:13 Modelling those as a function of bait chromosome... GAMLSS-RS iteration 1: Global Deviance = 10178 GAMLSS-RS iteration 2: Global Deviance = 10178 04-05-2020 11:38:13 Adding trans-chromosomal interactivity covariate for preys that were also baited (0 for preys not baited)... 04-05-2020 11:38:13 Excluding 42 interactions that are too proximal (distance < 2500 bp)... 04-05-2020 11:38:13 Excluding 931403 interactions that are too distal (distance > 5000000 bp)... 04-05-2020 11:38:15 Assigning 5 distance bins... 04-05-2020 11:38:15 Done.
Trying to make the models for the first time:
models = by(BI$interactions, BI$interactions$dist.bin, model_bin, subsample_size=1000)
A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 11:38:26 Subsampling 1000/162932 interactions for the null model regression... 04-05-2020 11:38:26 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 4218 GAMLSS-RS iteration 2: Global Deviance = 4117 GAMLSS-RS iteration 3: Global Deviance = 4097 GAMLSS-RS iteration 4: Global Deviance = 4096 GAMLSS-RS iteration 5: Global Deviance = 4096 04-05-2020 11:38:27 Converged: YES Iterations: 5Coefficients: (Intercept) 8.60382407701117 log(abs(dist)) -0.688790254667649 b.trans_res 0.681342552769714 p.trans_res 0.533901819099802 sqrt(b.length) 0.00274102717117741 sqrt(p.length) 0.00760604413123601 04-05-2020 11:38:27 Obtaining normalized randomized quantile residuals for the full dataset... A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 11:38:27 Subsampling 1000/162932 interactions for the null model regression... 04-05-2020 11:38:27 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 1908 GAMLSS-RS iteration 2: Global Deviance = 1898 GAMLSS-RS iteration 3: Global Deviance = 1895 GAMLSS-RS iteration 4: Global Deviance = 1894 GAMLSS-RS iteration 5: Global Deviance = 1893 GAMLSS-RS iteration 6: Global Deviance = 1893 GAMLSS-RS iteration 7: Global Deviance = 1893 GAMLSS-RS iteration 8: Global Deviance = 1893 GAMLSS-RS iteration 9: Global Deviance = 1893 04-05-2020 11:38:28 Converged: YES Iterations: 9Coefficients: (Intercept) 25.5742019055054 log(abs(dist)) -2.04357498805819 b.trans_res 0.900281022998453 p.trans_res 0.5926937310439 sqrt(b.length) 0.00276532259479082 sqrt(p.length) 0.0109130262056804 04-05-2020 11:38:28 Obtaining normalized randomized quantile residuals for the full dataset... A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 11:38:29 Subsampling 1000/162932 interactions for the null model regression... 04-05-2020 11:38:29 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 889 GAMLSS-RS iteration 2: Global Deviance = 887.1 GAMLSS-RS iteration 3: Global Deviance = 885.9 GAMLSS-RS iteration 4: Global Deviance = 885 GAMLSS-RS iteration 5: Global Deviance = 884.4 GAMLSS-RS iteration 6: Global Deviance = 883.9 GAMLSS-RS iteration 7: Global Deviance = 883.5 GAMLSS-RS iteration 8: Global Deviance = 883.2 GAMLSS-RS iteration 9: Global Deviance = 882.9 GAMLSS-RS iteration 10: Global Deviance = 882.7 GAMLSS-RS iteration 11: Global Deviance = 882.5 GAMLSS-RS iteration 12: Global Deviance = 882.4 GAMLSS-RS iteration 13: Global Deviance = 882.3 GAMLSS-RS iteration 14: Global Deviance = 882.1 GAMLSS-RS iteration 15: Global Deviance = 882 GAMLSS-RS iteration 16: Global Deviance = 881.9 04-05-2020 11:38:30 Converged: YES Iterations: 16Coefficients: (Intercept) 29.5561400153415 log(abs(dist)) -2.30167604983013 b.trans_res 0.728377537348226 p.trans_res 1.14835336139614 sqrt(b.length) -0.00188225808431261 sqrt(p.length) 0.0114780138939135 04-05-2020 11:38:30 Obtaining normalized randomized quantile residuals for the full dataset... A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 11:38:31 Subsampling 1000/162932 interactions for the null model regression... 04-05-2020 11:38:31 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 457.6 GAMLSS-RS iteration 2: Global Deviance = 456.5 GAMLSS-RS iteration 3: Global Deviance = 455.9 GAMLSS-RS iteration 4: Global Deviance = 455.5 GAMLSS-RS iteration 5: Global Deviance = 455.3 GAMLSS-RS iteration 6: Global Deviance = 455 GAMLSS-RS iteration 7: Global Deviance = 454.9 GAMLSS-RS iteration 8: Global Deviance = 454.7 GAMLSS-RS iteration 9: Global Deviance = 454.6 GAMLSS-RS iteration 10: Global Deviance = 454.5 GAMLSS-RS iteration 11: Global Deviance = 454.4 04-05-2020 11:38:31 Converged: YES Iterations: 11Coefficients: (Intercept) 14.7174199763078 log(abs(dist)) -1.314586905212 b.trans_res 1.07335554666915 p.trans_res 0.456230134410514 sqrt(b.length) 0.0119716545962497 sqrt(p.length) 0.00289499886899122 04-05-2020 11:38:31 Obtaining normalized randomized quantile residuals for the full dataset... A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 11:38:32 Subsampling 1000/162932 interactions for the null model regression... 04-05-2020 11:38:32 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 183.3 Fejl i glim.fit(f = sigma.object, X = sigma.X, y = y, w = w, fv = sigma, : NA's in the working vector or weights for parameter sigma
models = by(BI$interactions, BI$interactions$dist.bin, model_bin, subsample_size=1000)
A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 11:38:42 Subsampling 1000/162932 interactions for the null model regression... 04-05-2020 11:38:42 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 4112 GAMLSS-RS iteration 2: Global Deviance = 4054 GAMLSS-RS iteration 3: Global Deviance = 4046 GAMLSS-RS iteration 4: Global Deviance = 4046 GAMLSS-RS iteration 5: Global Deviance = 4046 04-05-2020 11:38:42 Converged: YES Iterations: 5Coefficients: (Intercept) 8.60703673135044 log(abs(dist)) -0.702747340416832 b.trans_res 0.626677275355768 p.trans_res 0.285457257882969 sqrt(b.length) 0.00327133805353063 sqrt(p.length) 0.00994189591497322 04-05-2020 11:38:42 Obtaining normalized randomized quantile residuals for the full dataset... A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 11:38:43 Subsampling 1000/162932 interactions for the null model regression... 04-05-2020 11:38:43 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 2062 GAMLSS-RS iteration 2: Global Deviance = 2054 GAMLSS-RS iteration 3: Global Deviance = 2051 GAMLSS-RS iteration 4: Global Deviance = 2050 GAMLSS-RS iteration 5: Global Deviance = 2050 GAMLSS-RS iteration 6: Global Deviance = 2049 GAMLSS-RS iteration 7: Global Deviance = 2049 GAMLSS-RS iteration 8: Global Deviance = 2049 04-05-2020 11:38:44 Converged: YES Iterations: 8Coefficients: (Intercept) 27.80849733567 log(abs(dist)) -2.19780187967553 b.trans_res 0.880405180967756 p.trans_res 0.867430127469399 sqrt(b.length) 0.00325193983127802 sqrt(p.length) 0.0103974367428325 04-05-2020 11:38:44 Obtaining normalized randomized quantile residuals for the full dataset... A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 11:38:44 Subsampling 1000/162932 interactions for the null model regression... 04-05-2020 11:38:44 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 902.4 GAMLSS-RS iteration 2: Global Deviance = 900.9 GAMLSS-RS iteration 3: Global Deviance = 899.8 GAMLSS-RS iteration 4: Global Deviance = 899.1 GAMLSS-RS iteration 5: Global Deviance = 898.5 GAMLSS-RS iteration 6: Global Deviance = 898.1 GAMLSS-RS iteration 7: Global Deviance = 897.8 GAMLSS-RS iteration 8: Global Deviance = 897.5 GAMLSS-RS iteration 9: Global Deviance = 897.2 GAMLSS-RS iteration 10: Global Deviance = 897 GAMLSS-RS iteration 11: Global Deviance = 896.8 GAMLSS-RS iteration 12: Global Deviance = 896.7 GAMLSS-RS iteration 13: Global Deviance = 896.5 GAMLSS-RS iteration 14: Global Deviance = 896.4 GAMLSS-RS iteration 15: Global Deviance = 896.3 GAMLSS-RS iteration 16: Global Deviance = 896.2 GAMLSS-RS iteration 17: Global Deviance = 896.1 04-05-2020 11:38:45 Converged: YES Iterations: 17Coefficients: (Intercept) 27.8539188410451 log(abs(dist)) -2.18615657570701 b.trans_res 0.672795616755087 p.trans_res -0.0290783029897536 sqrt(b.length) 0.00042526419557808 sqrt(p.length) 0.0107030480820055 04-05-2020 11:38:45 Obtaining normalized randomized quantile residuals for the full dataset... A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 11:38:46 Subsampling 1000/162932 interactions for the null model regression... 04-05-2020 11:38:46 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 361.2 GAMLSS-RS iteration 2: Global Deviance = 360.7 GAMLSS-RS iteration 3: Global Deviance = 360.6 GAMLSS-RS iteration 4: Global Deviance = 360.5 04-05-2020 11:38:46 Converged: YES Iterations: 4Coefficients: (Intercept) 7.75038712035162 log(abs(dist)) -0.806404110612846 b.trans_res 0.993297326313753 p.trans_res 1.24852752510704 sqrt(b.length) 0.00370296550733702 sqrt(p.length) 0.00754233340366996 04-05-2020 11:38:46 Obtaining normalized randomized quantile residuals for the full dataset... A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 11:38:47 Subsampling 1000/162932 interactions for the null model regression... 04-05-2020 11:38:47 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 115.7 GAMLSS-RS iteration 2: Global Deviance = 115.7 04-05-2020 11:38:47 Converged: YES Iterations: 2Coefficients: (Intercept) 45.1630363160161 log(abs(dist)) -3.36940954335957 b.trans_res 0.853522042717577 p.trans_res 1.30406337698399 sqrt(b.length) -0.00124465366664082 sqrt(p.length) 0.0211871469993829 04-05-2020 11:38:47 Obtaining normalized randomized quantile residuals for the full dataset...
And it seems that when I rerun exactly the same command several times, I can get different error messages:
This one:
Fejl i if (dv > olddv && itn >= 2 && auto == TRUE) { : manglende værdi hvor TRUE/FALSE er krævet (Error in if (dv > olddv && itn >= 2 && auto == TRUE) {: missing value where TRUE/FALSE needed)
and this one:
Fejl i glim.fit(f = sigma.object, X = sigma.X, y = y, w = w, fv = sigma, : NA's in the working vector or weights for parameter sigma
and somethimes there are no error messages, so I can obtain the models.
Also, when I try to apply the model_bin
to my own chicago dataset, I get the same error:
error in if (dv > olddv && itn >= 2 && auto == TRUE) { : manglende værdi hvor TRUE/FALSE er krævet In addition: Warnings: 1: In predict.gamlss(fit, what = "sigma", newdata = bin, type = "response", : There is a discrepancy between the original and the re-fit used to achieve 'safe' predictions 2: In runif(length(y), aval, bval) : NAs produced
details:
# read .Rds dataset with chicago object
cd = readRDS("path-to-chicago/chicago-interactions-replicates/data/treatment1.Rds")
# convert chicago dataset to inputs for Peaky
fragments = cd_to_fragments(cd)
interactions = cd_to_interactions(cd)
# make bin interactions estimate using default values:
BI = bin_interactions(interactions, fragments, bins=5)
# check whether the appropriate distance cutoffs were applied for bins
print(BI$bins) %>% knitr::kable
dist.bin | dist.abs.min | dist.abs.max | interactions |
---|---|---|---|
1 | 2500 | 3524230 | 27969808 |
2 | 3524231 | 11363852 | 27969803 |
3 | 11363853 | 22012823 | 27969804 |
4 | 22012824 | 38375590 | 27969807 |
5 | 38375592 | 191806166 | 27969801 |
NA | NA | NA | 79309364 |
# check that 'distant' bins contain enough of variance in read counts to make accurate models
BI$interactions %>% subset (dist.bin == 5) %>% select (N) %>% table %> t
. | Freq |
---|---|
1 | 27939622 |
2 | 28435 |
3 | 1208 |
4 | 315 |
5 | 120 |
6 | 41 |
7 | 26 |
8 | 13 |
9 | 9 |
10 | 6 |
11 | 1 |
12 | 1 |
14 | 2 |
17 | 2 |
It looks like defaults are good, as there is enough of variation in the "far" bins.
# generate models that predict readscounts under the null
models = by(BI$interactions, BI$interactions$dist.bin, model_bin, subsample_size=1000)
A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 14:07:57 Subsampling 1000/27969808 interactions for the null model regression... 04-05-2020 14:08:02 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 2248 GAMLSS-RS iteration 2: Global Deviance = 2246 GAMLSS-RS iteration 3: Global Deviance = 2245 GAMLSS-RS iteration 4: Global Deviance = 2244 GAMLSS-RS iteration 5: Global Deviance = 2242 GAMLSS-RS iteration 6: Global Deviance = 2241 GAMLSS-RS iteration 7: Global Deviance = 2239 GAMLSS-RS iteration 8: Global Deviance = 2237 GAMLSS-RS iteration 9: Global Deviance = 2235 GAMLSS-RS iteration 10: Global Deviance = 2233 GAMLSS-RS iteration 11: Global Deviance = 2231 GAMLSS-RS iteration 12: Global Deviance = 2229 GAMLSS-RS iteration 13: Global Deviance = 2226 GAMLSS-RS iteration 14: Global Deviance = 2223 GAMLSS-RS iteration 15: Global Deviance = 2221 GAMLSS-RS iteration 16: Global Deviance = 2218 GAMLSS-RS iteration 17: Global Deviance = 2216 GAMLSS-RS iteration 18: Global Deviance = 2213 GAMLSS-RS iteration 19: Global Deviance = 2211 GAMLSS-RS iteration 20: Global Deviance = 2210 GAMLSS-RS iteration 21: Global Deviance = 2209 GAMLSS-RS iteration 22: Global Deviance = 2208 GAMLSS-RS iteration 23: Global Deviance = 2208 GAMLSS-RS iteration 24: Global Deviance = 2208 GAMLSS-RS iteration 25: Global Deviance = 2208 GAMLSS-RS iteration 26: Global Deviance = 2207 04-05-2020 14:08:04 Converged: YES Iterations: 26Coefficients: (Intercept) 19.6064183745344 log(abs(dist)) -1.51010624747966 b.trans_res 0.757658405857431 p.trans_res 0.172985737499321 sqrt(b.length) 0.00290902818617044 sqrt(p.length) 0.00398335443326356 04-05-2020 14:08:04 Obtaining normalized randomized quantile residuals for the full dataset... A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 14:10:12 Subsampling 1000/27969803 interactions for the null model regression... 04-05-2020 14:10:18 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 169.7 GAMLSS-RS iteration 2: Global Deviance = 169.7 04-05-2020 14:10:18 Converged: YES Iterations: 2Coefficients: (Intercept) -1.59870896896777 log(abs(dist)) -0.0939608940956977 b.trans_res 0.536016151651486 p.trans_res 1.45616865892802 sqrt(b.length) -0.00521342620335626 sqrt(p.length) -0.00678724593132638 04-05-2020 14:10:18 Obtaining normalized randomized quantile residuals for the full dataset... A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 14:12:01 Subsampling 1000/27969804 interactions for the null model regression... 04-05-2020 14:12:07 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 124.7 GAMLSS-RS iteration 2: Global Deviance = 125.1 GAMLSS-RS iteration 3: Global Deviance = 125.1 04-05-2020 14:12:07 Converged: YES Iterations: 3Coefficients: (Intercept) -19.3708919539192 log(abs(dist)) 0.972301563602799 b.trans_res 0.467287876570537 p.trans_res -1.0324932187834 sqrt(b.length) 0.00133351684925376 sqrt(p.length) -0.0167804416369252 04-05-2020 14:12:07 Obtaining normalized randomized quantile residuals for the full dataset... A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 14:13:48 Subsampling 1000/27969807 interactions for the null model regression... 04-05-2020 14:13:54 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 54.71 GAMLSS-RS iteration 2: Global Deviance = 54.71 04-05-2020 14:13:54 Converged: YES Iterations: 2Coefficients: (Intercept) 1.85425986574039 log(abs(dist)) -0.420852042942909 b.trans_res 0.816141148020386 p.trans_res -1.29733446996439 sqrt(b.length) -0.0223976759945811 sqrt(p.length) 0.0272093896453706 04-05-2020 14:13:54 Obtaining normalized randomized quantile residuals for the full dataset... A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 14:15:37 Subsampling 1000/27969801 interactions for the null model regression... 04-05-2020 14:15:43 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length)Fejl i if (dv > olddv && itn >= 2 && auto == TRUE) { : manglende værdi hvor TRUE/FALSE er krævet In addition: Advarselsbeskeder: 1: I predict.gamlss(fit, what = "sigma", newdata = bin, type = "response", : There is a discrepancy between the original and the re-fit used to achieve 'safe' predictions
2: I runif(length(y), aval, bval) : NAs produced
sessionInfo:
sessionInfo()
R version 3.6.0 (2019-04-26) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: RHEL Matrix products: default BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so locale: [1] LC_CTYPE=da_DK.UTF-8 LC_NUMERIC=C LC_TIME=da_DK.UTF-8 LC_COLLATE=da_DK.UTF-8
[5] LC_MONETARY=da_DK.UTF-8 LC_MESSAGES=da_DK.UTF-8 LC_PAPER=da_DK.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=da_DK.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats4 parallel splines stats graphics grDevices utils datasets methods base
other attached packages: [1] ggplot2_3.3.0 PCHiCdata_1.14.0 dplyr_0.8.5
[4] tidyr_1.0.2 GenomicInteractions_1.20.3 InteractionSet_1.14.0
[7] SummarizedExperiment_1.16.1 DelayedArray_0.12.3 BiocParallel_1.20.1
[10] matrixStats_0.56.0 Biobase_2.46.0 GenomicRanges_1.38.0
[13] GenomeInfoDb_1.22.1 IRanges_2.20.2 S4Vectors_0.24.4
[16] BiocGenerics_0.32.0 Chicago_1.14.0 data.table_1.12.8
[19] peaky_0.0.0.9001 R2BGLiMS_0.1-26-03-2017 gamlss.tr_5.1-0
[22] gamlss_5.1-6 nlme_3.1-139 gamlss.data_5.1-4
[25] gamlss.dist_5.1-6 MASS_7.3-51.6 devtools_2.3.0
[28] usethis_1.6.1
loaded via a namespace (and not attached): [1] backports_1.1.6 Hmisc_4.4-0 BiocFileCache_1.10.2 igraph_1.2.5
[5] lazyeval_0.2.2 Delaporte_7.0.2 digest_0.6.25 ensembldb_2.10.2
[9] htmltools_0.4.0 fansi_0.4.1 magrittr_1.5 checkmate_2.0.0
[13] memoise_1.1.0 BSgenome_1.54.0 cluster_2.0.8 remotes_2.1.1
[17] Biostrings_2.54.0 askpass_1.1 prettyunits_1.1.1 jpeg_0.1-8.1
[21] colorspace_1.4-1 blob_1.2.1 rappdirs_0.3.1 xfun_0.13
[25] callr_3.4.3 crayon_1.3.4 RCurl_1.98-1.2 survival_3.1-12
[29] VariantAnnotation_1.32.0 glue_1.4.0 gtable_0.3.0 zlibbioc_1.32.0
[33] XVector_0.26.0 pkgbuild_1.0.7 scales_1.1.0 futile.options_1.0.1
[37] DBI_1.1.0 Rcpp_1.0.4.6 progress_1.2.2 htmlTable_1.13.3
[41] foreign_0.8-71 bit_1.1-15.2 Formula_1.2-3 htmlwidgets_1.5.1
[45] httr_1.4.1 RColorBrewer_1.1-2 acepack_1.4.1 ellipsis_0.3.0
[49] farver_2.0.3 pkgconfig_2.0.3 XML_3.99-0.3 Gviz_1.30.3
[53] nnet_7.3-12 dbplyr_1.4.3 labeling_0.3 tidyselect_1.0.0
[57] rlang_0.4.6 AnnotationDbi_1.48.0 munsell_0.5.0 tools_3.6.0
[61] cli_2.0.2 RSQLite_2.2.0 evaluate_0.14 stringr_1.4.0
[65] processx_3.4.2 knitr_1.28 bit64_0.9-7 fs_1.4.1
[69] purrr_0.3.4 AnnotationFilter_1.10.0 formatR_1.7 biomaRt_2.42.1
[73] compiler_3.6.0 rstudioapi_0.11 curl_4.3 png_0.1-7
[77] testthat_2.3.2 tibble_3.0.1 stringi_1.4.6 highr_0.8
[81] ps_1.3.2 futile.logger_1.4.3 GenomicFeatures_1.38.2 desc_1.2.0
[85] lattice_0.20-38 ProtGenerics_1.18.0 Matrix_1.2-17 vctrs_0.2.4
[89] pillar_1.4.3 lifecycle_0.2.0 BiocManager_1.30.10 bitops_1.0-6
[93] rtracklayer_1.46.0 R6_2.4.1 latticeExtra_0.6-29 gridExtra_2.3
[97] sessioninfo_1.1.1 lambda.r_1.2.4 dichromat_2.0-0 assertthat_0.2.1
[101] pkgload_1.0.2 openssl_1.4.1 rprojroot_1.3-2 withr_2.2.0
[105] GenomicAlignments_1.22.1 Rsamtools_2.2.3 GenomeInfoDbData_1.2.2 hms_0.5.3
[109] grid_3.6.0 rpart_4.1-15 biovizBase_1.34.1 base64enc_0.1-3
Then I try to break down the model fitting bin by bin, and it looks like the distant bins are more difficult:
Bin 1:
BM1 = model_bin(BI$interactions[dist.bin==1,], subsample_size=1000)
plot(BM1$fit)
******************************************************************
Summary of the Randomised Quantile Residuals
mean = -0.0231
variance = 0.984
coef. of skewness = 0.1268
coef. of kurtosis = 3.526
Filliben correlation coefficient = 0.9978
******************************************************************
Bin 2:
BM2 = model_bin(BI$interactions[dist.bin==2,], subsample_size=1000)
******************************************************************
Summary of the Randomised Quantile Residuals
mean = 0.008282
variance = 1.007
coef. of skewness = -0.03388
coef. of kurtosis = 3.118
Filliben correlation coefficient = 0.9992
******************************************************************
And bin3:
BM3 = model_bin(BI$interactions[dist.bin==3,], subsample_size=2000)
plot(BM3$fit)
******************************************************************
Summary of the Randomised Quantile Residuals
mean = -0.02577
variance = 1.016
coef. of skewness = -0.08276
coef. of kurtosis = 2.994
Filliben correlation coefficient = 0.999
******************************************************************
bin 4:
BM4 = model_bin(BI$interactions[dist.bin==4,], subsample_size=1000)
plot(BM4$fit)
******************************************************************
Summary of the Randomised Quantile Residuals
mean = 0.03948
variance = 0.9937
coef. of skewness = 0.02836
coef. of kurtosis = 2.829
Filliben correlation coefficient = 0.9989
******************************************************************
BM5 keeps failing:
BM5 = model_bin(BI$interactions[dist.bin==5,], subsample_size=1000)
First attempt to run it:
A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 15:53:24 Subsampling 1000/27969801 interactions for the null model regression... 04-05-2020 15:53:33 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) GAMLSS-RS iteration 1: Global Deviance = 55.43 GAMLSS-RS iteration 2: Global Deviance = 1207762 GAMLSS-RS iteration 3: Global Deviance = 1204146 GAMLSS-RS iteration 4: Global Deviance = 1202087 GAMLSS-RS iteration 5: Global Deviance = 1198928 GAMLSS-RS iteration 6: Global Deviance = 1195755 GAMLSS-RS iteration 7: Global Deviance = 1188371Fejl i glim.fit(f = mu.object, X = mu.X, y = y, w = w, fv = mu, os = mu.offset, : NA's in the working vector or weights for parameter mu
Second attempt to run it:
A truncated family of distributions from NBI has been generated and saved under the names:
dNBI.0tr pNBI.0tr qNBI.0tr rNBI.0tr NBI.0tr The type of truncation is left and the truncation parameter is 0
04-05-2020 15:53:44 Subsampling 1000/27969801 interactions for the null model regression... 04-05-2020 15:53:53 Fitting with a maximum of 200 iterations... Using formula: N ~ log(abs(dist)) + b.trans_res + p.trans_res + sqrt(b.length) + sqrt(p.length) Fejl i if (dv > olddv && itn >= 2 && auto == TRUE) { : manglende værdi hvor TRUE/FALSE er krævet In addition: Advarselsbesked: I pnbinom(q, size = 1/sigma, mu = mu, lower.tail = lower.tail, : NaNs produced
But then reducing the MaxDist to 5Mbp enables the run again.
BI = bin_interactions(interactions, fragments, bins=5, max_dist=10e5)
models = by(BI$interactions, BI$interactions$dist.bin, model_bin, subsample_size=2000)
Hi Adrija,
Glad to see you've made progress!
It makes sense that you occassionally succeed but also occassionally fail to model a bin when you provide the subsample_size argument: the subset of interactions the models is based on will vary with each call to model_bin().
You could consider increasing your subsample_size=... for a higher chance of success. Note that providing NA will cause all interaction data in the distance bin to be used, which is great for the purpose of building these models, but could take more time.
It would also be perfectly sensible to further decrease the 5Mbp max_dist if you notice data in distant bins cannot be modeled. I am aware that others have also used more conservative cut-offs at e.g. 1 Mbp (as in https://www.biorxiv.org/content/10.1101/813618v2).
Please do let me know if you have any additional questions.
Best wishes,
Chris
Hi Chris,
Using 1 Mb actually makes sense, since the median of Chicago interactions are found within 100-500 kb from baits, and a relatively small proportion of interactions are found beyond 1Mb.
I had an i if (dv > olddv && itn >= 2 && auto == TRUE) { :
error today and I fixed it by setting
gamlss(
...,
control = gamlss::gamlss.control(mu.step = .5)
)
which is half of the default value.
(In case anyone else with this error finds this issue.)
Hi,
I am running the Peaky vignette on top of Chicago vignette to test the Peaky tool. At the modelling step I get an error:
This error does not appear if I run the vignette on Peaky dataset. Please find more details below:
Define functions:
Load data for Chicago Vignette
Peaky analysis on Chicago Data