@tdhock I would like to test the above on larger data sizes, i.e. a higher-end range for N.data.vec (which is currently capped at a maximum value of 10^3), because trends at smaller sizes might be misleading. I tried changing the range to incorporate 10,000 (N.data.vec = 10^seq(1, 4, by = 0.5)) or higher values, but it throws this error:
Error in if (any(count.df$chromStart < 0)) { : missing value where TRUE/FALSE needed
How can I bypass it?
maybe that is because the real data set does not have enough data points? you could make several copies of the real data set?
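For context, a minimal sketch of why indexing past the end of the real data produces that error: Mono27ac$coverage has only a few thousand rows (fewer than 10^4), so requesting more rows pads the result with NA rows, and the chromStart check then sees NA:

library(data.table)
data(Mono27ac, package="PeakSegDisk", envir=environment())
nrow(Mono27ac$coverage)               # only a few thousand rows
too.big <- Mono27ac$coverage[1:10^4]  # data.table pads out-of-range rows with NA
any(too.big$chromStart < 0)           # NA, hence "missing value where TRUE/FALSE needed"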
> maybe that is because the real data set does not have enough data points?

yes that's what I thought too

> you could make several copies of the real data set?

okay, I'll make replicates and see if that works
Making three copies of our required coverage data frame/table with the help of zoo's coredata function, and testing it for data sizes up to 10^4:
library(zoo)
library(data.table)
loss.list <- list()
data(Mono27ac, package="PeakSegDisk", envir=environment())
# stack three copies of the coverage rows on top of each other:
replicated.test.data <- coredata(Mono27ac$coverage)[rep(seq(nrow(Mono27ac$coverage)), 3), ]
N.data.vec <- 10^seq(1, 4)
for(penalty in 10^seq(0, 6, by = 2)){
  for(N.data in N.data.vec){
    some.cov <- replicated.test.data[1:N.data]
    some.inc <- data.table(some.cov)
    some.inc[, count := 1:.N]  # synthetic data: strictly increasing counts
    data.list <- list(
      real=some.cov,
      synthetic=some.inc)
    for(short.type in names(data.list)){
      df <- data.list[[short.type]]
      fit <- PeakSegDisk::PeakSegFPOP_df(df, penalty)
      loss.list[[paste(penalty, N.data, short.type)]] <- data.table(
        N.data,
        short.type,
        fit$loss)
    }
  }
}
(loss <- do.call(rbind, loss.list))[, .(
  penalty, short.type, N.data,
  mean.intervals, max.intervals,
  megabytes, seconds)][order(penalty, short.type, N.data)]
This throws an error signifying there should be no gaps, and it stops computation after the first penalty value (6 rows):
Error in PeakSegFPOP_file(prob.cov.bedGraph, penalty.str, db.file) :
there should be no gaps (columns 2-3) in input data file C:\Users\HP\AppData\Local\Temp\RtmpKqpfed\chr11-60000-580000\coverage.bedGraph
Timing stopped at: 0.01 0 0.01
@tdhock what does this error mean, and is there any way I could rectify it?
Again, if we lower the size limit for N.data.vec, it works for data sizes up to 10^3, which brings the case back to square one (gives the same old result, and throws an error at larger data sizes).
that error means that your chromStart/chromEnd values don't make sense (start should always be less than end in the same row, and equal to the end in the previous row).
why are you using zoo? can't you just use rbind / basic R operations/functions?
you can use PeakSegDisk::PeakSegFPOP_vec with rep: fit <- PeakSegDisk::PeakSegFPOP_vec(rep(Mono27ac$coverage$count, 3), 10.5)
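To spell that out, a minimal sketch of both options (variable names here are illustrative): replicating only the count column and letting PeakSegFPOP_vec generate the positions, versus stacking row copies and recomputing chromStart/chromEnd so that the intervals tile without gaps:

library(data.table)
data(Mono27ac, package="PeakSegDisk", envir=environment())
# option 1: replicate the count vector; PeakSegFPOP_vec creates the positions
fit <- PeakSegDisk::PeakSegFPOP_vec(rep(Mono27ac$coverage$count, 3), 10.5)
# option 2: stack row copies, then recompute positions so that each
# chromEnd equals the next row's chromStart (no gaps, no overlaps)
big.cov <- Mono27ac$coverage[rep(1:.N, 3)]
size.vec <- big.cov[, chromEnd - chromStart]  # width of each interval
big.cov[, chromEnd := cumsum(size.vec)]
big.cov[, chromStart := chromEnd - size.vec]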
> that error means that your chromStart/chromEnd values don't make sense (start should always be less than end in the same row, and equal to the end in the previous row).

ah okay

> why are you using zoo? can't you just use rbind / basic R operations/functions? you can use PeakSegDisk::PeakSegFPOP_vec with rep: fit <- PeakSegDisk::PeakSegFPOP_vec(rep(Mono27ac$coverage$count, 3), 10.5)
I did try with rbind() and other basic functions such as sapply before:
replicated.test.data <- do.call("rbind", replicate(3, test.data, simplify = FALSE))
replicated.test.data <- as.data.frame(sapply(test.data, rep.int, times = 3))
It results in the same error.
> you can use PeakSegDisk::PeakSegFPOP_vec with rep: fit <- PeakSegDisk::PeakSegFPOP_vec(rep(Mono27ac$coverage$count, 3), 10.5)

Thanks! I'll try that
Now there's a fixed value for mean.intervals, max.intervals and megabytes (I understand there are replicates, but at least some of the values should change). Also, when I tried changing the penalty, or assigning penalty instead of 10.5, it seems it still always runs the computation at 10.5, as it shows only 10.5 for the values in the penalty column: [screenshot of fit and the loss table]
With these fixed values for mean.intervals, the expected complexity would be constant, but I nevertheless tested it just to verify: [screenshot]
> Not sure why those values are being fixed

You should be able to figure this out Anirban. Looking at your code, df depends on N.data but you are not using it. Instead of using rep inside the for loops, use it outside to create a bigger version of Mono27ac, which you can then subset to create some.cov.
well, I created a copy of the Mono27ac data and tried to create a bigger version of Mono27ac$coverage (x3) using an rbind, and then I sorted the rows by chromStart to avoid the previous error regarding the inconsistency between chromStart and chromEnd (cols 2-3) that you mentioned before, and I assigned that bigger/replicated, ordered data back into coverage for my copy of Mono27ac: [screenshot of the code]
The first error says undefined columns are selected, but then it is defined, and the values for test.data$coverage are: [screenshot of the output]
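A reconstruction of the attempt described above, since the original snippet was attached as a screenshot (test.data here is assumed to be the copy of Mono27ac):

library(data.table)
data(Mono27ac, package="PeakSegDisk", envir=environment())
test.data <- Mono27ac  # copy of the original data
# three stacked copies of the coverage table, then ordered by chromStart
bigger <- do.call(rbind, replicate(3, test.data$coverage, simplify = FALSE))
test.data$coverage <- bigger[order(chromStart)]
# note: ordering does not remove the duplicated/overlapping intervals,
# so the "no gaps" check still fails on this data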
@tdhock Sorry to bother you about this again, but could you please check what I might be doing wrong here?
This is pretty simple and I'm a bit frustrated that you did not figure it out yourself. if you first define the sizes you are going to use in N.data.vec, then you can create a replicated real data count vector which is as big as you need for the max size:
data(Mono27ac, package="PeakSegDisk", envir=environment())
N.data.vec <- 10^seq(1, 4)
count.vec <- Mono27ac$coverage$count
Mono27ac.bigger <- rep(count.vec, ceiling(max(N.data.vec)/length(count.vec)))
after that you can define count data vectors based on that replicated data set
data.list <- list(
real=Mono27ac.bigger[1:N.data],
synthetic=1:N.data)
and then use those data in a call to PeakSegFPOP_vec
for(short.type in names(data.list)){
some.count.vec <- data.list[[short.type]]
fit <- PeakSegDisk::PeakSegFPOP_vec(some.count.vec, penalty)
overall the code block is
library(data.table)
data(Mono27ac, package="PeakSegDisk", envir=environment())
N.data.vec <- 10^seq(1, 4)
count.vec <- Mono27ac$coverage$count
Mono27ac.bigger <- rep(count.vec, ceiling(max(N.data.vec)/length(count.vec)))
loss.list <- list()
for(penalty in c(0, 1e2, 1e4, 1e6)){
for(N.data in N.data.vec){
data.list <- list(
real=Mono27ac.bigger[1:N.data],
synthetic=1:N.data)
for(short.type in names(data.list)){
some.count.vec <- data.list[[short.type]]
fit <- PeakSegDisk::PeakSegFPOP_vec(some.count.vec, penalty)
loss.list[[paste(penalty, N.data, short.type)]] <- data.table(
N.data,
short.type,
fit$loss)
}
}
}
(loss <- do.call(rbind, loss.list))[, .(
penalty, short.type, N.data,
mean.intervals, max.intervals,
megabytes, seconds)][order(penalty, short.type, N.data)]
and on my system I got the following
penalty short.type N.data mean.intervals max.intervals megabytes seconds
1: 0 real 10 0.9500 1 8.850098e-04 0.001
2: 0 real 100 2.0800 4 1.326370e-02 0.006
3: 0 real 1000 2.8390 5 1.616936e-01 0.050
4: 0 real 10000 2.7149 5 1.569698e+00 0.519
5: 0 synthetic 10 2.3500 4 1.419067e-03 0.001
6: 0 synthetic 100 2.7100 4 1.566696e-02 0.005
7: 0 synthetic 1000 2.7460 4 1.581459e-01 0.052
8: 0 synthetic 10000 2.8675 5 1.627911e+00 0.578
9: 100 real 10 1.9500 3 1.266479e-03 0.001
10: 100 real 100 3.5800 7 1.898575e-02 0.006
11: 100 real 1000 11.4410 30 4.898338e-01 0.076
12: 100 real 10000 11.5558 31 4.942234e+00 0.819
13: 100 synthetic 10 2.4500 6 1.457214e-03 0.001
14: 100 synthetic 100 25.3950 46 1.022034e-01 0.010
15: 100 synthetic 1000 89.7805 151 3.478249e+00 0.263
16: 100 synthetic 10000 211.6770 341 8.128239e+01 5.618
17: 10000 real 10 1.2000 2 9.803772e-04 0.001
18: 10000 real 100 3.0700 7 1.704025e-02 0.005
19: 10000 real 1000 7.7640 26 3.495674e-01 0.063
20: 10000 real 10000 22.0908 44 8.961018e+00 1.203
21: 10000 synthetic 10 2.4500 6 1.457214e-03 0.003
22: 10000 synthetic 100 20.9000 58 8.505630e-02 0.011
23: 10000 synthetic 1000 232.3920 432 8.918446e+00 0.622
24: 10000 synthetic 10000 868.8909 1477 3.319896e+02 26.227
25: 1000000 real 10 1.2000 2 9.803772e-04 0.001
26: 1000000 real 100 2.1450 5 1.351166e-02 0.004
27: 1000000 real 1000 8.3640 19 3.724556e-01 0.066
28: 1000000 real 10000 9.8935 29 4.308117e+00 0.733
29: 1000000 synthetic 10 2.4500 6 1.457214e-03 0.001
30: 1000000 synthetic 100 20.7350 57 8.442688e-02 0.011
31: 1000000 synthetic 1000 203.3360 570 7.810047e+00 0.915
32: 1000000 synthetic 10000 2299.7677 4290 8.778258e+02 92.979
penalty short.type N.data mean.intervals max.intervals megabytes seconds
I'm really sorry about that, I should have dug a little deeper. I understand your approach completely, but when I tried it, for some reason the penalty value is being fixed for me (this happened with my recent earlier approaches as well, when using PeakSegDisk::PeakSegFPOP_vec).
But don't worry, I'll figure it out (must be some local issue I guess), and thank you so much for helping me out again!
library(data.table)
#> Warning: package 'data.table' was built under R version 3.6.3
data(Mono27ac, package="PeakSegDisk", envir=environment())
N.data.vec <- 10^seq(1, 4)
count.vec <- Mono27ac$coverage$count
Mono27ac.bigger <- rep(count.vec, ceiling(max(N.data.vec)/length(count.vec)))
loss.list <- list()
for(penalty in c(0, 1e2, 1e4, 1e6)){
for(N.data in N.data.vec){
data.list <- list(
real=Mono27ac.bigger[1:N.data],
synthetic=1:N.data)
for(short.type in names(data.list)){
some.count.vec <- data.list[[short.type]]
fit <- PeakSegDisk::PeakSegFPOP_vec(some.count.vec, penalty)
loss.list[[paste(penalty, N.data, short.type)]] <- data.table(
N.data,
short.type,
fit$loss)
}
}
}
(loss <- do.call(rbind, loss.list))[, .(
penalty, short.type, N.data,
mean.intervals, max.intervals,
megabytes, seconds)][order(penalty, short.type, N.data)]
#> penalty short.type N.data mean.intervals max.intervals megabytes seconds
#> 1: 10.5 real 10 2.00000 4 0.001285553 0.01
#> 2: 10.5 real 10 2.00000 4 0.001285553 0.00
#> 3: 10.5 real 10 2.00000 4 0.001285553 0.00
#> 4: 10.5 real 10 2.00000 4 0.001285553 0.05
#> 5: 10.5 real 100 4.38500 8 0.022056580 0.02
#> 6: 10.5 real 100 4.38500 8 0.022056580 0.00
#> 7: 10.5 real 100 4.38500 8 0.022056580 0.02
#> 8: 10.5 real 100 4.38500 8 0.022056580 0.00
#> 9: 10.5 real 1000 9.50700 22 0.416057587 0.09
#> 10: 10.5 real 1000 9.50700 22 0.416057587 0.08
#> 11: 10.5 real 1000 9.50700 22 0.416057587 0.08
#> 12: 10.5 real 1000 9.50700 22 0.416057587 0.08
#> 13: 10.5 real 10000 8.58725 22 3.809822083 0.64
#> 14: 10.5 real 10000 8.58725 22 3.809822083 0.67
#> 15: 10.5 real 10000 8.58725 22 3.809822083 0.69
#> 16: 10.5 real 10000 8.58725 22 3.809822083 0.64
#> 17: 10.5 synthetic 10 3.50000 7 0.001857758 0.00
#> 18: 10.5 synthetic 10 3.50000 7 0.001857758 0.03
#> 19: 10.5 synthetic 10 3.50000 7 0.001857758 0.00
#> 20: 10.5 synthetic 10 3.50000 7 0.001857758 0.00
#> 21: 10.5 synthetic 100 18.82000 33 0.077121735 0.01
#> 22: 10.5 synthetic 100 18.82000 33 0.077121735 0.00
#> 23: 10.5 synthetic 100 18.82000 33 0.077121735 0.00
#> 24: 10.5 synthetic 100 18.82000 33 0.077121735 0.00
#> 25: 10.5 synthetic 1000 47.33250 76 1.858985901 0.16
#> 26: 10.5 synthetic 1000 47.33250 76 1.858985901 0.16
#> 27: 10.5 synthetic 1000 47.33250 76 1.858985901 0.14
#> 28: 10.5 synthetic 1000 47.33250 76 1.858985901 0.16
#> 29: 10.5 synthetic 10000 102.77935 164 39.741256714 2.22
#> 30: 10.5 synthetic 10000 102.77935 164 39.741256714 2.25
#> 31: 10.5 synthetic 10000 102.77935 164 39.741256714 2.25
#> 32: 10.5 synthetic 10000 102.77935 164 39.741256714 2.17
#> penalty short.type N.data mean.intervals max.intervals megabytes seconds
Created on 2020-08-13 by the reprex package (v0.3.0)
Session Info:
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252 LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C LC_TIME=English_India.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.12.8 reprex_0.3.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 compiler_3.6.1 prettyunits_1.0.2 remotes_2.1.1 tools_3.6.1
[6] testthat_2.3.2 digest_0.6.25 packrat_0.5.0 pkgbuild_1.0.8 pkgload_1.0.2
[11] memoise_1.1.0 evaluate_0.14 rlang_0.4.6 cli_2.0.2 rstudioapi_0.11
[16] xfun_0.14 withr_2.1.2 knitr_1.28 desc_1.2.0 fs_1.3.1
[21] devtools_2.3.0 rprojroot_1.3-2 glue_1.4.1 R6_2.4.0 processx_3.4.1
[26] fansi_0.4.0 PeakSegDisk_2019.9.27 rmarkdown_2.3 sessioninfo_1.1.1 callr_3.4.3
[31] clipr_0.7.0 magrittr_1.5 whisker_0.4 backports_1.1.4 ps_1.3.0
[36] ellipsis_0.3.0 htmltools_0.3.6 usethis_1.6.1 assertthat_0.2.1 crayon_1.3.4
hi please upgrade to PeakSegDisk from Github https://github.com/tdhock/PeakSegDisk/blob/master/NEWS#L61
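For example, one way to get the development version (assuming the remotes package is installed):

# install PeakSegDisk from GitHub; per the NEWS entry linked above,
# it contains the fix relevant to this issue
remotes::install_github("tdhock/PeakSegDisk")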
> hi please upgrade to PeakSegDisk from Github https://github.com/tdhock/PeakSegDisk/blob/master/NEWS#L61
That clears it, Thanks!
I tested it on a few sets of different data sizes and penalties, but for all of them the estimated complexity comes out as log-linear, which is the expected one for the real case, but unexpected for synthetic (expected quadratic, especially for higher penalty values):

For N.data.vec <- 10^seq(1, 4, by = 0.01) and penalties of (0, 1e2, 1e4, 1e6): [plot]
For N.data.vec <- 10^seq(1, 5, by = 0.1) and penalties of (0, 1e6): [plot, sizes up to 10^5]

I didn't test a further size limit like 10^6 since the previous tests themselves took more than an hour in total, but I think it might give the same result too, since we are working with replicated data.
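For reference, a sketch of how such a log-log timing plot can be drawn from the loss table produced by the earlier code block (the ggplot2 code here is illustrative, not the exact code behind the screenshots):

library(ggplot2)
ggplot(loss, aes(N.data, seconds, color = short.type)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ penalty) +  # one panel per penalty value
  scale_x_log10() +
  scale_y_log10()          # straight lines suggest polynomial trends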
Initial try:
The columns we are concerned with (or require) are loss$mean.intervals and loss$N.data, for different penalty values. We can directly pass them on to asymptoticComplexityClassifier, but we need to select the specific rows which pertain to either real or synthetic data and belong to one penalty value, for which I am using filter from dplyr. Here is the result for synthetic data: [screenshot]
The same for real gives square-root complexity, which is less than synthetic no doubt, but these are not what we require (quadratic and log-linear respectively). This is expected, though, since we barely have any values to map our glm fitted values against, as shown in the error (misleading predictions for a small number of values). So I tried this for a large number of mid-values in N.data.vec (I tried going for more than 10^3, but that gives an error saying Error in if (any(count.df$chromStart < 0)) { : missing value where TRUE/FALSE needed) by using a finer step size of 0.01, i.e. N.data.vec <- 10^seq(1, 3, by = 0.01):
Results (real/synthetic) for each penalty value (0, 1e2, 1e4, 1e6): [screenshots]
The complexities for synthetic type data are showing linear and log-linear, which is incorrect, whereas for real type data it is showing log-linear for all penalty values, which is correct.
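To make the classification step concrete, here is a self-contained sketch of the kind of glm-based comparison described above. It only approximates what asymptoticComplexityClassifier does; the candidate set and the raw-deviance selection rule are assumptions for illustration, not the package's exact implementation:

library(dplyr)
# select the rows for one data type and one penalty value from the loss table
synthetic.1e6 <- filter(loss, short.type == "synthetic", penalty == 1e6)
N <- synthetic.1e6$N.data
y <- synthetic.1e6$mean.intervals
# one glm per candidate complexity class
candidates <- list(
  constant = y ~ 1,
  linear = y ~ N,
  "log-linear" = y ~ I(N * log(N)),
  quadratic = y ~ I(N^2))
deviance.vec <- sapply(candidates, function(f) deviance(glm(f)))
# pick the class with the smallest residual deviance; a real classifier
# should penalize model complexity or cross-validate instead
names(which.min(deviance.vec))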