archer-yang-lab / gglasso

Automatically exported from code.google.com/p/gglasso
GNU General Public License v2.0
10 stars 5 forks source link

Fail to use group option #1

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

data(bardet)
gglasso(x=bardet$x[,1:10],y=bardet$y,group=rep(1:5,2),loss="ls", lambda = 
0.004)$beta

What is the expected output? What do you see instead?

Below is my output. The group option indication is misleading. 

> gglasso(x=bardet$x[,1:10],y=bardet$y,group=rep(1:5,2),loss="ls", lambda = 
0.004)$beta
            s0
V1   0.0000000
V2   0.0000000
V3   0.0000000
V4   0.0000000
V5  -0.3592322
V6   0.1171182
V7   0.0000000
V8   0.0000000
V9   0.0000000
V10  0.0000000
> rep(1:5,2)
 [1] 1 2 3 4 5 1 2 3 4 5

What version of the product are you using? On what operating system?

R 3.0.2 on windows

Please provide any additional information below.

Original issue reported on code.google.com by elong0...@gmail.com on 9 Oct 2013 at 9:59

AndrewLawrence commented 4 years ago

The documentation says that the group argument must contain:

a vector of consecutive integers describing the grouping of the coefficients

So the example above fails (and still fails in the latest version) because rep(1:5, 2) is not sorted ascending and so contains non-consecutive integers.

In practice this means that for gglasso the columns of x must be strictly sorted by group or results are nonsensical because the group is not applied as the analyst intends. The documentation is not very explicit on this requirement and there is no working check for this, only the following:

if (!identical(as.integer(sort(unique(group))), as.integer(1:bn))) 
        stop("Groups must be consecutively numbered 1,2,3,...")

This checks that there are no skipped levels e.g. c(1,1,1,2,2,2,4,4,4), but because unique(group) is sorted the problems with rep(1:5, 2) will not trigger this error.

A better check would be something with run length encoding, compare:

rle(rep(1:5, 2))
# Run Length Encoding
#   lengths: int [1:10] 1 1 1 1 1 1 1 1 1 1
#   values : int [1:10] 1 2 3 4 5 1 2 3 4 5 

rle(rep(1:5,each = 2))
# Run Length Encoding
#   lengths: int [1:5] 2 2 2 2 2
#   values : int [1:5] 1 2 3 4 5

So a more sensitive test could be:

identical(rle(group)$values, 1:as.integer(max(group)))

Note: above testing and code excerpts are from CRAN version 1.4 of gglasso using R 3.6.1 on Windows