abess-team / abess

Fast Best-Subset Selection Library
https://abess.readthedocs.io/

Why does the data generator function `make_glm_data` define n shape parameters for a gamma data set? #509

Closed belzheng closed 1 year ago

belzheng commented 1 year ago

Describe the bug

Why does the data generator function make_glm_data define n shape parameters (one per observation) for a single gamma data set?

elif family == "gamma":
            x = x / 16
            m = 5 * np.sqrt(2 * np.log(p) / n)
            if coef_ is None:
                Tbeta[nonzero] = np.random.uniform(m, 100 * m, k) * sign
            else:
                Tbeta = coef_
            # add noise
            eta = x @ Tbeta + np.random.normal(0, sigma, n)
            # set coef_0 to make eta<0
            eta = eta - np.abs(np.max(eta)) - 10
            eta = -1 / eta
            # set the shape para of gamma uniformly in [0.1,100.1]
            shape_para = 100 * np.random.uniform(0, 1, n) + 0.1
            y = np.random.gamma(
                shape=shape_para,
                scale=eta / shape_para,
                size=n)

Additional context

Would it be more sensible for all observations in a data set to share the same shape parameter?
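To make the question concrete, here is a minimal sketch of the difference between the two choices. It is not code from abess: the names `mu`, `shape_per_obs` and `shape_shared` are illustrative, and the means are set to a constant only so the variances are easy to compare. It relies on the fact that a gamma variable with shape `a` and scale `mu / a` has mean `mu` and variance `mu**2 / a`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
mu = np.full(n, 2.0)  # hypothetical means, identical for easy comparison

# Behaviour in the excerpt above: one shape parameter per observation.
shape_per_obs = 100 * rng.uniform(0, 1, n) + 0.1
# Alternative asked about here: one shape parameter for the whole data set.
shape_shared = 100 * rng.uniform(0, 1) + 0.1

# With scale = mu / shape, E[y] = mu and Var[y] = mu**2 / shape.
var_per_obs = mu**2 / shape_per_obs  # differs from row to row
var_shared = mu**2 / shape_shared    # identical for every row

# Both draws have the same means; only the dispersion structure differs.
y_per_obs = rng.gamma(shape=shape_per_obs, scale=mu / shape_per_obs)
y_shared = rng.gamma(shape=shape_shared, scale=mu / shape_shared)
```

With per-observation shapes the dispersion changes from row to row even though the means are equal, which is a different model from a gamma GLM with a single dispersion parameter.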

Mamba413 commented 1 year ago

Thanks for this question. I read "Chapter 8.2 The gamma distribution" in the book "Generalized Linear Models" and found that the authors say "we are concerned mostly with models for which the shape parameter is assumed constant for all observations". So I think it makes sense for a data set to share the same shape parameter.
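For reference, the standard mean/shape parameterization of the gamma density is written out below. The symbols $\mu$ (mean) and $\nu$ (shape) are chosen here for illustration; this is a textbook identity rather than a quotation from the book.

```latex
\[
f(y;\mu,\nu) = \frac{1}{\Gamma(\nu)} \left(\frac{\nu}{\mu}\right)^{\nu}
  y^{\nu-1} \exp\!\left(-\frac{\nu y}{\mu}\right), \qquad y > 0,
\]
\[
\mathrm{E}[Y] = \mu, \qquad
\mathrm{Var}(Y) = \frac{\mu^{2}}{\nu}, \qquad
\frac{\sqrt{\mathrm{Var}(Y)}}{\mathrm{E}[Y]} = \nu^{-1/2}.
\]
```

A constant $\nu$ therefore means a constant coefficient of variation across observations, while the mean $\mu_i$ is free to vary with the covariates.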

Actually, this simulation setting follows the one used in the R implementation:

# add noise
sigma <- sqrt((t(beta) %*% Sigma %*% beta) / snr)
eta <- x %*% beta + stats::rnorm(n, 0, sigma)
# set coef_0 to abs(min(eta)) + 10 so that eta > 0
eta <- eta + abs(min(eta)) + 10
# set the shape parameter of gamma uniformly in [0.1, 100.1]
shape_para <- 100 * runif(n) + 0.1
y <- stats::rgamma(n, shape = shape_para, rate = shape_para * eta)

This was written by @bbayukari. Was there any special consideration behind it?

bbayukari commented 1 year ago

I completely agree with you that a data set should share the same shape parameter. It has been fixed in d0394626944649d464493e1c5cb01dcf0db7ccee.
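For readers who want to see what a shared shape parameter looks like in the gamma branch, here is a minimal sketch. It is not the contents of the commit above and not the abess API: the function name `make_gamma_data_sketch`, the standard normal design, and the random support and signs are all assumptions made so that the snippet runs on its own.

```python
import numpy as np

def make_gamma_data_sketch(n, p, k, sigma=1.0, seed=None):
    """Hypothetical stand-in for the gamma branch with ONE shape parameter."""
    rng = np.random.default_rng(seed)
    # Assumption: i.i.d. standard normal design (the excerpt does not show x).
    x = rng.normal(size=(n, p)) / 16
    m = 5 * np.sqrt(2 * np.log(p) / n)
    # Assumption: random support of size k with random signs.
    nonzero = rng.choice(p, size=k, replace=False)
    sign = rng.choice([-1.0, 1.0], size=k)
    beta = np.zeros(p)
    beta[nonzero] = rng.uniform(m, 100 * m, k) * sign
    # Noisy linear predictor, shifted so that eta < 0 as in the excerpt.
    eta = x @ beta + rng.normal(0, sigma, n)
    eta = eta - np.abs(np.max(eta)) - 10
    mu = -1 / eta  # positive mean for every observation
    # One shape parameter for all n observations, drawn once from [0.1, 100.1).
    shape_para = 100 * rng.uniform(0, 1) + 0.1
    y = rng.gamma(shape=shape_para, scale=mu / shape_para, size=n)
    return x, y, beta, shape_para

x, y, beta, shape_para = make_gamma_data_sketch(n=200, p=20, k=3, seed=1)
```

The only change relative to the excerpt in the question is that `shape_para` is a scalar instead of a length-n array, so every observation has mean `mu[i]` and the same coefficient of variation.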