JuliaStats / GLM.jl

Generalized linear models in Julia
Other
584 stars 114 forks source link

Feature request: Support non-integer dependent variable for Poisson regression #494

Closed yangkeunyun closed 1 year ago

yangkeunyun commented 1 year ago

Hi,

I know that the Poisson distribution is technically defined for integers. But in many cases, there are times that it could be a good approximation by treating non-integer variable as Poisson. Other statistical programs (e.g. R, Stata) seem to take it. I hope it is possible to do a similar approach in Julia. Below is the MWE (including a comparison with R code) and stacktrace:

using DataFrames, GLM, Random, Distributions
using RCall

# Generate random dataset with Float64 type dependent variable
df = DataFrame(y  = rand(Uniform(0,10), 10),
               x1 = randn(10),
               x2 = randn(10))

# Poisson Regression in R
println("This is Poisson regression in R")
R"print(summary(glm(y ~ x1 + x2, family=poisson, data=$df)))"

# Poisson Regression in Julia
println("\n This is Poisson regression in Julia")
fit(GeneralizedLinearModel, @formula(y ~ x1 + x2), df, Poisson())     
This is Poisson regression in R

Call:
glm(formula = y ~ x1 + x2, family = poisson, data = `#JL`$df)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.5851  -1.3313   0.2815   1.0060   1.5319

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   1.8127     0.1315  13.782   <2e-16 ***
x1            0.1248     0.1370   0.911    0.363
x2           -0.1083     0.1269  -0.853    0.394
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 22.002  on 9  degrees of freedom
Residual deviance: 20.690  on 7  degrees of freedom
AIC: Inf

Number of Fisher Scoring iterations: 5

┌ Warning: RCall.jl: Warning in dpois(y, mu, log = TRUE) : non-integer x = 9.620932
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 7.250401
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 1.245370
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 8.879959
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 4.531989
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 9.078459
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 9.979947
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 1.019928
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 7.847729
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 2.645772
└ @ RCall C:\Users\yangk\.julia\packages\RCall\6kphM\src\io.jl:172

 This is Poisson regression in Julia
ERROR: ArgumentError: y must be in the support of D
Stacktrace:
 [1] checky
   @ C:\Users\yangk\.julia\packages\GLM\P0Ris\src\glmfit.jl:642 [inlined]
 [2] GLM.GlmResp(y::Vector{Float64}, d::Poisson{Float64}, l::LogLink, η::Vector{Float64}, μ::Vector{Float64}, off::Vector{Float64}, wts::Vector{Float64})
   @ GLM C:\Users\yangk\.julia\packages\GLM\P0Ris\src\glmfit.jl:36
 [3] GLM.GlmResp(y::Vector{Float64}, d::Poisson{Float64}, l::LogLink, off::Vector{Float64}, wts::Vector{Float64})
   @ GLM C:\Users\yangk\.julia\packages\GLM\P0Ris\src\glmfit.jl:61
 [4] fit(::Type{GeneralizedLinearModel}, X::Matrix{Float64}, y::Vector{Float64}, d::Poisson{Float64}, l::LogLink; dofit::Bool, wts::Vector{Float64}, offset::Vector{Float64}, fitargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ GLM C:\Users\yangk\.julia\packages\GLM\P0Ris\src\glmfit.jl:488
 [5] fit (repeats 2 times)
   @ C:\Users\yangk\.julia\packages\GLM\P0Ris\src\glmfit.jl:484 [inlined]
 [6] fit(::Type{GeneralizedLinearModel}, f::FormulaTerm{Term, Tuple{Term, Term}}, data::DataFrame, args::Poisson{Float64}; contrasts::Dict{Symbol, Any}, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ StatsModels C:\Users\yangk\.julia\packages\StatsModels\qY2YS\src\statsmodel.jl:88
 [7] fit(::Type{GeneralizedLinearModel}, f::FormulaTerm{Term, Tuple{Term, Term}}, data::DataFrame, args::Poisson{Float64})
   @ StatsModels C:\Users\yangk\.julia\packages\StatsModels\qY2YS\src\statsmodel.jl:82
 [8] top-level scope
   @ c:\julialang\Question_poisson_glm.txt:15