ChaoranHu / coga

R package for Convolution of Gamma Distributions
GNU General Public License v3.0
4 stars 1 forks source link

Inf or NaN happened, not converge! #1

Closed soumenroy closed 5 years ago

soumenroy commented 5 years ago

I am using pcoga and dcoga to compute the convolution of gamma distributions when the shape parameters length is high (>200). Both the functions show Inf or NaN happened, not converge!. However, rcoga is working correctly. The link of the shape and rate parameters is available here. The code link is available here. In my code, L:9 is setting the number of shape and rate parameters length.

Advance thanks for your help!

ChaoranHu commented 5 years ago

Hi, Thank you for your question. I try to use your code and parameter and the fault happens. I believe this is the numerical problem: when the sum of parameters is too large, our code cannot handle and will give error or warning. This is because the calculation of cdf and pdf will require factorial or gamma function, which is easy to exceed the numerical tolerance. Try to reduce the sum of parameters and the code should work.

I do apologize for any inconvenience!

Chaoran

soumenroy commented 5 years ago

Hi Chaoran,

Thanks for your comment. I appreciate for writing this coga package which is numerically highly stable.

As far as I mentioned last time the coga is running without raising any error for the small number of random variables.

The problem is resolved using a mathematical approach. Indeed, the summation of the Gamma random variables is following the exact Gaussian distribution which implies the occurrence of the central limit theorem.

Let's assume the independent Gamma random variables X1, X2, . . . , Xn are following Γ(αi, βi) , where i = 1, 2, . . . , n. Summation of these random varaibles for large value of n following a Gaussian distributrion with mean μ = ∑i=1n αi βi and standard deviation σ = ∑i=1n αi β2i.

I have added a python script here

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gamma

# Data was attached in my erlier comment
data = np.loadtxt("shape_rate.txt")
alpha_arr, beta_arr = data[:,0], data[:,1]

# Summation of Gamma random variables
npts = 100000
rv = np.zeros(npts)
for alpha, beta in zip(alpha_arr, beta_arr):
    rv += np.random.gamma(alpha, 1.0/beta, npts)   

fig, ax = plt.subplots(figsize=(5, 3))
count, bins, ignored = ax.hist(rv, 100, density=True)

# Compute Gaussian pdf
pdf = norm.pdf(bins, np.sum(alpha_arr/beta_arr), np.sqrt(np.sum(alpha_arr/beta_arr**2)))
ax.plot(bins, pdf, color='r')
plt.savefig("gamma_sum.png", dpi=100)

gamma_sum

ChaoranHu commented 5 years ago

Thank you for your comment and suggestion so much! Indeed, we can use Normal distribution to approximate convolution of "many" gamma variable. The design of dcoga and pcoga are for exact evaluation. It is an interesting topic to solve this problem!