T Distribution Weirdness

angelgeek commented 1 year ago

We are using distfit to try to determine if some data we have can be modelled parametrically. For some of the data, the best fitting distribution was a t. Scale and loc are clearly documented, and that is great. There is one remaining parameter to fit a t distribution, and that is degrees of freedom. Except, the one parameter in the distfit output that isn't a scale or loc value is less than one. Obviously, degrees of freedom can't be less than one. So what is that parameter and why isn't degrees of freedom included in the output? It would be helpful for automating our process.

erdogant commented 1 year ago

Maybe I missed something, but distfit stores all of the parameters that are returned by the distribution fit.

As an example:

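import numpy as np
from distfit import distfit

# Example data: 1000 draws from a normal distribution (mean 0, std 2)
X = np.random.normal(0, 2, 1000)
y = [-8, -6, 0, 1, 2, 3, 4, 5, 6]  # not used in this example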
dist = distfit(stats='ks', distr=['expon', 't', 'gamma', 'lognorm'])
results = dist.fit_transform(X)

print(dist.model)
{'distr': <scipy.stats._continuous_distns.t_gen at 0x2d4882810f0>,
 'stats': 'ks',
 'params': (3518324.248643998, -0.08180702912809554, 2.0838347069246876),
 'name': 't',
 'model': <scipy.stats._distn_infrastructure.rv_continuous_frozen at 0x2d49debda80>,
 'score': 0.40237077133797083,
 'loc': -0.08180702912809554,
 'scale': 2.0838347069246876,
 'arg': (3518324.248643998,),
 'CII_min_alpha': -3.5094110072794593,
 'CII_max_alpha': 3.345796949023267}

When I now do the fit manually for only the t-distribution, the following parameters are returned:

import scipy.stats as st
# fit dist to data
params = st.t.fit(X)
print(params)
(3518324.248643998, -0.08180702912809554, 2.0838347069246876)

# Split the tuple: shape argument(s) first (for t: degrees of freedom), then loc, then scale
arg = params[:-2]
loc = params[-2]
scale = params[-1]

If I now compare the returned parameters with the ones stored in distfit, they are exactly the same:

params==dist.model['params']
True
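
To tie this back to the original question: in SciPy the t distribution has a single shape parameter, named df, so the leading value in params (and the only entry in arg) should be the degrees of freedom. The snippet below is a minimal sketch of how to check that, assuming only numpy and scipy are installed; the names shape_names, named and pdf_at_zero are made up for illustration and are not part of distfit. It also shows that SciPy accepts any df greater than zero, so a fitted value below one is a valid (very heavy-tailed) t distribution rather than a different parameter.

```python
import numpy as np
import scipy.stats as st

# Names of the shape parameters that come before loc and scale in fit() output.
# For the t distribution this is a single name.
shape_names = st.t.shapes.split(", ")
print(shape_names)

# Same kind of example data as above, seeded so the run is repeatable
rng = np.random.default_rng(0)
X = rng.normal(0, 2, 1000)

# Fit and split the tuple the same way as in the manual example
params = st.t.fit(X)
arg, loc, scale = params[:-2], params[-2], params[-1]

# Pair each shape value with its name
named = dict(zip(shape_names, arg))
print(named, loc, scale)

# SciPy only requires df > 0, so a fractional value below one is accepted
pdf_at_zero = st.t.pdf(0.0, df=0.5)
print(pdf_at_zero)
```

For automation, the same lookup would presumably work on the stored values, for example by pairing the names from dist.model['distr'].shapes with dist.model['arg'], assuming those keys hold the SciPy distribution object and its shape arguments as in the output shown above.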

erdogant commented 1 year ago

I am closing this issue. Reopen if required.

erdogant / distfit

T Distribution Weirdness #23