ericsuh / dirichlet

Dirichlet MLE python library
MIT License

How to handle data that contains zeros #7

Open uraniborg opened 9 months ago

uraniborg commented 9 months ago

I've used an application of this Dirichlet package by user xuod to analyze Morris water maze data, which is a type of constant-sum probability data. Briefly, the data come from mice swimming in a circular, water-filled pool with four quadrants superimposed on it. The time spent in each quadrant is recorded over 1 minute, and the Dirichlet test is used to determine whether the time is distributed uniformly across all quadrants or non-uniformly, which would suggest a bias for a particular quadrant. The mice are trained to locate an escape platform hidden in the "target quadrant", which serves as a test of their memory. Sometimes a mouse spends no time at all in one of the quadrants, so the data contain zeros, and the dirichlet.test_uniform() Python function fails to converge when zeros are present in the data.
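To illustrate why zeros are likely to cause trouble: the Dirichlet log-likelihood depends on the data only through the per-component mean of log(x), and a single zero makes that mean negative infinity. The sketch below uses only the standard library; the helper name `mean_log` and the quadrant fractions are made up for illustration and are not part of the dirichlet package or real water maze data.

```python
import math

def mean_log(rows):
    """Per-component mean of log(x) over all rows (hypothetical helper)."""
    n = len(rows)
    k = len(rows[0])
    totals = [0.0] * k
    for row in rows:
        for j, x in enumerate(row):
            # math.log(0) raises ValueError, so map zeros to -inf explicitly
            totals[j] += math.log(x) if x > 0 else -math.inf
    return [t / n for t in totals]

# Made-up quadrant fractions (time in quadrant / 60 s); each row sums to 1
trials = [
    [0.50, 0.25, 0.25, 0.00],  # the mouse never entered quadrant 4
    [0.40, 0.30, 0.20, 0.10],
]

stats = mean_log(trials)
print(stats)  # the last entry is -inf because of the single zero
```

An optimizer that works with this statistic has no finite value to fit for the fourth component, which would be consistent with the non-convergence described above.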

Is there a workaround or good solution for this issue? Can the dataset be transformed in some way to eliminate the zeros without fundamentally altering the data? Adding a constant value to every data point would get around the problem of taking the log of zero, but it seems that this would change the data inappropriately.
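One candidate that is sometimes used for compositional data is a compression transform that pulls every row slightly toward the uniform composition: x' = (x * (N - 1) + 1/K) / N, where N is the number of observations and K the number of components. Unlike adding a bare constant, it keeps each row summing to 1, and the adjustment shrinks as N grows. It does still alter the data, so whether it is acceptable for water maze analysis is a judgment call. The sketch below is a minimal illustration under those assumptions; the function name `compress` and the example values are hypothetical, and nothing in this thread confirms that the fit converges afterwards.

```python
def compress(rows):
    """Shrink each composition toward uniform so that no component is exactly 0.

    Applies x' = (x * (n - 1) + 1/k) / n, with n rows and k components.
    Hypothetical helper, not part of the dirichlet package.
    """
    n = len(rows)
    k = len(rows[0])
    return [[(x * (n - 1) + 1.0 / k) / n for x in row] for row in rows]

# Made-up quadrant fractions; each row sums to 1 and some entries are 0
trials = [
    [0.50, 0.25, 0.25, 0.00],
    [0.40, 0.30, 0.20, 0.10],
    [0.70, 0.30, 0.00, 0.00],
]

adjusted = compress(trials)
for row in adjusted:
    print([round(x, 4) for x in row], "sum =", round(sum(row), 6))
```

The adjusted rows could then be converted to an array and passed to the fitting routine in place of the raw data. With only three rows, as here, the shift is large; with a realistic number of trials it becomes much smaller.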

Thanks for any feedback.

taguhiM commented 6 months ago

Hi @uraniborg. I am having a similar problem. I am trying to use the mle method to estimate the Dirichlet parameters, but whenever I have samples (probability vectors summing to 1) with a component equal to 0, I get a NotConvergingError. I have tried adjusting the values by a small epsilon, but the problem persists.

I am wondering if you have found a solution so far.