Statistical Test Improvements

nandevers commented 2 days ago

The current code for the statistical tests uses the following conditionals:

        from scipy.stats import chisquare, combine_pvalues, ks_2samp

        if total_count > 0:
            if self.method == 'chi2':
                try:
                    tstats, Praw = chisquare(counts_emp, f_exp=counts_exp)
                except ValueError as err:
                    # Recent SciPy versions raise ValueError when sum(counts_emp) and
                    # sum(counts_exp) differ beyond a small relative tolerance.
                    raise Exception('The relative tolerance of the chisquare test is not reached. Try using another method such as "method=ks". This is not a bug but a feature: "https://github.com/scipy/scipy/issues/13362" ') from err
            elif self.method == 'ks':
                tstats, Praw = ks_2samp(counts_emp, counts_exp)
            else:
                # Ensemble: combine the chi-square and KS p-values with Fisher's method.
                tstats1, Praw1 = chisquare(counts_emp, f_exp=counts_exp)
                tstats2, Praw2 = ks_2samp(counts_emp, counts_exp)
                tstats, Praw = combine_pvalues([Praw1, Praw2], method='fisher')
                self.method = 'P_ensemble'
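A note on the `chi2` branch: SciPy's `chisquare` checks that the observed and expected counts have the same total (within a small relative tolerance), and that check is what the `except` clause turns into the message above. If `counts_exp` is built from Benford proportions and differs from the observed total only through rounding, one option is to rescale the expected counts to the observed total before running the test. The sketch below assumes exactly that; the helper name `chi2_benford` and the sample counts are hypothetical. Rescaling would also hide a genuine mismatch (for example, expected counts computed for a different sample), so it only makes sense when `counts_exp` is meant to encode proportions rather than a specific total.

```python
import numpy as np
from scipy.stats import chisquare


def chi2_benford(counts_emp, counts_exp):
    """Hypothetical helper: chi-square goodness of fit after rescaling.

    Assumption: counts_exp has the right *shape* (Benford proportions), and any
    difference between its total and the observed total is due to rounding.
    """
    counts_emp = np.asarray(counts_emp, dtype=float)
    counts_exp = np.asarray(counts_exp, dtype=float)
    # Force both vectors to the same total so SciPy's tolerance check passes.
    counts_exp = counts_exp * counts_emp.sum() / counts_exp.sum()
    tstats, Praw = chisquare(counts_emp, f_exp=counts_exp)
    return tstats, Praw


# Illustrative data only: first-digit counts for 100 hypothetical records.
digits = np.arange(1, 10)
benford_p = np.log10(1 + 1 / digits)
counts_emp = np.array([30, 18, 12, 10, 8, 7, 6, 5, 4])
# Proportions rounded to two decimals sum to 1.01, so these counts total about 101.
counts_exp = np.round(benford_p, 2) * 100
tstats, Praw = chi2_benford(counts_emp, counts_exp)
print(tstats, Praw)
```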

I have two proposals for this part of the code:

1. Replace the conditionals with a dictionary-based dispatch, making the code more concise and less cluttered with condition handling.
2. Expand the set of available tests by incorporating the suggestions from https://doi.org/10.3390/stats4020027, with an appropriate disclaimer for those who still want to use the chi-squared test.
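A minimal sketch of the first proposal, assuming each test takes `(counts_emp, counts_exp)` and returns `(statistic, p-value)`. The names `STAT_TESTS`, `run_test`, `_chi2`, `_ks` and `_ensemble` are hypothetical, not existing benfordslaw API. Two deliberate differences from the current code: an unknown method name raises a `ValueError` instead of silently falling through to the ensemble, and the ensemble is selected by name (`'P_ensemble'`) rather than by overwriting `self.method`. Returning `(nan, nan)` when there are no counts is also an assumption, standing in for the `total_count > 0` guard.

```python
import numpy as np
from scipy.stats import chisquare, combine_pvalues, ks_2samp


def _chi2(counts_emp, counts_exp):
    """Chi-square goodness of fit (hypothetical wrapper)."""
    try:
        stat, p = chisquare(counts_emp, f_exp=counts_exp)
    except ValueError as err:
        raise ValueError(
            'The relative tolerance of the chisquare test is not reached. '
            'Try another method such as method="ks". '
            'See https://github.com/scipy/scipy/issues/13362'
        ) from err
    return stat, p


def _ks(counts_emp, counts_exp):
    """Two-sample Kolmogorov-Smirnov test, as in the current code."""
    stat, p = ks_2samp(counts_emp, counts_exp)
    return stat, p


def _ensemble(counts_emp, counts_exp):
    """Combine the chi-square and KS p-values with Fisher's method."""
    _, p_chi2 = _chi2(counts_emp, counts_exp)
    _, p_ks = _ks(counts_emp, counts_exp)
    stat, p = combine_pvalues([p_chi2, p_ks], method='fisher')
    return stat, p


# Registry: method name -> test function. New tests only need an entry here.
STAT_TESTS = {
    'chi2': _chi2,
    'ks': _ks,
    'P_ensemble': _ensemble,
}


def run_test(method, counts_emp, counts_exp):
    """Dispatch to the requested test; (nan, nan) when there are no counts."""
    if np.sum(counts_emp) <= 0:
        return np.nan, np.nan
    try:
        test = STAT_TESTS[method]
    except KeyError:
        raise ValueError(
            f'Unknown method {method!r}; choose from {sorted(STAT_TESTS)}'
        ) from None
    return test(counts_emp, counts_exp)
```

With this layout, adding one of the tests proposed in the paper would mean writing one function with the same signature and registering it in `STAT_TESTS`, without touching `run_test`.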

nandevers commented 2 days ago

@erdogant, I would love to contribute these improvements. Please let me know if this sounds reasonable, so I can prepare the code and ping you about a potential PR.

erdogant commented 2 days ago

Yes sure! A bit of refactoring sounds great 👍

erdogant/benfordslaw, issue #14: Statistical Test Improvements