Qub3k / subjective-exp-consistency-check

Software implementation of the subjective experiment consistency checking framework (based on a p-value P–P plot).
MIT License

RuntimeWarning: divide by zero encountered in log #7

Closed FranzHahn closed 3 years ago

FranzHahn commented 3 years ago

Hi guys, thanks for the code base. I've been trying to run your code on the subjective annotations of KoNViD-1k as a test run before applying it to all of our Konstanz Multimedia Signal Processing Group datasets.

So far I have evaluated it on a subset of 40 items of our data. However, whenever I run the code, I get RuntimeWarnings while processing the first stimulus:

(jupyter_env) root@franz-goetz-hahn-jupyter-pod:/mnt/home/belial/research/Konvid150k/subjective-exp-consistency-check# python3 friendly_gsd.py k1k_exp_short_scores_pp_plot_ready.csv 
2021-03-15 09:32:41,448 friendly_gsd INFO: Reading chunk with id 0 (of 1 total chunks)
2021-03-15 09:32:41,711 friendly_gsd INFO: There are 40 stimuli to process
2021-03-15 09:32:41,712 friendly_gsd INFO: Processing stimulus number 1
2021-03-15 09:32:41,713 probability_grid_estimation INFO: Calculating the log-probability grid for the GSD model
/opt/conda/envs/jupyter_env/lib/python3.7/site-packages/pandas/core/frame.py:7603: RuntimeWarning: divide by zero encountered in log
  return lib.map_infer(x.astype(object)._values, func)
/mnt/home/belial/research/Konvid150k/subjective-exp-consistency-check/bootstrap.py:48: RuntimeWarning: divide by zero encountered in log
  T = n * np.log(n / (n_total * p))
/mnt/home/belial/research/Konvid150k/subjective-exp-consistency-check/bootstrap.py:48: RuntimeWarning: invalid value encountered in multiply
  T = n * np.log(n / (n_total * p))
/mnt/home/belial/research/Konvid150k/subjective-exp-consistency-check/bootstrap.py:48: RuntimeWarning: invalid value encountered in true_divide
  T = n * np.log(n / (n_total * p))
2021-03-15 09:35:03,327 friendly_gsd INFO: Processing stimulus number 2
2021-03-15 09:37:23,409 friendly_gsd INFO: Processing stimulus number 3
2021-03-15 09:39:37,192 friendly_gsd INFO: Processing stimulus number 4
[...]
2021-03-15 11:05:55,506 friendly_gsd INFO: Processing stimulus number 40
2021-03-15 11:09:14,991 friendly_gsd INFO: Storing the results of G-test of goodness-of-fit in the G_test_on_k1k_exp_short_scores_pp_plot_ready_chunk_id_0_of_1_chunks.csv file
2021-03-15 11:09:15,305 friendly_gsd INFO: Everything done!
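For context, these warnings come from NumPy evaluating log(0) (which yields -inf with a "divide by zero" warning) and 0 · (-inf) (which yields NaN with an "invalid value" warning); execution continues rather than aborting. A minimal reproduction:

import numpy as np

# log(0) gives -inf plus a "divide by zero" RuntimeWarning;
# 0 * -inf gives nan plus an "invalid value" RuntimeWarning.
# NumPy keeps running in both cases instead of raising an error.
print(np.log(np.array([0.0])))        # [-inf]
print(np.array([0.0]) * np.log(0.0))  # [nan]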

The results seem to indicate that everything works fine (see the screenshot of the resulting CSV file below). I just figured I'd drop a heads-up and ask what might be the cause, or whether there is reason for concern regarding the validity of the results.

[screenshot: the resulting CSV file]

FranzHahn commented 3 years ago

After running the code on our data a little more, it seems my initial impression that the code works properly was wrong. The p-values it computes appear very low when compared to the distribution of votes on the ACR scale.

Here are some examples of p-values computed by the method that, judging by the vote distributions, I would consider too low:

[screenshot: vote distributions with very low p-values]

I'd be really happy if we could figure out what's wrong with my setup when running your code.

FranzHahn commented 3 years ago

I went ahead and tried to validate my impression that similar vote distributions sometimes get wildly different p-values.

Here are just a few exemplary pairs of stimuli whose vote distributions are highly correlated (positively or negatively), yet which receive wildly different p-values:

[screenshots: three pairs of stimuli with highly correlated vote distributions but wildly different p-values]

These differences in the p-values seem counterintuitive.

FranzHahn commented 3 years ago

I have found two places where this division-by-zero warning occurs. I have fixed both locally, and will quickly explain what happens and how to fix it:

In the estimate_parameters function in probability_grid.py, the logarithm is applied to prob_grid_gsd_df. This DataFrame, however, contains zero values. So in both lines 166 and 172, the lambda function should be altered to guard against this:

def estimate_parameters(sample, prob_grid_df, sample_as_counts=False):
[...]
    if is_gsd:
        if log_prob_grid_gsd_df is None:
            logger.info("Calculating the log-probability grid for the GSD model")
            # Guard against log(0): map exact zeros to log(eps) instead of -inf
            log_prob_grid_gsd_df = prob_grid_df.applymap(
                lambda x: np.log(x) if x != 0 else np.log(np.finfo(np.float32).eps))
        log_prob_grid_df = log_prob_grid_gsd_df

    if is_normal:
        if log_prob_grid_normal_df is None:
            logger.info("Calculating the log-probability grid for the QNormal model")
            # Same guard for the QNormal grid
            log_prob_grid_normal_df = prob_grid_df.applymap(
                lambda x: np.log(x) if x != 0 else np.log(np.finfo(np.float32).eps))
        log_prob_grid_df = log_prob_grid_normal_df

[...]
    return estimated_psi_sigma_tuple
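As a quick sanity check of the guarded lambda, here is a toy probability grid (the values and column names are made up for illustration, not an actual GSD grid) showing that zeros map to log(eps) ≈ -15.9 instead of -inf:

import numpy as np
import pandas as pd

# Made-up stand-in for prob_grid_gsd_df, containing an exact zero
grid = pd.DataFrame({"psi_1": [0.2, 0.0], "psi_2": [0.5, 0.3]})

eps = np.finfo(np.float32).eps
log_grid = grid.applymap(lambda x: np.log(x) if x != 0 else np.log(eps))
print(log_grid)  # the zero cell becomes log(eps) ~ -15.94, not -inf

An equivalent guard without the Python-level lambda would be np.log(grid.clip(lower=eps)), which also raises any value below eps up to eps before taking the logarithm.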

In the T_statistic function in bootstrap.py, both n and p can have zero entries. This causes T = n * np.log(n / (n_total * p)) to divide by zero. Since both n and p can be one- or two-dimensional inputs, here's a dirty fix:

def T_statistic(n, p):
    """
    Calculates the T statistic (as defined by BĆ for the G-test).

    :param n: counts of observations in each cell (can be an array with dimensions num_samples x num_of_cells)
    :param p: expected probabilities of each cell (can be an array with dimensions num_samples x num_of_cells)
    :return: T statistic
    """
    n_total = np.sum(n, axis=-1, keepdims=True)
    zero_counts = (n == 0)  # remember empty cells before padding them with eps
    eps = np.finfo(np.float32).eps
    if np.ndim(n) == 1:
        n = np.array([count if count > 0 else eps for count in n])
    elif np.ndim(n) == 2:
        n = np.array([[count if count > 0 else eps for count in row] for row in n])
    if np.ndim(p) == 1:
        p = np.array([prob if prob != 0 else eps for prob in p])
    elif np.ndim(p) == 2:
        p = np.array([[prob if prob != 0 else eps for prob in row] for row in p])
    T = n * np.log(n / (n_total * p))
    T = np.where(zero_counts, 0, T).sum(axis=-1)  # cells with n == 0 contribute 0
    return T

This checks the entries of n and p when they are one-dimensional, as well as the entries of the nested rows when n and p are two-dimensional (the case where T statistics are computed on bootstrapped samples), and replaces zeros with a small floating-point value to avoid division by zero. Note that the n == 0 mask has to be taken before the padding, so that cells with zero counts still contribute exactly 0 to the statistic.
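For what it's worth, the same guard can be written without the explicit dimensionality branching, since np.where broadcasts over both 1-D and 2-D inputs. A sketch of a hypothetical vectorized variant (not the repo's code):

import numpy as np

def T_statistic_safe(n, p, eps=np.finfo(np.float32).eps):
    # Vectorized variant of the fix above; works for 1-D and 2-D inputs
    n = np.asarray(n, dtype=float)
    p = np.asarray(p, dtype=float)
    n_total = np.sum(n, axis=-1, keepdims=True)
    n_safe = np.where(n > 0, n, eps)   # pad empty cells
    p_safe = np.where(p != 0, p, eps)  # pad zero-probability cells
    T = n_safe * np.log(n_safe / (n_total * p_safe))
    return np.where(n == 0, 0, T).sum(axis=-1)  # n == 0 cells contribute 0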

Qub3k commented 3 years ago

Dear @FranzHahn, thank you for the comments and suggested fixes! 😊

I am amazed by the amount of effort you put into testing our code. It's really motivating and encouraging. Thank you!

I will gladly implement your fixes. 🙂 That said, I will give you a few days to stop me in case you would like to submit a pull request instead (in which case you would become a co-author of the repo, I guess). 🤔

Influence of Division Errors on p-Values

We have tested the code quite thoroughly and I am 99% positive that the division-by-zero warnings do not influence the resultant p-values. Each time this happens, a NaN gets written to the respective DataFrame. When that DataFrame is then used to find the maximum of the likelihood function, those NaNs are simply ignored.
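A small illustration of that skipping behavior (with hypothetical log-likelihood values): pandas ignores NaNs by default when locating a maximum, so a NaN cell cannot be selected as the argmax.

import numpy as np
import pandas as pd

# Hypothetical log-likelihood values; one cell produced log(0) -> NaN
log_likelihood = pd.Series([-1.2, np.nan, -0.3, -2.0])
print(log_likelihood.idxmax())  # 2 -- the NaN entry is skipped (skipna=True)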

Counterintuitively Low p-Values

The GSD in general has heavier tails than the normal distribution. Thus, response distributions like [0, 0, 15, 30, 15] can in principle produce low p-values (when tested for goodness-of-fit of the GSD). Our motivation for designing the GSD this way was that (considering the same response distribution [0, 0, 15, 30, 15]) it is difficult to believe that no one would assign this stimulus a response of category 2 when there are as many as 15 responses of category 3. Is this the proper approach? Well, that is something of a philosophical question, and we are open to discussion. 😎
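To make this concrete, here is a toy G-test comparison using scipy.stats.power_divergence with lambda_="log-likelihood" (the expected probabilities below are made up for illustration; they are not actual GSD or QNormal values). When the expected distribution puts non-negligible mass on categories 1 and 2 (heavier tails) and zero votes are observed there, the remaining counts exceed their expectations, which inflates the G statistic and pushes the p-value down:

import numpy as np
from scipy.stats import power_divergence

obs = np.array([0, 0, 15, 30, 15])  # the response distribution from above

# Hypothetical expected cell probabilities: heavier vs. lighter tails
p_heavy = np.array([0.05, 0.10, 0.25, 0.35, 0.25])
p_light = np.array([0.01, 0.04, 0.25, 0.40, 0.30])

for label, p in [("heavy tails", p_heavy), ("light tails", p_light)]:
    g, p_value = power_divergence(obs, f_exp=obs.sum() * p,
                                  lambda_="log-likelihood")  # G-test
    print(label, g, p_value)

With these made-up numbers, the heavier-tailed expectation yields a much smaller p-value for the very same response distribution.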

As a side note, please keep in mind that observing a few p-values that are low, even very low, in a single experiment is perfectly okay. The important thing to keep an eye on is the number of low p-values relative to the total number of p-values. This repo implements the generation of so-called p-value P–P plots, which help assess whether the fraction of low p-values is high enough to start worrying. 😊
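In brief, the idea behind such a plot is that under the null hypothesis p-values are uniformly distributed on [0, 1], so for each significance level the observed fraction of p-values below it should stay close to that level. A minimal sketch of that idea (a generic illustration, not the repo's exact implementation):

import numpy as np
import matplotlib.pyplot as plt

def p_value_pp_plot(p_values):
    # Empirical fraction of p-values <= alpha, plotted against alpha;
    # points far above the diagonal signal too many low p-values.
    p = np.sort(np.asarray(p_values))
    alphas = np.linspace(0.0, 1.0, 101)
    empirical = np.searchsorted(p, alphas, side="right") / len(p)
    plt.plot(alphas, empirical, label="observed")
    plt.plot([0, 1], [0, 1], "--", label="ideal (uniform p-values)")
    plt.xlabel("significance level alpha")
    plt.ylabel("fraction of p-values <= alpha")
    plt.legend()
    plt.show()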

Qub3k commented 3 years ago

@FranzHahn, I incorporated your suggestions and acknowledged your help in the README file. Thank you again! 😊 I consider this issue closed.