droefs / HLISA

A reimplementation of the Selenium API, emulating human interactions
https://pypi.org/project/HLISA/
GNU General Public License v3.0
72 stars 8 forks source link

util.py function std_positive(mean, std, minimal) infinite loop #18

Closed basvdl97 closed 2 years ago

basvdl97 commented 2 years ago

util.py function std_positive(mean, std, minimal) can get stuck in an infinite loop if mean <= minimal.

def std_positive(mean, std, minimal):
    sample = np.random.normal(mean, std)
    while sample < minimal:
        sample += random.random() * (mean - minimal)
    return sample

If sample is initialized to a value below minimal, then the loop will start. If the parameters are chosen such that mean <= minimal. Then

sample += random.random() * (mean - minimal)

will only continue to decrease the value of sample, and thus remain in the loop.

*I will implement a fix if this is seen as a valid issue by @droefs or a valid maintainer.

droefs commented 2 years ago

Thank you, this is a great catch. It would be better if this can not happen.

If someone would choose the parameters such that mean < minimal, that would also be a bit strange and is probably not what the caller would intent. Maybe an error would be the best option here, as the caller will then get notified of the strange parameter choice. Except for the case mean == minimal, which is in a valid option.

basvdl97 commented 2 years ago

Okay, an exception is fine for cases with mean < minimum if that is not what people intend to do. However, in the case mean == minimal we still remain in an infinite loop, adding 0 continuously.

What about this implementation that avoids using a while loop, and handles all cases:

def std_positive(mean, std, minimal):
    sample = np.random.normal(mean, std)
    if sample < minimal:
        sample += (minimal-sample) + ((mean + 4 * std) - minimal)*random.random()
    return sample

We can still add an exception for mean < minimal if those parameters are undesired.

I think this also barely impacts wanting to pick from a normal distribution, since when sample < minimal we spread sample out evenly across the part of the distribution in the range (minimal, mean+4*std).

*Let me know what you think and I will execute your wishes @droefs .

droefs commented 2 years ago

Thank you, this is an elegant, and more importantly, a correct solution.

we spread sample out evenly across the part of the distribution in the range

It is great that you incorporated this, but because the left tail of the distribution is partly removed by the minimum parameter, in this case a slightly uneven spread is nicer. If the value that was too small is replaced by a value from the remaining part of the distribution (with even chance), the distribution will move to the right, causing the mean of the std_positive function to be larger than the mean parameter supplied by it. Because of that, this part of your solution is already enough: (and the rest of the lines of course)

sample += (minimal-sample) + (mean - minimal)*random.random()

It will pick a value evenly across the left part of the distribution, which will limit the effect of the missing tail on the mean.

It would be great to have this code incorporated in HLISA. If you make a pull request I would be happy to merge it. We can add a ValueError or a custom IllegalArgumentError I think for the mean < minimal case.

basvdl97 commented 2 years ago

Great! I will make a pull request when I have time, and close the issue.

Thanks for the response.