Hive-Systems / pyfair

A Factor Analysis of Information Risk (FAIR) model written in Python. Managed and maintained by Hive Systems.
https://www.hivesystems.com
MIT License

Max loss increases with the number of iterations #30

Closed: lewaq closed this issue 3 years ago

theonaunheim commented 3 years ago

Thanks for your interest as well as your issue, @lewaq .

Two questions if I may:

  1. Would it be possible for you to send me the model you are working with via the model.to_json() method? That would allow me to see the parameters of the model you are working with. (A minimal sketch of this call follows below the list.)

  2. How large is the increase in max loss? I am trying to determine whether this is intended behavior or a bug. Larger random samples tend to have larger maximum outliers.
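
On the first point, here is a minimal sketch of exporting a model's parameters, assuming the typical pyfair workflow of building a FairModel, supplying node data, and calling to_json(); the node names and distribution parameters below are placeholders rather than anything from your actual model:

from pyfair import FairModel

# Build a small illustrative model (placeholder parameters)
model = FairModel(name='Example Model', n_simulations=10000)
model.input_data('Loss Event Frequency', low=20, mode=100, high=900)
model.input_data('Loss Magnitude', low=1000, mode=5000, high=50000)
model.calculate_all()

# Serialize the model's parameters to a JSON string
json_params = model.to_json()
print(json_params)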

To illustrate the second point, consider a normal distribution with a mean of 100 and a standard deviation of 20:

import numpy as np

# For each of these sample sizes
for x in [1, 10, 100, 1000, 10000]:
    # Draw that many random variates from the same distribution
    rvs = np.random.normal(loc=100, scale=20, size=x)
    # Get the max value
    max_rvs = rvs.max()
    # Display
    print(f'For n={x}, the max is {max_rvs.round(2)}.')

Yields:

For n=1, the max is 96.69.
For n=10, the max is 131.34.
For n=100, the max is 145.71.
For n=1000, the max is 169.76.
For n=10000, the max is 178.44.
lewaq commented 3 years ago

Sorry for the delayed response, and thanks for the explanation. It makes sense: with a larger number of runs, it is more likely to hit a higher "rightmost" loss in the interval. I am relatively new to FAIR, so it had not occurred to me.
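
As a rough way to quantify that intuition (a sketch using scipy, not part of the original exchange): for independent draws from the same loss distribution, the probability that the sample maximum exceeds a fixed level x is 1 - F(x)**n, where F is the distribution's CDF, and this grows quickly with the number of runs n.

from scipy import stats

# P(max of n i.i.d. draws > x) = 1 - F(x)**n, where F is the CDF.
# Using the same Normal(100, 20) curve from the example above.
threshold = 150
cdf_at_threshold = stats.norm(loc=100, scale=20).cdf(threshold)

for n in [1, 10, 100, 1000, 10000]:
    p_exceed = 1 - cdf_at_threshold ** n
    print(f'For n={n}, P(max > {threshold}) is {p_exceed:.4f}')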

theonaunheim commented 3 years ago

No worries, @lewaq !

This was a useful exercise for me because it made me sit down and think about why this was actually occurring. FAIR definitely takes some getting used to.