flipz357 / smatchpp

A package for handy processing of semantic graphs such as AMR, with a special focus on standardized evaluation
GNU General Public License v3.0

Include the option to return all bootstrap scores and set random_state #7

Closed · BramVanroy closed this issue 10 months ago

BramVanroy commented 1 year ago

It could be useful to retrieve all scores from the bootstrapping, for example when we want to compare multiple systems for significance. Since you are using scipy's bootstrap, I think you can just optionally also return "bootstrap_distribution".

Secondly, for reproducibility, it might be a good idea to allow the option to provide a random state (fixed seed), which is then passed to scipy's bootstrap function (the random_state parameter).
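
For reference, a minimal sketch of the relevant scipy calls (this is plain scipy usage, not smatchpp's interface; the placeholder `scores` array is illustrative):

```python
import numpy as np
from scipy.stats import bootstrap

scores = np.random.default_rng(0).random(100)  # placeholder per-pair scores

res = bootstrap(
    (scores,),
    np.mean,
    n_resamples=10_000,
    random_state=np.random.default_rng(42),  # fixed seed -> reproducible CIs
)

print(res.confidence_interval)     # (low, high) bounds
print(res.bootstrap_distribution)  # all resampled statistics, scipy >= 1.10.0
```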

flipz357 commented 1 year ago

i) optionally return distribution: that's right, I think that would be a nice option, I'll look into it.

ii) reproducibility of confidence intervals: not sure about this one. If the confidence intervals need to be fully reproducible, a better option than fixing a random state could be to simply increase the number of drawn samples from 10k to, e.g., 1000k, and wait a bit longer so that the numbers fully converge?
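
One way to gauge how close 10k resamples already get would be to compare the intervals across seeds for increasing resample counts (a hypothetical check, not part of smatchpp; the percentile method is used here because it is cheaper than scipy's BCa default):

```python
import numpy as np
from scipy.stats import bootstrap

scores = np.random.default_rng(0).random(500)  # placeholder scores

for n in (10_000, 100_000, 1_000_000):
    cis = [
        bootstrap((scores,), np.mean, n_resamples=n, batch=10_000,
                  method="percentile",  # cheaper than the BCa default
                  random_state=np.random.default_rng(seed)).confidence_interval
        for seed in (1, 2)
    ]
    dev = max(abs(cis[0].low - cis[1].low), abs(cis[0].high - cis[1].high))
    print(f"n_resamples={n}: seed-to-seed CI deviation ~ {dev:.6f}")
```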

flipz357 commented 1 year ago

i) is now available, and it is described here. Note that this feature requires scipy version 1.10.0 or later.

ii) still not sure about this.

BramVanroy commented 1 year ago

Sorry for the delay in response.

Can you explain why fixing the seed is not a good idea? I think you mean from a statistical standpoint: fixing the seed is "arbitrary", whereas increasing the sample size should lead to a converged number. But this is true for many things: in machine learning we would also like to just run the same experiment 100 times and produce sensible confidence intervals, but often that is not feasible, so we use fixed seeds instead so that others - who want to - can at least reproduce the results.

I am not saying that a fixed seed should be the standard, but having the option would be useful so that users have control over it (just not as the default)!
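
For concreteness, the option could look something like this (a hypothetical wrapper to illustrate the idea, not smatchpp's actual API):

```python
from typing import Optional

import numpy as np
from scipy.stats import bootstrap

def score_corpus(scores, n_resamples: int = 10_000,
                 random_state: Optional[int] = None):
    # random_state=None keeps the current non-reproducible behavior;
    # passing an int seeds scipy's resampling for reproducible intervals.
    rng = None if random_state is None else np.random.default_rng(random_state)
    return bootstrap((np.asarray(scores),), np.mean,
                     n_resamples=n_resamples, random_state=rng)
```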

Thanks for including i) already!

flipz357 commented 1 year ago

Yes, you are right, I mean the statistical standpoint. I think a difference in your example is that the cost of running a machine learning model one more time is often quite high. By contrast, bootstrap samples are very cheap. So maybe there is a sweet spot where the confidence intervals converge (up to a negligible deviation) and the computation is still feasible/reasonable on a basic machine. Do you have a feeling for how large the deviation is that you experience with the current number of 10k samples?

flipz357 commented 10 months ago

Guess this can be closed now. Feel free to re-open if needed.