NNPDF / nnpdf

An open-source machine learning framework for global analyses of parton distributions.
https://docs.nnpdf.science/
GNU General Public License v3.0

Postfit filter too lenient #740

Closed: RoyStegeman closed this issue 6 months ago

RoyStegeman commented 4 years ago

@scarlehoff I don't know how the postfit filter determines outliers, but it appears to be quite lenient. Of course, sometimes fits simply end too early, and for those we essentially have two options: filter out everything that wasn't stopped by patience, or keep it all. But this is a different situation: all fits were stopped by patience and there is a clear distinction between two sets of fits, yet the outlier cluster has probably survived postfit because of its relative size. I think this particular problem could quite easily be solved by applying k-means clustering.

[plots: per-replica distributions showing two clearly separated clusters of fits]

scarrazza commented 4 years ago

The current postfit implementation applies dynamic chi2 and arc-length vetoes based on the standard deviation of the replicas. For a given requested number of replicas, the algorithm recursively (=dynamically) computes the mean and standard deviation (sigma) of the chi2 and arc-lengths, eliminating replicas outside a sigma threshold. The current threshold is 4 sigma; if you want to decrease that value for local testing purposes, just change the two lines in: https://github.com/NNPDF/nnpdf/blob/4afa559a6e67bde4fc845d96c3c5c5f9b6e5038e/validphys2/src/validphys/fitveto.py#L19
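For reference, a minimal sketch of that recursive sigma-clipping, assuming a plain numpy array of per-replica values (this is not the actual `fitveto.py` code, just an illustration of the idea):

```python
import numpy as np

NSIGMA_THRESHOLD = 4.0  # stand-in for the 4-sigma veto mentioned above

def sigma_clip(values, nsigma=NSIGMA_THRESHOLD):
    """Recursively keep only entries within nsigma standard deviations of
    the mean, recomputing the mean and sigma on the survivors each pass."""
    values = np.asarray(values)
    mask = np.ones(len(values), dtype=bool)
    while True:
        mean, sigma = values[mask].mean(), values[mask].std()
        new_mask = mask & (np.abs(values - mean) <= nsigma * sigma)
        if np.array_equal(new_mask, mask):
            return mask
        mask = new_mask

# The same veto can be applied to chi2 and arc-lengths and the results combined:
# passing = sigma_clip(chi2_per_replica) & sigma_clip(arclength_per_replica)
```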

Now, the fact that you get 2 orthogonal clusters looks suspicious, in the sense that there is some (hyper) parameter which makes the fit very unstable and requires a more careful investigation.

Other algorithms like k-means or affinity propagation (see https://arxiv.org/pdf/1605.04345) could work for determining the average size of the cluster, but they will not solve the instability. Ideally, we should avoid post-selection vetoes as much as possible, for multiple reasons such as performance (wasted computing resources) and fit quality. The difference between n3fit and nnfit was particularly pronounced here: the latter systematically wasted 40% of the replicas while the former only a few percent.
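Purely as an illustration of the clustering idea above (not a proposal for the production code), a two-cluster k-means on the per-replica chi2 that keeps the cluster with the lower mean could look like this:

```python
import numpy as np
from sklearn.cluster import KMeans

def keep_lower_chi2_cluster(chi2_per_replica):
    """Split the replicas into two chi2 clusters and keep the one with the
    lower mean chi2. A sketch only: it assumes two clusters really exist."""
    chi2 = np.asarray(chi2_per_replica, dtype=float).reshape(-1, 1)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(chi2)
    good_label = min((0, 1), key=lambda k: chi2[labels == k].mean())
    return labels == good_label  # boolean mask of replicas to keep
```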

RoyStegeman commented 4 years ago

Thanks, that's a good point. The benefit of being aware of the instability and trying to understand and fix it outweighs the cost of having to manually filter the replicas from time to time.

Zaharid commented 4 years ago

@RoyStegeman In all honesty the chi² criterion we use is a bit absurd as it is. Developing something better would be quite interesting. I don't believe that would necessarily be based on clustering (at least not only), but rather on something involving the absolute values of the training and validation chi². You can quite easily compute the parameters of the chi² distribution that we would have in the ideal case where we fit everything, and then come up with some sort of tolerance on top of that. Maybe @wilsonmr has some ideas from the closure test experiments here.
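For instance, with N_dat data points the ideal total chi² would follow a chi-squared distribution with roughly N_dat degrees of freedom (mean N_dat, standard deviation sqrt(2 N_dat)), and a tolerance could be placed on top of that. A hedged sketch, where the number of data points and the confidence level are placeholders:

```python
from scipy.stats import chi2 as chi2_dist

ndata = 4000         # placeholder: number of fitted data points
confidence = 0.9999  # placeholder: how tolerant the cut should be

# Ideal-case parameters of the total (non-reduced) chi2 distribution
mean, var = chi2_dist.stats(df=ndata, moments="mv")
sigma = float(var) ** 0.5
print(f"ideal chi2: mean = {float(mean):.0f}, sigma = {sigma:.1f}")

# An absolute cut at the chosen confidence level, instead of a cut
# relative to the mean and sigma of the replicas themselves
chi2_cut = chi2_dist.ppf(confidence, df=ndata)
print(f"chi2 cut at {confidence:.2%} confidence: {chi2_cut:.0f}")
```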

Zaharid commented 4 years ago

In particular one thing we want from the chi² filter is to discard most replicas in crazy fits such as this one.

And one thing we want from the methodology is not to make use of the chi² filter at all.

Zaharid commented 4 years ago

I am thinking one possible improvement is to look at the best chi² instead of the min chi². Then we interpret the postfit filter as a hypothesis test, "this replica is as good as the best we can do", and choose somehow based on the difference in chi².

cc @voisey
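A hedged sketch of what such a cut could look like; both the reference "best" chi² and the tolerance are deliberately left as inputs, since defining them is exactly the open part of the proposal:

```python
import numpy as np

def pass_delta_chi2(chi2_per_replica, chi2_best, delta_max):
    """Keep replicas whose chi2 lies within delta_max of a reference
    chi2_best ("as good as the best we can do"), rather than cutting
    around the mean of the replicas. How to pick chi2_best and
    delta_max is the question raised in the comments below."""
    return np.asarray(chi2_per_replica) - chi2_best <= delta_max
```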

RoyStegeman commented 4 years ago

> I am thinking one possible improvement is to look at the best chi² instead of the min chi².

How would you define the best chi2?

Anyway, let's discuss this on Wednesday. @scarrazza can I put this item in an agenda somewhere or should I remember to bring it up myself?

scarrazza commented 4 years ago

@RoyStegeman, don't worry this issue is already part of the agenda.

Zaharid commented 4 years ago

Interestingly, when I try a rather more aggressive cut on the validation chi², I see that the replica distribution is largely unchanged. See:

https://vp.nnpdf.science/aPL5KPKiTyuQv3_uSHQAhA==

cc @stefanoforte I can't say I understand this...

stefanoforte commented 4 years ago

Well, the 68% c.l. and one sigma seem to agree perfectly, yes? So this is to a good approximation Gaussian, and thus if you cut off the edge it remains Gaussian! In other words, it is only when you have fat tails that you expect cutting off the tail to change results substantially: you start with the 68% c.l. and one sigma very different, and by cutting the tail you make them less different. Here they are the same before cutting, and remain the same after cutting!

Cheers

Stefano
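A toy numerical illustration of this point (synthetic samples, nothing to do with the actual replicas): trimming the most extreme 1% barely moves one sigma for a Gaussian, while for a fat-tailed distribution it visibly pulls one sigma towards the 68% half-width.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma_vs_68(sample, trim_percent=1):
    """Compare one sigma with the 68% c.l. half-width before and after
    trimming the most extreme trim_percent of the sample."""
    cut = np.percentile(np.abs(sample), 100 - trim_percent)
    kept = sample[np.abs(sample) <= cut]
    for name, data in (("before", sample), ("after ", kept)):
        lo, hi = np.percentile(data, [16, 84])
        print(f"  {name}: sigma = {data.std():.3f}, 68% half-width = {(hi - lo) / 2:.3f}")

print("Gaussian:")
sigma_vs_68(rng.normal(size=200_000))
print("Fat-tailed (Student-t, 3 d.o.f.):")
sigma_vs_68(rng.standard_t(df=3, size=200_000))
```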


Zaharid commented 4 years ago

I guess I was expecting that the replicas with larger fit chi² were more likely to be outliers. That seems not to be the case. High validation chi² does not seem to be correlated with larger differences from the mean in PDF space, so you end up with a Gaussian with the same parameters.
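A hedged sketch of the kind of check behind this observation, assuming one has the per-replica validation chi² and a grid of PDF values per replica (neither of which is specified in the thread):

```python
import numpy as np

def chi2_pdf_distance_correlation(val_chi2, replica_grids):
    """Correlate the per-replica validation chi2 with each replica's RMS
    distance from the mean PDF, computed on an (n_replicas, n_points)
    array of PDF values. The choice of distance is an assumption here."""
    grids = np.asarray(replica_grids)
    distance = np.sqrt(((grids - grids.mean(axis=0)) ** 2).mean(axis=1))
    return np.corrcoef(np.asarray(val_chi2), distance)[0, 1]
```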

stefanoforte commented 4 years ago

Indeed, I agree. On the other hand, this is consistent with the fact that before the cut the 68% c.l. and one sigma are essentially the same: if there were significant outliers, one sigma would have been rather larger than the 68% c.l.

Cheers

Stefano
