Quantitative assessment of the performance of pyCompressor + GANs

Radonirinaunimi commented 3 years ago

Compare the standard compressor with the Compressor + GANs using PDF-like estimators and/or metric distances:

[X] PDF-PDF correlation matrices
[X] Central Value-Standard Deviation relative uncertainties
[X] Frechet Inception Distance.
[X] KL distance.

@scarrazza: In addition to the above, are there any estimators/metrics you would like to consider?

scarrazza commented 3 years ago

Yes, we should include the PDF distance through the validphys action.

Radonirinaunimi commented 3 years ago

Below are some quantitative comparisons using FID. As a side note, the prior here contains 1000 MC replicas and was enhanced to 3000 MC replicas. Both the prior and the enhanced sets are compressed down to 100 replicas. Each FID value (for all the flavours) measures the similarity between the prior and X, where X can be a random set of 100 replicas (blue), a compressed set from the prior (orange), or a compressed set from the enhanced set (green). The FID is defined such that the smaller its value, the closer X is to the prior.

[Figure: FIDs]
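For readers unfamiliar with the metric, here is a minimal sketch of a Fréchet distance between Gaussian fits to two replica samples. The thread does not show how the FID was actually computed in pycompressor/ganpdfs, so the function name `frechet_distance`, the `(n_replicas, n_features)` array layout, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def frechet_distance(sample_a, sample_b):
    """Frechet distance between Gaussian fits to two samples.

    Each sample has shape (n_replicas, n_features): one row per replica and
    one column per x-grid point of a given flavour (an assumed layout).
    """
    mu_a, mu_b = sample_a.mean(axis=0), sample_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(sample_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(sample_b, rowvar=False))
    # Tr[(cov_a cov_b)^(1/2)] equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (up to round-off).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Toy data only: Gaussian noise standing in for PDF replicas.
rng = np.random.default_rng(0)
prior = rng.normal(size=(1000, 5))
subset = prior[rng.choice(1000, size=100, replace=False)]  # random 100-replica subset
shifted = subset + 0.5                                     # deliberately biased set

fid_subset = frechet_distance(prior, subset)
fid_shifted = frechet_distance(prior, shifted)
print(fid_subset, fid_shifted)
```

In this toy setup the biased set gets the larger distance, which matches the convention quoted above that a smaller value means X is closer to the prior.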

Relative to the gap between the blue and the orange, the gap between the orange and the green is reasonably encouraging. Indeed, looking at the correlations for instance, those from the enhanced set look closer to the prior than those from the standard set.

[Figures: Gluon-Up_correlation, Down-Antiup_correlation, Antidown-Strange_correlation, Antidown-Antistrange_correlation]

EDIT: Add random to correlation plots.

Radonirinaunimi commented 3 years ago

The figures below compare the distributions at a given point along the x-axis using the KL test. Both the standard and the enhanced distributions are tested against the prior. The smaller the value of v (or the larger the value of the probability p), the more similar the distribution is to the prior. As shown in the example below, in most cases the enhanced set yields a smaller value of v (or a larger value of p).

[Figure: dist_dbar]

scarrazza commented 3 years ago

@Radonirinaunimi for these last plots, is the prior binning different from the compressions?

Radonirinaunimi commented 3 years ago

> @Radonirinaunimi for these last plots, is the prior binning different from the compressions?

The binnings are the same (even though the number of data points is not), but the distributions are normalized such that the area under each histogram is one.
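The shared binning and unit-area normalization described here can be sketched as follows, together with a plain KL divergence between the resulting histograms. This is only an illustration under assumptions: the thread does not show how v and p were obtained, and the function name `kl_divergence`, the `eps` regularization for empty bins, and the toy samples are hypothetical.

```python
import numpy as np


def kl_divergence(reference, candidate, bins=15, eps=1e-12):
    """KL divergence D(reference || candidate) from histograms on shared bins."""
    # Shared bin edges spanning both samples, so the binning is identical.
    lo = min(reference.min(), candidate.min())
    hi = max(reference.max(), candidate.max())
    edges = np.linspace(lo, hi, bins + 1)
    # density=True normalizes each histogram to unit area, whatever the sample size.
    p, _ = np.histogram(reference, bins=edges, density=True)
    q, _ = np.histogram(candidate, bins=edges, density=True)
    width = np.diff(edges)
    # Convert densities to per-bin probabilities; eps avoids log(0) in empty bins.
    p = p * width + eps
    q = q * width + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


# Toy data only: 1000 "prior" points against two 100-point samples.
rng = np.random.default_rng(0)
prior = rng.normal(size=1000)
close = rng.normal(size=100)           # drawn from the same distribution
far = rng.normal(loc=2.0, size=100)    # drawn from a shifted distribution

kl_close = kl_divergence(prior, close)
kl_far = kl_divergence(prior, far)
print(kl_close, kl_far)
```

Note that the result depends on the number of bins and on `eps`, so values are only comparable when both are held fixed across the sets being compared.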

scarrazza commented 3 years ago

Ok, thanks.

Radonirinaunimi commented 3 years ago

Concerning the PDF-PDF covariance matrices, are there ways to compare/assess two covariance matrices? Currently, I am computing the covariance matrix according to eq. 2 of the mc2hessian paper. Hence, for each set (prior, standard, enhanced), one gets a 4-dimensional matrix. What would be the test to compare them?

scarrazza commented 3 years ago

Why 4D? You take a grid of x points and compute for each flavour pair the correlation coefficient.
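A minimal sketch of this computation, assuming the replicas are stored as an array of shape `(n_replicas, n_flavours, n_x)`. The helper names `pdf_correlations` and `flavour_block`, the array layout, and the toy data are assumptions for illustration and are not taken from pycompressor or validphys.

```python
import numpy as np


def pdf_correlations(replicas):
    """Correlation coefficients between all (flavour, x) pairs.

    replicas: array of shape (n_replicas, n_flavours, n_x) (assumed layout).
    Returns a 2D matrix of shape (n_flavours * n_x, n_flavours * n_x).
    """
    n_rep, n_fl, n_x = replicas.shape
    # Row-major flattening: column index = flavour * n_x + x_index.
    flat = replicas.reshape(n_rep, n_fl * n_x)
    return np.corrcoef(flat, rowvar=False)


def flavour_block(corr, i, j, n_x):
    """(n_x, n_x) block of correlations between flavours i and j."""
    return corr[i * n_x:(i + 1) * n_x, j * n_x:(j + 1) * n_x]


# Toy data only: flavour 1 is built to be strongly correlated with flavour 0.
rng = np.random.default_rng(1)
n_rep, n_fl, n_x = 200, 3, 4
replicas = rng.normal(size=(n_rep, n_fl, n_x))
replicas[:, 1, :] = replicas[:, 0, :] + 0.1 * rng.normal(size=(n_rep, n_x))

corr = pdf_correlations(replicas)
block_01 = flavour_block(corr, 0, 1, n_x)
print(corr.shape, block_01.shape)
```

Keeping the full result as one 2D block matrix makes it straightforward to subtract two such matrices entry by entry, while `flavour_block` recovers the per-pair view used in heatmaps.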

Radonirinaunimi commented 3 years ago

> Why 4D? You take a grid of x points and compute for each flavour pair the correlation coefficient.

This I have done. The same question above also applies here, especially since, looking at the correlation matrices alone, one cannot assess what the differences between the sets are. For example, below are the correlation matrices for the prior (1000 replicas) and the compressed sets (100 replicas, standard & enhanced) at 100 GeV:

[Figure: prior]

I tried plotting the difference as in the compressor paper, but the difference is not noticeable. I also had a look at some statistical tests (such as Box's M test), but they were not conclusive. So, for the comparison of correlation matrices, we also need a way to assess the difference.

But coming back to the covariance matrix, I thought this was different and defined as below. Or did you actually mean the above?

[Figure: Screenshot from 2021-01-04 11-21-39]

EDIT: Change plots to 100 GeV and add colorbar.

scarrazza commented 3 years ago

You should take these matrices and compute the difference (or ratio) between them, i.e.: compressed-prior, standard-prior.

Radonirinaunimi commented 3 years ago

~When computing the difference (likewise for the ratio), we get similar figures (unnoticeable difference) for standard-prior & enhanced-prior. The figures are generated with 70 compressed at 100 GeV.~

When computing the difference between the correlation matrices, the prior-enhanced seems a bit clearer than the prior-standard. But looking at these figures might not be enough to assess the real difference.

[Figures: P_vs_S, P_vs_E]

EDIT: Fix plotting bugs that made the figures entirely white.

scarrazza commented 3 years ago

Strange, please remove the vmax and vmin constraints from [-1,1]; the differences should be above 1%.

scarlehoff commented 3 years ago

Am I right to understand from the plots above (not the heatmaps) that the enhanced is (at least!) as good as the standard in all cases?

The standard would be exactly the same as one would get using the old compressor, right?

(I'm trying to make sense of this plus the thread with Stefano to have a full picture of the last two weeks' developments before the meeting later!)

Radonirinaunimi commented 3 years ago

> Strange, please remove the vmax and vmin constraints from [-1,1]; the differences should be above 1%.

So, there was a bug that drove the differences to be all zeros. This has now been fixed and the figures above have been updated. Now, we see a slight difference between the two (the prior-enhanced seems lighter than the prior-standard). But this might need to be tested further to really assess the difference.

Radonirinaunimi commented 3 years ago

> Am I right to understand from the plots above (not the heatmaps) that the enhanced is (at least!) as good as the standard in all cases?
>
> The standard would be exactly the same as one would get using the old compressor, right?
>
> (I'm trying to make sense of this plus the thread with Stefano to have a full picture of the last two weeks' developments before the meeting later!)

Yes, so far, according to the checks above (plus the ERF plots), this seems to be correct.

scarrazza commented 3 years ago

Could you please project all values in a histogram for both differences?

Radonirinaunimi commented 3 years ago

> Could you please project all values in a histogram for both differences?

Yes, I could do that! But a question: what would the histogram look like? In the above plots, the correlation between two flavours i-j is a square matrix of size (nx, nx), where nx is the size of the x-grid.

scarrazza commented 3 years ago

Just project all values. I would like to see what the distribution looks like, and ideally check whether we can compute some estimators, like moments, to understand which distribution is better on average.

Radonirinaunimi commented 3 years ago

> Just project all values. I would like to see what the distribution looks like, and ideally check whether we can compute some estimators, like moments, to understand which distribution is better on average.

Got it! I will do this.

Radonirinaunimi commented 3 years ago

@scarrazza Is this what you had in mind? The histograms below compare the (prior-standard) in blue with the (prior-enhanced) in red. To get reasonably readable plots, however, I did two things:

1. I only considered a few x-grid points, i.e. instead of 100 as in the heatmaps, I only considered 3 (evenly distributed along the entire x range). Hence, the histograms below might not give the most accurate comparison.
2. I took the absolute value of the matrix entries, because otherwise some bars would point down and make the whole plot ugly.

scarrazza commented 3 years ago

No, what I have in mind is a 1D histogram where in x you put all correlation values from the diff matrix (x1, x2, fl1, fl2) and in y their frequencies.

Radonirinaunimi commented 3 years ago

> No, what I have in mind is a 1D histogram where in x you put all correlation values from the diff matrix (x1, x2, fl1, fl2) and in y their frequencies.

I see now (this makes more sense). The result is shown below: the prior-enhanced distribution is not only more centered at 0 but also has a smaller standard deviation.

[Figure: hist_project]
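The projection requested in the thread, flattening every entry of the difference matrix into a 1D histogram and summarizing it with its first two moments, can be sketched as follows. The function name `project_difference` is hypothetical, and the matrices are toy data with artificial noise (the perturbed matrices are not valid correlation matrices); they only illustrate the mechanics and say nothing about the actual results.

```python
import numpy as np


def project_difference(corr_prior, corr_compressed, bins=50):
    """Flatten the difference of two correlation matrices into a 1D histogram.

    Returns the mean and standard deviation of all entries, plus the
    histogram counts and bin edges.
    """
    diff = (corr_prior - corr_compressed).ravel()  # every (x1, x2, fl1, fl2) entry
    counts, edges = np.histogram(diff, bins=bins)
    return float(diff.mean()), float(diff.std()), counts, edges


# Toy data only: a "prior" correlation matrix and two noisy copies of it.
rng = np.random.default_rng(2)
corr_prior = np.corrcoef(rng.normal(size=(500, 12)), rowvar=False)
corr_standard = corr_prior + rng.normal(scale=0.05, size=corr_prior.shape)
corr_enhanced = corr_prior + rng.normal(scale=0.02, size=corr_prior.shape)

mean_s, std_s, counts_s, edges_s = project_difference(corr_prior, corr_standard)
mean_e, std_e, counts_e, edges_e = project_difference(corr_prior, corr_enhanced)
print(f"standard: mean={mean_s:.4f}, std={std_s:.4f}")
print(f"enhanced: mean={mean_e:.4f}, std={std_e:.4f}")
```

A mean close to 0 indicates no systematic bias in the correlations, and a smaller standard deviation indicates that they are reproduced more faithfully overall.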

scarrazza commented 3 years ago

Perfect, this is the kind of killer plot in favour of enhanced!

Radonirinaunimi commented 3 years ago

So, now, I guess the plan would be:

1. Repeat all the quantitative tests, but with longer iterations and for various sizes of the compressed set (preferably from N=10 to N=900).
2. Produce phenomenological studies. Here, we would still have to decide which kind of plots would be best to produce.
3. Clean up the code (which is almost done) and eventually update the documentation.

Radonirinaunimi commented 3 years ago

So, I have done point (1). All the results are now available and I will upload them somewhere. Point (3) is a WIP (fully complete for the pycompressor code and still a few clean ups for the ganpdfs).

Concerning point (2), what kind of plots should actually be produced? I was thinking about the total cross section as a function of different Higgs masses. This would compare the prior, standard, and enhanced sets for different sizes of the compressed set (maybe also the ratio compressed/standard).

scarrazza commented 3 years ago

> Concerning point (2), what kind of plots should actually be produced? I was thinking about the total cross section as a function of different Higgs masses. This would compare the prior, standard, and enhanced sets for different sizes of the compressed set (maybe also the ratio compressed/standard).

Yes, you could use APPLgrid from here:

https://data.nnpdf.science/smpdf-applgrids/

Radonirinaunimi commented 3 years ago

> > Concerning point (2), what kind of plots should actually be produced? I was thinking about the total cross section as a function of different Higgs masses. This would compare the prior, standard, and enhanced sets for different sizes of the compressed set (maybe also the ratio compressed/standard).
>
> Yes, you could use APPLgrid from here:
>
> https://data.nnpdf.science/smpdf-applgrids/

Thanks! I will have a look now.

Radonirinaunimi commented 3 years ago

> So, I have done point (1). All the results are now available and I will upload them somewhere. Point (3) is a WIP (fully complete for the pycompressor code and still a few clean ups for the ganpdfs).

The results can be found here.

scarrazza commented 3 years ago

Here is a small code for computing replica predictions from APPLgrid: applcheck.zip

Radonirinaunimi commented 3 years ago

> Here is a small code for computing replica predictions from APPLgrid: applcheck.zip

Thanks a lot! I will have a look.

Radonirinaunimi commented 3 years ago

Here are some plots of the Higgs total cross-sections. The difference at the level of the observables is very small (not sure if this is expected).

[Figures: pc, ObsResults]

scarrazza commented 3 years ago

Good, could you please project the CV ratio to Prior in a histogram?

Radonirinaunimi commented 3 years ago

> Good, could you please project the CV ratio to Prior in a histogram?

Yes, here is the projection. It looks like, for a few samples, the enhanced is a bit further away from 1 than the standard (with a difference of the order of 10^-3). However, I checked all the quantitative tests that we did before for these samples, and in all of them the enhanced always yields better results.

[Figure: ProjectObsResults]

And here is the same plot in log-scale:

[Figure: LogProjectObsResults]
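As a rough illustration of this projection, the sketch below takes per-replica predictions, forms the central value (the mean over replicas) for each observable, divides by the prior's central value, and bins the ratios. The function name `central_value_ratio`, the `(n_replicas, n_observables)` layout, and the toy numbers are assumptions; the actual predictions in the thread came from APPLgrid and are not reproduced here.

```python
import numpy as np


def central_value_ratio(pred_compressed, pred_prior):
    """Ratio of central values (replica means) to the prior, per observable.

    pred_*: arrays of shape (n_replicas, n_observables) holding per-replica
    predictions (assumed layout).
    """
    return pred_compressed.mean(axis=0) / pred_prior.mean(axis=0)


# Toy data only: 6 hypothetical observables with values around 40.
rng = np.random.default_rng(3)
pred_prior = 40.0 + rng.normal(size=(1000, 6))
pred_compressed = pred_prior[rng.choice(1000, size=100, replace=False)]

ratio = central_value_ratio(pred_compressed, pred_prior)
counts, edges = np.histogram(ratio, bins=10)  # projection of the CV ratios
print(ratio)
print(counts)
```

Ratios clustered tightly around 1 mean that the compressed set reproduces the prior's central predictions; a log-scale y-axis, as in the second plot above, helps to see sparsely populated bins.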

N3PDF / pycompressor

Quantitative assessment of the performance of pyCompressor + GANs #29