NNPDF / nnusf

An open source machine learning framework that provides predictions for all-energy neutrino structure functions.

https://nnpdf.github.io/nnusf/

GNU General Public License v3.0

Use linear scaling for Q2 and A #48

Closed: Radonirinaunimi closed this 1 year ago.

Radonirinaunimi commented 1 year ago

Simplify the scaling of the $Q^2$ and $A$ inputs by linearly re-mapping them to the interval $[0, 1]$.
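A minimal sketch of such a linear re-mapping, assuming plain min-max scaling over the inputs; the helper name `scale_to_unit` and the example values are hypothetical and not taken from the nnusf code.

```python
import numpy as np


def scale_to_unit(x) -> np.ndarray:
    """Linearly map x onto [0, 1] using its own min and max (min-max scaling).

    Hypothetical helper for illustration; not the nnusf implementation.
    """
    x = np.asarray(x, dtype=float)
    xmin, xmax = x.min(), x.max()
    if xmax == xmin:
        raise ValueError("cannot rescale a constant input")
    return (x - xmin) / (xmax - xmin)


# Hypothetical Q^2 values (GeV^2) and nuclear mass numbers A
q2 = np.array([1.0, 2.0, 10.0, 50.0, 101.0])
a = np.array([1.0, 12.0, 56.0, 208.0])

print(scale_to_unit(q2))
print(scale_to_unit(a))
```

For the `q2` example this gives 0, 0.01, 0.09, 0.49 and 1: because the map is linear, the small-$Q^2$ points stay bunched near 0. In a real fit the minimum and maximum would be fixed from the training data and reused at prediction time, so that new inputs are mapped consistently.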

RoyStegeman commented 1 year ago

Why not just to [-1,1]?

Radonirinaunimi commented 1 year ago

I had momentarily lost track of the fact that this should be merged.

> Why not just to [-1,1]?

Since we now use linear scaling for $Q^2$ and $A$, we can indeed try re-scaling everything to $[-1, 1]$. I can quickly run a fit to try this out.
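Switching the target range only changes the affine map, $x' = a + (b - a)\,\frac{x - x_{\min}}{x_{\max} - x_{\min}}$, so the $[-1, 1]$ variant is simply $2u - 1$ applied to the $[0, 1]$ result $u$. A hedged sketch with a hypothetical helper `scale_to_interval` (not part of nnusf):

```python
import numpy as np


def scale_to_interval(x, lo: float = -1.0, hi: float = 1.0) -> np.ndarray:
    """Linearly map x from [x.min(), x.max()] onto [lo, hi].

    Hypothetical helper for illustration; not the nnusf implementation.
    """
    x = np.asarray(x, dtype=float)
    xmin, xmax = x.min(), x.max()
    if xmax == xmin:
        raise ValueError("cannot rescale a constant input")
    unit = (x - xmin) / (xmax - xmin)  # first map to [0, 1]
    return lo + (hi - lo) * unit       # then stretch and shift to [lo, hi]


q2 = np.array([1.0, 2.0, 10.0, 50.0, 101.0])  # hypothetical Q^2 values
print(scale_to_interval(q2, 0.0, 1.0))   # [0, 1] variant
print(scale_to_interval(q2, -1.0, 1.0))  # [-1, 1] variant
```

Since both variants are linear, the relative spacing of the points is identical; only their position relative to the activation function changes.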

RoyStegeman commented 1 year ago

Ok thanks, I'll have a look.

Radonirinaunimi commented 1 year ago

This is the report for a scaling to [-1, 1]: https://data.nnpdf.science/NNUSF/reports/221027-001/output/, to be compared with the [0, 1] report: https://data.nnpdf.science/NNUSF/reports/221025-scaling-001/output/. The overall $\chi^2$ values are slightly worse.

RoyStegeman commented 1 year ago

Hmm, I didn't really expect it to change anything. Rather, the reason I prefer [-1,1] is that it makes the input symmetric with respect to the activation function, so it seems like it should be the default way to do things.

Anyway, if I had to guess at a possible explanation: with the simple linear scaling, the Q points are denser towards small values of the NN input, so scaling to [0,1] places most points around the center of the activation function, where the gradient is largest.
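The hypothesis can be illustrated on a toy example. This sketch assumes a `tanh` activation (the thread does not say which activation the fit uses) and a hypothetical log-spaced set of $Q^2$ values; it only compares the average slope of the activation at the scaled inputs and says nothing about the actual fit quality.

```python
import numpy as np


def tanh_grad(x) -> np.ndarray:
    """Derivative of tanh: 1 - tanh(x)**2, largest (= 1) at x = 0."""
    return 1.0 - np.tanh(x) ** 2


# Hypothetical Q^2 values, log-spaced so most of them are small
# (an assumption mimicking the skew discussed in this thread).
q2 = np.geomspace(1.0, 1000.0, 200)

# Linear min-max scaling to [0, 1], then the [-1, 1] variant u -> 2u - 1.
u = (q2 - q2.min()) / (q2.max() - q2.min())
x_01 = u
x_pm1 = 2.0 * u - 1.0

frac_small = np.mean(u < 0.1)       # share of points bunched near the low edge
grad_01 = tanh_grad(x_01).mean()    # most inputs near 0, the tanh centre
grad_pm1 = tanh_grad(x_pm1).mean()  # most inputs near -1, in the flatter tail

print(f"fraction of points with u < 0.1: {frac_small:.2f}")
print(f"mean tanh'(x) for [0, 1]:  {grad_01:.3f}")
print(f"mean tanh'(x) for [-1, 1]: {grad_pm1:.3f}")
```

The comparison is not uniform across points: for $u > 1/3$ the $[-1, 1]$ mapping actually sits closer to the centre of `tanh`, so the net effect depends on how strongly the inputs are skewed towards small values.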

RoyStegeman commented 1 year ago

After a simple "by eye" comparison of the data-prediction plots, it seems to me that the small-Q points are indeed more strongly affected than the large-Q points, which would support the above hypothesis.

Radonirinaunimi commented 1 year ago

> Anyway, if I had to guess at a possible explanation: with the simple linear scaling, the Q points are denser towards small values of the NN input, so scaling to [0,1] places most points around the center of the activation function, where the gradient is largest.

Yes, this is indeed the case. If you recall the scaling plot, this is exactly what we saw in the $Q^2$ distributions.

RoyStegeman commented 1 year ago

Yes, I know what the distribution of Q points looks like. Whether that is indeed the cause of the deterioration is a different story. It also seems that NUTEV F2 might be driving the deterioration.

Radonirinaunimi commented 1 year ago

> Yes, I know what the distribution of Q points looks like. Whether that is indeed the cause of the deterioration is a different story.

True! For the time being, I'd propose resetting this branch to 965724f and merging it; we could then investigate this in a new PR. The reason is that main is now quite far behind in terms of the reports and deliveries we produce.

RoyStegeman commented 1 year ago

So I guess this should just be merged?

Radonirinaunimi commented 1 year ago

> So I guess this should just be merged?

Yes, we can now merge this.