Why use normalized predicted values and non-normalized target values to calculate losses?

gitbooo / CrossViVit

This repository contains code for the paper "Improving day-ahead Solar Irradiance Time Series Forecasting by Leveraging Spatio-Temporal Context"

https://arxiv.org/abs/2306.01112

MIT License


Why use normalized predicted values and non-normalized target values to calculate losses? #5

Closed Tangjhno1 closed 1 year ago

Tangjhno1 commented 1 year ago

How do you explain this operation? Won't it hurt accuracy?

jaggbow commented 1 year ago

We didn't experiment with non-normalized targets though it might improve the accuracy. We might retrain the model with this change and if it has better results, we will update the results accordingly as well as the checkpoints.

Thank you for your observation.

Tangjhno1 commented 1 year ago

I'm sorry, you may have misunderstood what I meant. I saw in the source code that the target values used to calculate the loss are not normalized, and the loss values reported in the paper are all around 50. I don't understand what this means.

jaggbow commented 1 year ago

I just saw that the title mentions "normalized predicted values", but we don't normalize the predicted values at all. Since the targets are not normalized, it's normal for the loss to be around 50 or so, because that value corresponds to the non-normalized GHI. In the updated version of the paper, we will include normalized metrics such as MAPE.
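To illustrate the difference between a scale-dependent error and a normalized one, here is a minimal sketch in plain Python. The GHI values are made up for illustration and are not results from the paper, and the `mae` and `mape` helpers are hypothetical names, not functions from the repository. It assumes the reported loss is an absolute error expressed in the units of GHI (W/m²).

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the targets."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error, skipping near-zero targets.

    Near-zero targets (for example night-time GHI) are dropped because
    the relative error is undefined there.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if abs(t) > eps]
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)


# Made-up GHI values in W/m^2, for illustration only.
ghi_true = [200.0, 500.0, 800.0]
ghi_pred = [250.0, 450.0, 850.0]

mae_value = mae(ghi_true, ghi_pred)    # absolute error in W/m^2
mape_value = mape(ghi_true, ghi_pred)  # relative error in percent
print(mae_value, mape_value)
```

Every prediction here is off by 50 W/m², so the MAE is 50 regardless of how large the target is, while the MAPE weights the same absolute miss more heavily at low irradiance.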

Tangjhno1 commented 1 year ago

The normalization I mentioned in the title refers to subtracting the mean from the input data x and dividing by the variance. The predicted values are also between 0 and 1, but the true values used to calculate the loss are not normalized. I've never seen this before, so I'm confused about the motivation. Thank you very much for your reply!

jaggbow commented 1 year ago

I understand your question better now. The inputs are indeed normalized, but because the model is trained to match non-normalized targets, it learns to output non-normalized values outside [0,1]. No normalization is applied after the last weight matrix, so that matrix can take whatever values are needed to match the targets.

I do agree, though, that using normalized targets might improve performance further, since I think it would stabilize training (the last weight matrix wouldn't need very large values to map inputs in [0,1] to unbounded outputs), but we didn't do this in our experiments.

I hope this clarifies things further and thanks for your questions!
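For readers who want to try normalized targets, a minimal sketch of the usual pattern follows: fit the statistics on the training targets, train in normalized space, and map predictions back to W/m² before reporting metrics. `TargetScaler` is a hypothetical helper and the GHI values are made up. This is not what the repository does, since the authors state they trained on non-normalized targets.

```python
from statistics import mean, pstdev


class TargetScaler:
    """Hypothetical helper: standardizes targets and inverts the transform."""

    def fit(self, values):
        self.mean_ = mean(values)
        # Fall back to 1.0 so a constant series does not divide by zero.
        self.std_ = pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]

    def inverse_transform(self, values):
        return [v * self.std_ + self.mean_ for v in values]


# Made-up training targets (GHI in W/m^2), for illustration only.
train_ghi = [0.0, 150.0, 420.0, 610.0, 830.0]

scaler = TargetScaler().fit(train_ghi)
normalized = scaler.transform(train_ghi)          # what the loss would see
restored = scaler.inverse_transform(normalized)   # back to W/m^2

# An error of 0.1 in normalized space is 0.1 * std in W/m^2.
pred_norm = [n + 0.1 for n in normalized]
pred_raw = scaler.inverse_transform(pred_norm)
print(pred_raw[0] - train_ghi[0], 0.1 * scaler.std_)
```

Because a loss computed in normalized space is smaller than the raw one by a factor of the target standard deviation, settings such as the learning rate that were tuned for raw targets may need adjusting after the switch.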

Tangjhno1 commented 1 year ago

Thank you very much for your answer! I tried normalizing the target values, but found that the model's convergence curve was not as good as before. It may be necessary to adjust the hyper-parameters.

I still have one question. The meteorological station data described in your paper contains measurements of the pressure at the station, clear-sky components, Direct Normal Irradiance (DNI), and Diffuse Horizontal Irradiance (DHI). But the experimental code uses DIF, DIR, GHI, PoPoPoPo, dhi, dni, and ghi. PoPoPoPo represents the pressure at the station, and GHI (ghi) is the irradiance, but what do the uppercase and lowercase versions each represent? And what do the other values mean?

jaggbow commented 1 year ago

We only use a subset of these channels because all stations share that same subset. Here's a breakdown of the meaning of each channel:

GHI: GHI at the station
DIF [W/m**2]: DHI at the station
DIR [W/m**2]: DNI at the station
PoPoPoPo [hPa]: Pressure at the station
dhi: Clear-Sky DHI
dni: Clear-Sky DNI
ghi: Clear-Sky GHI
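The breakdown above can be captured as a small lookup table. This is only a sketch: the keys are copied from the list in this comment, and the exact column names in the dataset files are an assumption. `CHANNELS` and `is_clear_sky` are hypothetical names, not identifiers from the repository.

```python
# Channel name -> meaning, copied from the list above.
# Assumption: the dataset columns use exactly these strings.
CHANNELS = {
    "GHI": "GHI at the station",
    "DIF [W/m**2]": "DHI at the station",
    "DIR [W/m**2]": "DNI at the station",
    "PoPoPoPo [hPa]": "Pressure at the station",
    "dhi": "Clear-Sky DHI",
    "dni": "Clear-Sky DNI",
    "ghi": "Clear-Sky GHI",
}


def is_clear_sky(name):
    """Return True for the modelled clear-sky channels (lowercase names)."""
    return CHANNELS[name].startswith("Clear-Sky")


clear_sky = [name for name in CHANNELS if is_clear_sky(name)]
measured = [name for name in CHANNELS if not is_clear_sky(name)]
print("measured:", measured)
print("clear-sky:", clear_sky)
```

In this list the uppercase names are quantities measured at the station and the lowercase names are the corresponding clear-sky components.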

Tangjhno1 commented 1 year ago

Thank you for your clarification. Your research is very meaningful and has been of great help to me!