CDonnerer / xgboost-distribution

Probabilistic prediction with XGBoost.
MIT License

NLL used as benchmark #92

Closed detrin closed 1 year ago

detrin commented 1 year ago

Looking at http://proceedings.mlr.press/v119/duan20a/duan20a.pdf and at https://xgboost-distribution.readthedocs.io/en/latest/experiments.html, it is not clear to me why NLL would be a good metric for model comparison. Wouldn't a better criterion be to compare performance with the NGBoost divergence, or directly with the loss function? How exactly is the distribution of every prediction $\hat{y}$ incorporated into the benchmark?

CDonnerer commented 1 year ago

Using the log likelihood as an evaluation metric quantifies how well the predicted distributions fit the observed data. This is exactly the training objective here, since we're fitting via maximum likelihood estimation.

Here's an illustration: For each datapoint in the test data, we have ($y_i$, $X_i$). We predict on each $X_i$ to get the distribution parameters. For example, in the case of a normal distribution, we'd get $(\mu_i, \sigma_i)$ for each datapoint. We can then put the $(y_i, \mu_i, \sigma_i)$ into the probability density function for the normal distribution to estimate the likelihood of observing that datapoint under that distribution. Here's the corresponding line in code. Note that the reported NLL values in the experiments page are the average of the NLLs for each datapoint.
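To make the procedure above concrete, here is a minimal sketch of the evaluation step, assuming a normal distribution and using `scipy.stats.norm.logpdf` as the density. The arrays `y`, `mu`, and `sigma` are hypothetical stand-ins for the test targets and the per-datapoint parameters the model would predict:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical test targets and predicted normal-distribution parameters
# (in practice, mu and sigma would come from the model's predictions on X_test).
y = np.array([1.2, -0.5, 3.1])
mu = np.array([1.0, 0.0, 2.8])
sigma = np.array([0.5, 0.6, 0.7])

# Per-datapoint NLL: -log p(y_i | mu_i, sigma_i), evaluated at the observed y_i,
# then averaged over the test set, matching the values reported in the experiments.
nll = -norm.logpdf(y, loc=mu, scale=sigma).mean()
print(nll)
```

A lower average NLL means the predicted distributions assign higher probability density to the observed targets, which is why it serves as a like-for-like comparison against NGBoost, whose models are trained on the same objective.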

detrin commented 1 year ago

@CDonnerer Thanks, that answers my questions. Now I understand it better.