KaroliShp opened 2 years ago
Will be adding more information to this comment as I am debugging it right now.
Final output of `optimize_PACB`:

```
Epoch:0030 cost=0.13071768 mean accuracy 0.9954 KL div: 767.3750 A term: 0.0464 B term: 0.0843 Bquad: 781.7835 log_prior_std: -1.0725 B PAC: 0.0267 factor1: 9.2103 factor2: -9.2103
```
As you can see, factor1 + factor2 ~= 0, which, if I understand correctly, means that the term 2\log(j) ~= 0, implying that j ~= 1. However, once we enter `evaluate_SNN_accuracy`, we see from the output in my original post that j = -15.
Note that factor2 = -9.2103 because 2*\log(1e-2) = -9.2103. This comes from `tf.maximum(.., 1e-2)`. Looking back at issue #3, this value is hardcoded to avoid nan values during training. It now makes sense why we observe a negative j, confirming that these bugs are connected and that issue #3 was not actually fixed.
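The arithmetic behind that factor2 value can be checked directly (a minimal NumPy sketch; the variable names here are mine, not the repo's):

```python
import numpy as np

# The clamp tf.maximum(.., 1e-2) means that whenever the clamped quantity
# falls below the hardcoded floor, the floor itself enters the log:
floor = 1e-2
factor2 = 2 * np.log(floor)  # 2 * log(1e-2)

print(round(float(factor2), 4))  # -9.2103
```

So a factor2 of exactly -9.2103 is a fingerprint of the clamp being active, not of a genuinely optimized value.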
The problem looks to me like the constrained optimization has been turned into an unconstrained one. Clearly, when optimizing over 1/2*\log(\lambda), \lambda can become greater than c (and it does), which makes the 2\log(j) term undefined; this is not solved simply by using `tf.maximum`, as is done in the currently latest commit.
An alternative way to fix this problem, instead of `tf.maximum`, is to optimize over the logistic and logit functions, which should fix the problem of incorrectly turning the constrained optimization into an unconstrained one. I can provide details for this later.
Reproduction following the README: I can provide the pickle files if needed. I encountered this several times while varying the number of pacb_epochs (for example, 350). I did not alter the code on the master branch in any way.
Output:
Reason: `jdown` and `jup` values are negative, even though these are supposed to be natural numbers. Then clearly `np.log(jdisc_down) = nan` and `np.log(jdisc_up) = nan`, giving nan as the PAC-Bayes bound.

EDIT: I believe this is still the same issue as in https://github.com/gkdziugaite/pacbayes-opt/issues/3
EDIT 2: To be clear, I have the exact same environment as required by the README
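To make the failure mode concrete, the nan propagation can be reproduced in isolation (plain NumPy; the value of `jdisc_down` is the one reported above):

```python
import numpy as np

# jdisc_down is reported as negative, even though j should be a natural
# number; taking its log yields nan, which then propagates through every
# subsequent arithmetic step of the PAC-Bayes bound computation.
jdisc_down = -15.0
with np.errstate(invalid="ignore"):  # silence the RuntimeWarning
    bad_term = np.log(jdisc_down)

print(np.isnan(bad_term))  # True
```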