Closed. LyleW closed this issue 6 years ago.
Thanks for digging in!
Would you check that the initialization of the parameters actually matches the original? I ran python train_svhn.py
on the master branch and on the pull request branch, and the consistency costs differ by orders of magnitude:
$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
$ python train_svhn.py
INFO:main:Saved tensorboard graph to 'results/train_svhn/2018-02-02_14:46:13/0/tensorboard'
INFO:main:Model variables initialized
INFO:main:step 0: eval/error/ema: 92.5%, eval/error/1: 92.5%, eval/class_cost/1: 5.964205
INFO:main:Saved checkpoint: 'results/train_svhn/2018-02-02_14:46:13/0/transient/checkpoint-0'
INFO:main:step 20: train/error/1: 100.0%, train/class_cost/1: 0.021648, train/cons_cost/mt: 0.008475
INFO:main:step 40: train/error/1: 100.0%, train/class_cost/1: 0.020818, train/cons_cost/mt: 0.007544
^C
$ git checkout LyleW-master
Switched to branch 'LyleW-master'
$ python train_svhn.py
INFO:main:Saved tensorboard graph to 'results/train_svhn/2018-02-02_14:49:59/0/tensorboard'
INFO:main:Model variables initialized
INFO:main:step 0: eval/error/ema: 93.0%, eval/error/1: 93.0%, eval/class_cost/1: 2.301507
INFO:main:Saved checkpoint: 'results/train_svhn/2018-02-02_14:49:59/0/transient/checkpoint-0'
INFO:main:step 20: train/error/1: 100.0%, train/class_cost/1: 0.023213, train/cons_cost/mt: 0.000001
INFO:main:step 40: train/error/1: 100.0%, train/class_cost/1: 0.023038, train/cons_cost/mt: 0.000001
^C
Note train/cons_cost/mt at step 20: 0.008475 vs. 0.000001. This suggests that the initialization really is different.
Sorry, I hadn't noticed that.
I found a new workaround for this issue in TensorFlow variable initializers broken #13351:

from tensorflow.python.ops import variables

def passthrough(obj, value):
    return value

try:
    variables.Variable._build_initializer_expr = passthrough
except AttributeError:  # older versions of TF don't have this attribute
    pass

It seems better than this PR, but I haven't checked whether it causes any other side effects. I already pushed it to LyleW-master; feel free to try it if you are interested. Thanks for your reply.
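To illustrate why the monkey-patch above helps, here is a toy sketch (an assumption for illustration only: BuggyVariable and its halving rewrite are stand-ins, not real TensorFlow internals). The real _build_initializer_expr rewrote initializer expressions in TF >= 1.3, which could change the effective initial values; replacing it with a passthrough hands the initializer value back untouched.

```python
class BuggyVariable:
    """Toy stand-in for a framework variable class (hypothetical)."""

    def __init__(self, initial_value):
        # The framework routes the initializer through an expression
        # builder; here the "bug" halves the value to make it visible.
        self.value = self._build_initializer_expr(initial_value)

    def _build_initializer_expr(self, value):
        return value * 0.5  # simulated unwanted rewrite


def passthrough(obj, value):
    return value  # return the initializer value unchanged


v_buggy = BuggyVariable(2.0)            # rewrite applied: value is 1.0

# Monkey-patch the class, exactly as in the workaround above.
BuggyVariable._build_initializer_expr = passthrough

v_patched = BuggyVariable(2.0)          # rewrite bypassed: value is 2.0
print(v_buggy.value, v_patched.value)   # → 1.0 2.0
```

Instances created before the patch keep the rewritten value; only variables built afterwards get the untouched initializer, which is why the patch must run before the model graph is constructed.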
Thanks. Looks like this may get fixed in TensorFlow.
For now, I will not merge this. I am happy to do so if you or someone else can show that either the initialization remains the same or the final accuracy does not worsen.
How long should this training phase go on? I have run it for 24+ hours, and the eval cost is still not very low even though the train cost looks OK. Also, after I finished training and started other experiments, the train cost was high again, as if the whole model were starting over from scratch. How can I use the pre-trained model to run other experiments? Thanks in advance!
This is a TensorFlow (>=1.3) issue: https://github.com/tensorflow/tensorflow/issues/12598
Workaround: give the variables an arbitrary initializer, then assign the intended values to the refs returned by the initialization pass.
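The assign-after-init pattern can be sketched as follows (a minimal stand-in, assuming a FakeVariable class instead of real tf.Variable, since the point is the pattern, not the TF API): initialize with any cheap value so the broken initializer-expression path never matters, then overwrite with the values you actually want via an explicit assign.

```python
class FakeVariable:
    """Minimal stand-in for a framework variable (hypothetical)."""

    def __init__(self, initial_value):
        self.value = initial_value  # arbitrary/cheap initializer

    def assign(self, new_value):
        self.value = new_value      # explicit assignment after init
        return self.value           # return the updated ref, TF-style


# Step 1: initialize with a dummy value (e.g. zeros).
v = FakeVariable([0.0, 0.0, 0.0])

# Step 2: assign the values you actually wanted.
v.assign([0.1, -0.2, 0.3])
print(v.value)  # → [0.1, -0.2, 0.3]
```

In real TensorFlow 1.x code the same idea would be a variable built from zeros followed by running an assign op after the initializer, so the initializer rewrite never touches the intended values.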