Closed. LyleW closed this issue 6 years ago.
Thanks for digging in!
Would you check that the initialization of the parameters actually matches the original? I ran python train_svhn.py
on the master branch and on the pull request branch, and the consistency costs differ by orders of magnitude:
$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
$ python train_svhn.py
INFO:main:Saved tensorboard graph to 'results/train_svhn/2018-02-02_14:46:13/0/tensorboard'
INFO:main:Model variables initialized
INFO:main:step 0: eval/error/ema: 92.5%, eval/error/1: 92.5%, eval/class_cost/1: 5.964205
INFO:main:Saved checkpoint: 'results/train_svhn/2018-02-02_14:46:13/0/transient/checkpoint-0'
INFO:main:step 20: train/error/1: 100.0%, train/class_cost/1: 0.021648, train/cons_cost/mt: 0.008475
INFO:main:step 40: train/error/1: 100.0%, train/class_cost/1: 0.020818, train/cons_cost/mt: 0.007544
^C
$ git checkout LyleW-master
Switched to branch 'LyleW-master'
$ python train_svhn.py
INFO:main:Saved tensorboard graph to 'results/train_svhn/2018-02-02_14:49:59/0/tensorboard'
INFO:main:Model variables initialized
INFO:main:step 0: eval/error/ema: 93.0%, eval/error/1: 93.0%, eval/class_cost/1: 2.301507
INFO:main:Saved checkpoint: 'results/train_svhn/2018-02-02_14:49:59/0/transient/checkpoint-0'
INFO:main:step 20: train/error/1: 100.0%, train/class_cost/1: 0.023213, train/cons_cost/mt: 0.000001
INFO:main:step 40: train/error/1: 100.0%, train/class_cost/1: 0.023038, train/cons_cost/mt: 0.000001
^C
Note train/cons_cost/mt at step 20: 0.008475 vs. 0.000001. This suggests that the initialization really is different.
Sorry, I hadn't noticed that.
I found a new workaround for this issue in TensorFlow variable initializers broken #13351:

from tensorflow.python.ops import variables

def passthrough(obj, value):
    return value

try:
    variables.Variable._build_initializer_expr = passthrough
except AttributeError:  # older versions of TF don't have this attribute
    pass

It seems better than this PR, but I haven't checked whether it causes any other side effects. I already pushed it to LyleW-master; feel free to try it if you are interested. Thanks for your reply.
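To illustrate why the monkey-patch above helps, here is a toy sketch (an assumption for illustration only: BuggyVariable and its halving rewrite are stand-ins, not real TensorFlow internals). The real _build_initializer_expr rewrote initializer expressions in TF >= 1.3, which could change the effective initial values; replacing it with a passthrough hands the initializer value back untouched.

```python
class BuggyVariable:
    """Toy stand-in for a framework variable class (hypothetical)."""

    def __init__(self, initial_value):
        # The framework routes the initializer through an expression
        # builder; here the "bug" halves the value to make it visible.
        self.value = self._build_initializer_expr(initial_value)

    def _build_initializer_expr(self, value):
        return value * 0.5  # simulated unwanted rewrite


def passthrough(obj, value):
    return value  # return the initializer value unchanged


v_buggy = BuggyVariable(2.0)            # rewrite applied: value is 1.0

# Monkey-patch the class, exactly as in the workaround above.
BuggyVariable._build_initializer_expr = passthrough

v_patched = BuggyVariable(2.0)          # rewrite bypassed: value is 2.0
print(v_buggy.value, v_patched.value)   # → 1.0 2.0
```

Instances created before the patch keep the rewritten value; only variables built afterwards get the untouched initializer, which is why the patch must run before the model graph is constructed.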
Thanks. Looks like this may get fixed in TensorFlow.
For now, I will not merge this. I am happy to do so if you or someone else can show that either the initialization remains the same or the final accuracy does not worsen.
How long should this training phase go on? I have run it for 24+ hours, and the eval cost is still not very low even though the train cost looks OK. Also, after I finished training and started other experiments, the train cost was high again, as if the whole model were starting over from scratch. How can I use the pre-trained model to run other experiments? Thanks in advance!
This is a TensorFlow (>=1.3) issue: https://github.com/tensorflow/tensorflow/issues/12598
Workaround: give the variables an arbitrary initializer, then assign the intended values to the refs returned by the initialization pass.
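The assign-after-init pattern can be sketched as follows (a minimal stand-in, assuming a FakeVariable class instead of real tf.Variable, since the point is the pattern, not the TF API): initialize with any cheap value so the broken initializer-expression path never matters, then overwrite with the values you actually want via an explicit assign.

```python
class FakeVariable:
    """Minimal stand-in for a framework variable (hypothetical)."""

    def __init__(self, initial_value):
        self.value = initial_value  # arbitrary/cheap initializer

    def assign(self, new_value):
        self.value = new_value      # explicit assignment after init
        return self.value           # return the updated ref, TF-style


# Step 1: initialize with a dummy value (e.g. zeros).
v = FakeVariable([0.0, 0.0, 0.0])

# Step 2: assign the values you actually wanted.
v.assign([0.1, -0.2, 0.3])
print(v.value)  # → [0.1, -0.2, 0.3]
```

In real TensorFlow 1.x code the same idea would be a variable built from zeros followed by running an assign op after the initializer, so the initializer rewrite never touches the intended values.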