ValueError: Input graph and Layer graph are not the same

The example from the euler2.0 wiki, examples/graphsage/ (python run_graphsage.py), uses estimator.train() and estimator.evaluate() by default for training and validation. If you switch to estimator.train_and_evaluate(), it fails with: ValueError: Input graph and Layer graph are not the same: Tensor("MPGather:0", shape=(?, 1433), dtype=float32) is not from the passed-in graph. (I hit this in a distributed environment; I am not sure whether it also fails on a single machine.)

With tf.estimator, every tf op has to be created inside model_fn or input_fn; otherwise you get this error.

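Take the source file that raises the error as an example: tf_euler/python/convolution/sage_conv.py. The __init__ function is only called the first time the graph is built. With estimator.train or estimator.evaluate the graph is built only once for the whole run, so nothing goes wrong. With estimator.train_and_evaluate, however, the evaluate step rebuilds the graph multiple times. From the second build onward, the tf ops in __init__ are not re-run, so the tensors they created still belong to the first graph, which causes the error. The fix is to move all tf ops out of __init__. In the screenshot below, the commented-out code is the original version that triggers the error, and the highlighted code is the corrected version. This problem shows up in many places in euler2.0 and there is a lot to change, so I can't be bothered to open pull requests; I'm writing this issue to help others track down the problem.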
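The mechanism can be sketched without TensorFlow. The snippet below is a toy model in plain Python: Graph, Tensor, add, OpInInit, OpInCall and build_graphs are all hypothetical names invented for illustration, not TensorFlow or Euler APIs. It assumes, as described above, that the layer object is constructed during the first graph build and then reused for later builds.

```python
# Toy stand-in for graph ownership checks. None of these classes are
# TensorFlow or Euler APIs; they only mimic the behaviour described above.

class Graph:
    """A container that owns the tensors created while it is active."""
    current = None  # the graph currently being built

    def __enter__(self):
        self._previous = Graph.current
        Graph.current = self
        return self

    def __exit__(self, *exc):
        Graph.current = self._previous
        return False


class Tensor:
    def __init__(self, name):
        self.name = name
        self.graph = Graph.current  # bound to whichever graph is active now


def add(a, b):
    """Create a new tensor, refusing inputs that belong to another graph."""
    for t in (a, b):
        if t.graph is not Graph.current:
            raise ValueError("%s is not from the passed-in graph." % t.name)
    return Tensor("add(%s, %s)" % (a.name, b.name))


class OpInInit:
    """Problematic pattern: the tensor is created once, in __init__."""
    def __init__(self):
        self.bias = Tensor("bias")

    def __call__(self, x):
        return add(x, self.bias)


class OpInCall:
    """Fixed pattern: the tensor is created on every call, in the active graph."""
    def __call__(self, x):
        return add(x, Tensor("bias"))


def build_graphs(layer_cls, n_builds=2):
    """Build several graphs in a row, constructing the layer only once."""
    layer = None
    outcomes = []
    for _ in range(n_builds):
        with Graph():
            if layer is None:
                layer = layer_cls()  # __init__ runs during the first build only
            try:
                layer(Tensor("x"))
                outcomes.append("ok")
            except ValueError:
                outcomes.append("ValueError")
    return outcomes


print(build_graphs(OpInInit))  # ['ok', 'ValueError']
print(build_graphs(OpInCall))  # ['ok', 'ok']
```

The first build succeeds in both cases, which matches the observation that plain train or evaluate works. Only the second build fails, and only when the tensor was created in __init__.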

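Note: this change can introduce a new problem. For example, suppose __init__ contains self.fc = tf.layers.Dense(dim) and call contains a = self.fc(x); b = self.fc(y). Then a and b are produced by the same network, with shared weights. If you delete self.fc = tf.layers.Dense(dim) from __init__ and instead write a = tf.layers.Dense(dim)(x); b = tf.layers.Dense(dim)(y) in call, then a and b are produced by two different networks. So wherever parameter sharing is required, you need to handle it explicitly.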
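The sharing pitfall can also be illustrated with a framework-free sketch. The Dense class below is a toy written for this example, not tf.layers.Dense, and SharedModel and UnsharedModel are hypothetical names. The only assumption carried over from TensorFlow is that each layer object owns its own weights.

```python
import random


class Dense:
    """Toy dense layer: weights are drawn when the object is constructed."""
    def __init__(self, dim):
        self.weights = [random.random() for _ in range(dim)]

    def __call__(self, x):
        # Weighted sum of the inputs, standing in for a real dense layer.
        return sum(w * v for w, v in zip(self.weights, x))


class SharedModel:
    """One Dense object created up front, so both calls share its weights."""
    def __init__(self, dim):
        self.fc = Dense(dim)

    def __call__(self, x, y):
        return self.fc(x), self.fc(y)


class UnsharedModel:
    """A new Dense object per call site, so the weights are independent."""
    def __init__(self, dim):
        self.dim = dim

    def __call__(self, x, y):
        return Dense(self.dim)(x), Dense(self.dim)(y)


random.seed(0)  # fixed seed so the run is repeatable
x = [1.0, 2.0, 3.0]
a, b = SharedModel(3)(x, x)
c, d = UnsharedModel(3)(x, x)
print(a == b)  # True: the same weights are applied to the same input
print(c == d)  # False: each Dense drew its own weights
```

In TensorFlow 1.x, one possible way to keep sharing after moving layer creation into call is to create the layers under a tf.variable_scope with reuse=tf.AUTO_REUSE and a fixed layer name. Whether that fits the Euler code is not something this issue confirms.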

alibaba/euler, issue #314