Closed: InterferencePattern closed this issue 1 year ago
@jimbudarz interesting, we would have to dig deeper to determine the root cause, but I suspect it's a mismatch between eager-mode behaviour (does the ONNX runtime work in eager mode only?) and the graph mode that is required to run CFs in alibi, since the core algorithm is written using TF1.x constructs (for now). The reason Anchors work is that they don't use any TF code internally.
Just as an aside, do Anchors work if you disable eager mode and pass the same predict_fn as in the example?
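(For concreteness, disabling eager mode in TF2.x looks like the sketch below; it must run before any other TF operations in the process.)

```python
import tensorflow as tf

# TF2.x executes eagerly by default; alibi's CF/CEM code uses TF1.x
# graph-mode constructs, so eager execution must be switched off first,
# before any other TF ops are created.
tf.compat.v1.disable_eager_execution()

# After this call the runtime reports graph mode:
print(tf.executing_eagerly())  # False
```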
@jklaise Thanks for the response. Anchors also fails when eager mode is disabled.
RuntimeError: Attempting to capture an EagerTensor without building a function.
However, if I don't disable eager mode for CF and CEM, I get the following:
RuntimeError Traceback (most recent call last)
Sorry, to answer your first question:

> does onnx runtime work in eager mode only?
Making predictions with eager mode disabled results in the same error:
RuntimeError: Attempting to capture an EagerTensor without building a function.
@jimbudarz thanks for the follow-up. To summarize, eager mode needs to be disabled to run CF/CEM algorithms because we use TF1.x constructs in the code. This seems to be incompatible with the way ONNX runtime works with TF2.x models.
In the long run we will port the CF/CEM code to TF2.x constructs (see #403) but there are a few hurdles with performance and some more development time is needed.
In the short term, we would need to investigate if running ONNX TF2.x models with eager mode disabled is something that is feasible at all, this would require digging into the ONNX protocol a bit more. A very simple thing to check first (if you can) is if you can make predictions using the ONNX runtime with a TF2.x model with eager mode disabled.
Thanks for looking into this.
> A very simple thing to check first (if you can) is if you can make predictions using the ONNX runtime with a TF2.x model with eager mode disabled.
I was able to test this- it fails to make predictions with eager mode disabled.
@jimbudarz right, that confirms my suspicion. The next step would be to look into the ONNX docs with respect to running TensorFlow models with eager mode disabled. I suspect either this is not supported (for TF2.x models) or some extra steps are needed when either exporting to ONNX or configuring the runtime. Another avenue worth exploring is TF1.x ONNX support: TF1.x doesn't have a concept of eager mode, so if support exists, lessons could perhaps be taken from it to enable similar functionality for TF2.x (i.e. running without eager mode).
Another thing to explore is ONNX support with TF2.x models that have tf.function decorators, as that would also force graph mode.
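(To illustrate the tf.function point: code inside a tf.function-decorated function is traced into a graph, so it does not execute eagerly even when eager mode is on globally. A minimal sketch, not alibi code:)

```python
import tensorflow as tf

@tf.function
def in_eager_mode():
    # tf.executing_eagerly() is evaluated while the body is being traced
    # into a graph, so it reports False here even though eager execution
    # is enabled globally.
    return tf.constant(tf.executing_eagerly())

print(tf.executing_eagerly())    # True  (global eager mode)
print(bool(in_eager_mode()))     # False (graph mode inside the trace)
```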
I've just replicated this experiment with the original Keras model, and I realize that this might not be an ONNX issue- it might be incompatibility with any keras model trained with eager mode enabled.
When I try to run contrastive explanations without disabling eager mode:

```python
shape = (1,) + img_stack.shape[1:]
mode = 'PN'
cem = CEM(model.predict, mode, shape, kappa=0., beta=.1,
          feature_range=(img_stack.min(), img_stack.max()),
          gamma=100, max_iterations=1000,
          c_init=1., c_steps=10, learning_rate_init=1e-2,
          clip=(-1000., 1000.), no_info_val=-1.)
```
I get the following error:
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-17-dff1c063c46f> in <module>
     12 target_class=target_class, max_iter=max_iter, lam_init=lam_init,
     13 max_lam_steps=max_lam_steps, learning_rate_init=learning_rate_init,
---> 14 feature_range=feature_range)
     15
     16 start_time = time()

/.../lib/python3.7/site-packages/alibi/explainers/counterfactual.py in __init__(self, predict_fn, shape, distance_fn, target_proba, target_class, max_iter, early_stop, lam_init, max_lam_steps, tol, learning_rate_init, feature_range, eps, init, decay, write_dir, debug, sess)
    199
    200         # lambda hyperparameter - placeholder instead of variable as annealed in first epoch
--> 201         self.lam = tf.placeholder(tf.float32, shape=(self.batch_size), name='lam')
    202
    203         # define placeholders that will be assigned to relevant variables

/.../lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py in placeholder(dtype, shape, name)
   3174     """
   3175     if context.executing_eagerly():
-> 3176         raise RuntimeError("tf.placeholder() is not compatible with "
   3177                            "eager execution.")
   3178

RuntimeError: tf.placeholder() is not compatible with eager execution.
```
And if I run it with eager mode disabled, I can't make predictions at all:
ValueError: Calling `Model.predict` in graph mode is not supported when the `Model` instance was constructed with eager mode enabled. Please construct your `Model` instance in graph mode or call `Model.predict` with eager mode enabled.
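(For what it's worth, a sketch of the workaround that avoids this particular ValueError: disable eager execution *before* constructing or loading the Keras model, so the model and the graph-mode code agree. The tiny architecture below is just a placeholder, not my real model:)

```python
import numpy as np
import tensorflow as tf

# Disable eager mode before building/loading the model, so that
# Model.predict runs in the same (graph) mode the model was built in.
tf.compat.v1.disable_eager_execution()

model = tf.keras.Sequential(
    [tf.keras.layers.Dense(3, activation="softmax", input_shape=(4,))]
)

# predict now works in graph mode because the model was also
# constructed with eager execution disabled
preds = model.predict(np.zeros((2, 4), dtype=np.float32))
print(preds.shape)  # (2, 3)
```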
I don't know if this is valuable information for you as you build out TF2 support, but I figured I could provide it anyway.
Hi, I'm finding that my ONNX image classification model (loaded with the ONNX package and converted to TensorFlow) works with AnchorImage but not with Counterfactuals or CEM. I've tried providing the model directly to the CounterFactual object, and I've also tried with a predict function (since the model expects inputs of a different shape). Neither approach is successful; the latter is shown below.
Is there a reason that AnchorImage and Counterfactuals/CEM would treat this model differently under the hood?
Here is the failing code, followed by the error.
Thank you for your help.
RuntimeError Traceback (most recent call last)