Number of gradients does not match with number of inputs for layer norm

I'm on a 2017 MacBook Pro with an Intel CPU and an RX 560, using tf-macos alpha3. I tried to train BERT with tf-macos, but gradients would not get through the layer norm layers until I replaced python/keras/layers/normalization.py with the one from the official release. The crash is as follows:

INFO:tensorflow:training_loop marked as finished
I0409 01:05:30.787653 4499283456 error_handling.py:115] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W0409 01:05:30.787855 4499283456 error_handling.py:149] Reraising captured error
Traceback (most recent call last):
  File "run_classifier.py", line 972, in <module>
    tf.compat.v1.app.run()
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "run_classifier.py", line 820, in main
    estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3130, in train
    rendezvous.raise_errors()
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 150, in raise_errors
    six.reraise(typ, value, traceback)
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/six.py", line 703, in reraise
    raise value
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3120, in train
    return super(TPUEstimator, self).train(
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 349, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1175, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1203, in _train_model_default
    estimator_spec = self._call_model_fn(features, labels, ModeKeys.TRAIN,
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2961, in _call_model_fn
    return super(TPUEstimator, self)._call_model_fn(features, labels, mode,
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1163, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3219, in _model_fn
    estimator_spec = model_fn_wrapper.call_without_tpu(
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1729, in call_without_tpu
    return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2072, in _call_model_fn
    estimator_spec = self._model_fn(features=features, **kwargs)
  File "run_classifier.py", line 603, in model_fn
    train_op = optimization.create_optimizer(
  File "/Users/didi/dev/nlp/git/CLUE/baselines/models/bert/optimization.py", line 71, in create_optimizer
    grads = tf.gradients(ys=loss, xs=tvars)
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/ops/gradients_impl.py", line 315, in gradients_v2
    return gradients_util._GradientsHelper(
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/ops/gradients_util.py", line 691, in _GradientsHelper
    _VerifyGeneratedGradients(in_grads, op)
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/ops/gradients_util.py", line 255, in _VerifyGeneratedGradients
    raise ValueError("Num gradients %d generated for op %s do not match num "
ValueError: Num gradients 2 generated for op name: "bert/encoder/layer_11/output/layer_normalization_24/MLCLayerNorm"
op: "MLCLayerNorm"
input: "bert/encoder/layer_11/output/add"
input: "bert/encoder/layer_11/output/layer_normalization_24/MLCLayerNorm/ReadVariableOp"
input: "bert/encoder/layer_11/output/layer_normalization_24/MLCLayerNorm/ReadVariableOp_1"
input: "bert/encoder/layer_11/output/layer_normalization_24/MLCLayerNorm/norm_shape"
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "Tidx"
  value {
    type: DT_INT32
  }
}
attr {
  key: "epsilon"
  value {
    f: 9.999999960041972e-13
  }
}
 do not match num inputs 4
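
For context, the ValueError above comes from a sanity check that runs while the backward graph is being built: the gradient function registered for an op is expected to return exactly one entry per op input, with None allowed for inputs that have no gradient. The sketch below is a simplified, standalone paraphrase of that check, not the actual TensorFlow code. `FakeOp`, `verify_generated_gradients`, and the gradient names `d_x`, `d_gamma`, `d_beta` are hypothetical and used only for illustration; that the two ReadVariableOp inputs are gamma and beta is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class FakeOp:
    """Hypothetical stand-in for a graph op: an op type plus its input names."""
    op_type: str
    inputs: List[str]


def verify_generated_gradients(grads: Sequence[Optional[str]], op: FakeOp) -> None:
    """Raise if the gradient function returned the wrong number of entries.

    Simplified paraphrase of the check behind the ValueError above: one entry
    per op input is expected, and None is allowed where no gradient exists.
    """
    if len(grads) != len(op.inputs):
        raise ValueError(
            "Num gradients %d generated for op %s do not match num inputs %d"
            % (len(grads), op.op_type, len(op.inputs))
        )


# Input names shortened from the node definition in the traceback.
layer_norm = FakeOp(
    op_type="MLCLayerNorm",
    inputs=["output/add", "ReadVariableOp", "ReadVariableOp_1", "norm_shape"],
)

# Two gradients for four inputs reproduces the failure reported above.
try:
    verify_generated_gradients(["d_x", "d_param"], layer_norm)
    mismatch_message = None
except ValueError as err:
    mismatch_message = str(err)

# One entry per input passes; the gradient names here are assumptions, and
# None marks the norm_shape input, which would not receive a gradient.
verify_generated_gradients(["d_x", "d_gamma", "d_beta", None], layer_norm)
```

Read this way, the message suggests the gradient registered for MLCLayerNorm returns two entries while the op is wired with four inputs, which would explain why swapping in the stock normalization.py (and so avoiding the MLCLayerNorm op) gets past this point.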

I first tried commenting out the MLC-related lines in the LayerNormalization class (lines 1283-1300), but then batch norm jumped out and stabbed me in the back:

2021-04-09 01:26:07.187030: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-04-09 01:26:11.826 python[22839:933864] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[__NSArrayM setObject:atIndexedSubscript:]: object cannot be nil'
*** First throw call stack:
(
    0   CoreFoundation                      0x00007fff204f36af __exceptionPreprocess + 242
    1   libobjc.A.dylib                     0x00007fff2022b3c9 objc_exception_throw + 48
    2   CoreFoundation                      0x00007fff205a7a9a -[__NSCFString characterAtIndex:].cold.1 + 0
    3   CoreFoundation                      0x00007fff205a5b73 -[__NSArrayM setObject:atIndexedSubscript:].cold.2 + 0
    4   CoreFoundation                      0x00007fff2049072e -[__NSArrayM setObject:atIndexedSubscript:] + 679
    5   MLCompute                           0x00007fff2a0ebd62 +[_MLCGPUBatchNormalization allocateMovingUpdaterWith:gpuOps:mean:variance:momentum:deviceIndex:] + 435
    6   MLCompute                           0x00007fff2a0ec803 -[_MLCGPUBatchNormalization initWithDevice:fusedWithNeuronDescriptor:numOfFeatureChannels:mean:variance:beta:gamma:varianceEpsilon:momentum:] + 1641
    7   MLCompute                           0x00007fff2a0ec161 -[_MLCGPUBatchNormalization initWithDevice:numOfFeatureChannels:mean:variance:beta:gamma:varianceEpsilon:momentum:] + 141
    8   MLCompute                           0x00007fff2a0eca3d +[_MLCGPUBatchNormalization layerWithDevice:numOfFeatureChannels:mean:variance:beta:gamma:varianceEpsilon:momentum:] + 166
    9   MLCompute                           0x00007fff2a06a33e -[MLCDeviceGPU(MLCLayerOperations) batchNormalizationLayerWithChannelCount:mean:variance:beta:gamma:varianceEpsilon:momentum:] + 169
    10  MLCompute                           0x00007fff2a053658 -[MLCBatchNormalizationLayer compileForDevice:sourceTensors:resultTensor:] + 1570
    11  MLCompute                           0x00007fff2a0c4c8e -[MLCTrainingGraph compileWithOptions:device:] + 3561
    12  _pywrap_tensorflow_internal.so      0x000000011cb12b41 _ZN10tensorflow9mlcompute7convert26MLCGraphConversionPassImpl15ConvertSubgraphEPNS_15OpKernelContextEPNS1_11TFGraphInfoEPKNS_5GraphERKNSt3__16vectorINSA_12basic_stringIcNSA_11char_traitsIcEENSA_9allocatorIcEEEENSF_ISH_EEEERKNSB_IiNSF_IiEEEEPNS1_24MLCSubgraphConvertResultE + 4385
    13  _pywrap_tensorflow_internal.so      0x000000011caeb346 _ZN10tensorflow9mlcompute7kernels13MLCSubgraphOp20ProcessMLCSubgraphOpEPNS_15OpKernelContextEPPNS1_10MLCContextEPPNS1_15TFContextStatusE + 438
    14  _pywrap_tensorflow_internal.so      0x000000011caeed54 _ZN10tensorflow9mlcompute7kernels13MLCSubgraphOp7ComputeEPNS_15OpKernelContextE + 868
    15  libtensorflow_framework.2.dylib     0x0000000134b17961 _ZN10tensorflow12_GLOBAL__N_113ExecutorStateINS_15PropagatorStateEE7ProcessENS2_10TaggedNodeEx + 4033
    16  libtensorflow_framework.2.dylib     0x0000000134b19aed _ZNSt3__110__function6__funcIZN10tensorflow12_GLOBAL__N_113ExecutorStateINS2_15PropagatorStateEE7RunTaskIZNS6_13ScheduleReadyEPN4absl14lts_2020_02_2513InlinedVectorINS5_10TaggedNodeELm8ENS_9allocatorISB_EEEEPNS5_20TaggedNodeReadyQueueEEUlvE0_EEvOT_EUlvE_NSC_ISL_EEFvvEEclEv + 45
    17  libtensorflow_framework.2.dylib     0x0000000134bb9af3 _ZN5Eigen15ThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi + 1667
    18  libtensorflow_framework.2.dylib     0x0000000134bb9372 _ZZN10tensorflow6thread16EigenEnvironment12CreateThreadENSt3__18functionIFvvEEEENKUlvE_clEv + 66
    19  libtensorflow_framework.2.dylib     0x0000000134ba7428 _ZN10tensorflow12_GLOBAL__N_17PThread8ThreadFnEPv + 104
    20  libsystem_pthread.dylib             0x00007fff20381950 _pthread_start + 224
    21  libsystem_pthread.dylib             0x00007fff2037d47b thread_start + 15
)
libc++abi.dylib: terminating with uncaught exception of type NSException
Fatal Python error: Aborted

Thread 0x000070000409d000 (most recent call first):
  File "/Users/didi/opt/anaconda3/lib/python3.8/threading.py", line 302 in wait
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/summary/writer/event_file_writer.py", line 266 in get
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/summary/writer/event_file_writer.py", line 209 in run
  File "/Users/didi/opt/anaconda3/lib/python3.8/threading.py", line 932 in _bootstrap_inner
  File "/Users/didi/opt/anaconda3/lib/python3.8/threading.py", line 890 in _bootstrap

Thread 0x000000010f8cce00 (most recent call first):
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1451 in _call_tf_sessionrun
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1359 in _run_fn
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1375 in _do_call
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1368 in _do_run
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1190 in _run
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 967 in run
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 1200 in run
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 1437 in run
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 1369 in run
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 1279 in run
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 774 in run
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1514 in _train_with_estimator_spec
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1206 in _train_model_default
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1175 in _train_model
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 349 in train
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3120 in train
  File "run_classifier.py", line 819 in main
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/absl/app.py", line 251 in _run_main
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/absl/app.py", line 303 in run
  File "/Users/didi/dev/tools/tensorflow_macos/tensorflow_macos/tf-macos/lib/python3.8/site-packages/tensorflow/python/platform/app.py", line 40 in run
  File "run_classifier.py", line 971 in <module>
run_classifier_iflytek.sh: line 93: 22839 Abort trap: 6           python run_classifier.py --task_name=$TASK_NAME --do_train=true --do_eval=true --data_dir=$GLUE_DATA_DIR/$TASK_NAME --vocab_file=$BERT_BASE_DIR/vocab.txt --bert_config_file=$BERT_BASE_DIR/bert_config.json --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt --max_seq_length=128 --train_batch_size=32 --learning_rate=2e-5 --num_train_epochs=3.0 --output_dir=$CURRENT_DIR/${TASK_NAME}_output/

Then I decided to replace python/keras/layers/normalization.py entirely. This time training finally runs... until it eats up all the memory, with the device set to either cpu or gpu. This may be related to https://github.com/apple/tensorflow_macos/issues/39
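
For anyone trying to reproduce this: the tensorflow_macos README documents device selection through the fork's `mlcompute` module. A minimal sketch of how the device might be pinned before the model is built is shown below. The helper name `select_mlc_device` is hypothetical, and the import guard is there only so the snippet degrades gracefully on a stock TensorFlow install, where the module does not exist.

```python
def select_mlc_device(device_name: str = "cpu") -> bool:
    """Try to pin ML Compute to 'cpu', 'gpu', or 'any'; return True on success.

    select_mlc_device is a hypothetical helper; only the import path and the
    set_mlc_device call are taken from the tensorflow_macos README.
    """
    if device_name not in ("cpu", "gpu", "any"):
        raise ValueError("device_name must be 'cpu', 'gpu', or 'any'")
    try:
        # This module ships only with the tensorflow_macos fork.
        from tensorflow.python.compiler.mlcompute import mlcompute
    except ImportError:
        return False
    mlcompute.set_mlc_device(device_name=device_name)
    return True


# False means the fork's mlcompute module could not be imported.
used_ml_compute = select_mlc_device("cpu")
```

Running the same training script once with "cpu" and once with "gpu" is one way to check whether the memory growth depends on the selected device.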
