google-research / bigbird

Transformers for Longer Sequences
https://arxiv.org/abs/2007.14062
Apache License 2.0

Error in run_classifier.py for attention_type=simulated_sparse #14

Open Amit-GH opened 3 years ago

Amit-GH commented 3 years ago

I am using the base_size.sh script to run run_classifier.py. I am able to train and evaluate on the IMDB data with attention_type set to original_full or block_sparse, but when I set it to simulated_sparse I get errors while initializing training. The 12 layers are initialized, but training never starts. The relevant part of the error log is below:

File "/home/amitghattimare/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3211, in _as_graph_def
    graph.ParseFromString(compat.as_bytes(data))
google.protobuf.message.DecodeError: Error parsing message
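
For what it's worth, one possible cause of a DecodeError from ParseFromString is a serialized GraphDef growing beyond protobuf's message size limit (roughly 2 GB); whether that is actually what happens with simulated_sparse here is only a guess on my part. Below is a minimal diagnostic sketch to print how large the graph gets after it is built; the use of tf.compat.v1 simply mirrors the TF1-style graph code in run_classifier.py.

import tensorflow.compat.v1 as tf

# Sketch only: call this after the model graph has been constructed.
# If the serialized size is anywhere near the protobuf limit, that would
# explain a parse failure like the one above (assumption, not confirmed).
graph_def = tf.get_default_graph().as_graph_def()
print("GraphDef: %d nodes, %.1f MiB serialized"
      % (len(graph_def.node), graph_def.ByteSize() / 2**20))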

The script I used is below, in case it helps with the investigation. If I change attention_type to either of the other two options, it works fine. I am using only 8 TPU cores because that is the maximum available in preemptible mode, and I reduced train_batch_size so that the model fits in memory. I wonder if that is causing the issue, though the error logs don't indicate it.

python3 bigbird/classifier/run_classifier.py \
  --data_dir=tfds://imdb_reviews/plain_text \
  --output_dir=gs://bigbird-replication-bucket/classifier/imdb/sim_sparse_attention \
  --attention_type=simulated_sparse \
  --max_encoder_length=4096 \
  --num_attention_heads=12 \
  --num_hidden_layers=12 \
  --hidden_size=768 \
  --intermediate_size=3072 \
  --block_size=64 \
  --train_batch_size=1 \
  --eval_batch_size=2 \
  --do_train=True \
  --do_eval=False \
  --num_train_steps=1000 \
  --use_tpu=True \
  --tpu_name=bigbird \
  --tpu_zone=us-central1-b \
  --gcp_project=bigbird-replication \
  --num_tpu_cores=8 \
  --init_checkpoint=gs://bigbird-transformer/pretrain/bigbr_base/model.ckpt-0
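
For scale, here is a rough back-of-the-envelope sketch in Python, using the numbers from the flags above, comparing the attention-score memory of a full/simulated pattern against block_sparse. The assumption that simulated_sparse materializes the full n x n scores before masking, and the num_rand_blocks value, are mine and not taken from the code.

# Back-of-the-envelope sketch, not from the BigBird code: assumes
# simulated_sparse materializes the full n x n attention scores and then
# masks them, while block_sparse only materializes the attended blocks.
n, heads, layers, bytes_per_float = 4096, 12, 12, 4

full_scores = n * n * heads * layers * bytes_per_float
print("full / simulated attention scores: %.1f GiB per example"
      % (full_scores / 2**30))

block_size, num_rand_blocks = 64, 3  # num_rand_blocks=3 assumed (paper default)
# Each query block attends to roughly 3 window + 2 global + num_rand_blocks key blocks.
cols_per_row = (3 + 2 + num_rand_blocks) * block_size
sparse_scores = n * cols_per_row * heads * layers * bytes_per_float
print("block_sparse attention scores: %.0f MiB per example"
      % (sparse_scores / 2**20))

Under these assumptions the simulated pattern is on the order of 9 GiB of scores per example versus roughly 1 GiB for block_sparse, which is why I suspect the problem is tied to the attention type rather than to the reduced batch size, but I may well be wrong.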