RasaHQ / rasa

šŸ’¬ Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

Can TED policy use distributed training? #10911

Closed. vietnguyen012 closed this issue 1 year ago.

vietnguyen012 commented 2 years ago

What problem are you trying to solve?

I'm trying to use multiple GPUs to optimize the TED policy during training.

What's your suggested solution?

At first, I tried to add a mirrored strategy to the training function of the TED policy. Below is run_training from ted_policy.py; I added a segment that uses MirroredStrategy, alongside the old segment that uses only one GPU.

Examples (if relevant)

```python
def run_training(
    self, model_data: RasaModelData, label_ids: Optional[np.ndarray] = None
) -> None:
    """Feeds the featurized training data to the model.

    Args:
        model_data: Featurized training data.
        label_ids: Label ids corresponding to the data points in model_data.
            These may or may not be used by the function depending on how the
            policy is trained.
    """
    # --- added: TF_CONFIG worker setup ---
    os.environ.pop('TF_CONFIG', None)
    tf_config = {
        'cluster': {
            # 'worker': ['localhost:12345', 'localhost:23456']
        },
        'task': {'type': 'worker', 'index': 0}
    }
    os.environ['TF_CONFIG'] = json.dumps(tf_config)
    tf_config = json.loads(os.environ['TF_CONFIG'])
    num_workers = len(tf_config['cluster']['worker'])

    # --- old segment: single GPU ---
    if not self.finetune_mode:
        # This means the model wasn't loaded from a
        # previously trained model and hence needs
        # to be instantiated.
        self.model = self.model_class()(
            model_data.get_signature(),
            self.config,
            isinstance(self.featurizer, MaxHistoryTrackerFeaturizer),
            self._label_data,
            self._entity_tag_specs,
        )
        self.model.compile(
            optimizer=tf.keras.optimizers.Adam(self.config[LEARNING_RATE])
        )
    (
        data_generator,
        validation_data_generator,
    ) = rasa.utils.train_utils.create_data_generators(
        model_data,
        self.config[BATCH_SIZES],
        self.config[EPOCHS],
        self.config[BATCH_STRATEGY],
        self.config[EVAL_NUM_EXAMPLES],
        self.config[RANDOM_SEED],
    )
    callbacks = rasa.utils.train_utils.create_common_callbacks(
        self.config[EPOCHS],
        self.config[TENSORBOARD_LOG_DIR],
        self.config[TENSORBOARD_LOG_LEVEL],
        self.tmp_checkpoint_dir,
    )
    self.model.fit(
        data_generator,
        epochs=self.config[EPOCHS],
        validation_data=validation_data_generator,
        validation_freq=self.config[EVAL_NUM_EPOCHS],
        callbacks=callbacks,
        verbose=False,
        shuffle=False,  # we use custom shuffle inside data generator
    )

    # --- new segment: MirroredStrategy across all visible GPUs ---
    global_batch_size = self.config[BATCH_SIZES] * 2
    tf.debugging.set_log_device_placement(True)
    gpus = tf.config.list_logical_devices('GPU')
    strategy = tf.distribute.MirroredStrategy(gpus)
    if not self.finetune_mode:
        # This means the model wasn't loaded from a
        # previously trained model and hence needs
        # to be instantiated.
        with strategy.scope():
            self.model = self.model_class()(
                model_data.get_signature(),
                self.config,
                isinstance(self.featurizer, MaxHistoryTrackerFeaturizer),
                self._label_data,
                self._entity_tag_specs,
            )
            self.model.compile(
                optimizer=tf.keras.optimizers.Adam(self.config[LEARNING_RATE])
            )
    (
        data_generator,
        validation_data_generator,
    ) = rasa.utils.train_utils.create_data_generators(
        model_data,
        global_batch_size,
        self.config[EPOCHS],
        self.config[BATCH_STRATEGY],
        self.config[EVAL_NUM_EXAMPLES],
        self.config[RANDOM_SEED],
    )
    callbacks = rasa.utils.train_utils.create_common_callbacks(
        self.config[EPOCHS],
        self.config[TENSORBOARD_LOG_DIR],
        self.config[TENSORBOARD_LOG_LEVEL],
        self.tmp_checkpoint_dir,
    )
    self.model.fit(
        data_generator,
        epochs=self.config[EPOCHS],
        validation_data=validation_data_generator,
        validation_freq=self.config[EVAL_NUM_EPOCHS],
        callbacks=callbacks,
        verbose=False,
        shuffle=False,  # we use custom shuffle inside data generator
    )
```
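One detail worth double-checking in the snippet above: in a TED config, `BATCH_SIZES` is typically a list such as `[64, 256]` (linearly increasing batch size), so `self.config[BATCH_SIZES] * 2` repeats the list rather than doubling its entries. If the intent is to keep the per-GPU batch size constant, one common pattern (a sketch only, not the Rasa API; whether `create_data_generators` accepts the scaled values unchanged is untested here) is to scale by the replica count:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Illustrative per-replica batch sizes as a TED-style [min, max] pair.
per_replica_batch_sizes = [64, 256]

# MirroredStrategy splits each global batch across replicas, so scaling
# by num_replicas_in_sync keeps the effective per-GPU batch size stable.
global_batch_sizes = [
    size * strategy.num_replicas_in_sync for size in per_replica_batch_sizes
]
print(global_batch_sizes)  # with 2 GPUs: [128, 512]
```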

The first run, without the mirrored strategy, works fine, but when running with MirroredStrategy there is a conflict inside the TED model. I can't figure out what causes it (distributed training makes it very hard to debug).
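One way to make a failure inside a compiled `train_function` easier to localize (a debugging sketch, assuming TF 2.x; this is slow, so for diagnosis only) is to force eager execution so the error surfaces with an ordinary Python stack trace instead of a graph node name:

```python
import tensorflow as tf

# Run all tf.function-decorated code eagerly so the InvalidArgumentError
# points at a Python line instead of a compiled graph node.
tf.config.run_functions_eagerly(True)

# Optionally do the same for tf.data pipelines (available in TF >= 2.5).
tf.data.experimental.enable_debug_mode()
```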

Is anything blocking this from being implemented? (if relevant)

And this is the log after running:

```
/root/rasa/rasa/shared/core/slot_mappings.py:216: UserWarning: Slot auto-fill has been removed in 3.0 and replaced with a new explicit mechanism to set slots. Please refer to https://rasa.com/docs/rasa/domain#slots to learn more.
/root/rasa/rasa/shared/core/slot_mappings.py:216: UserWarning: Slot auto-fill has been removed in 3.0 and replaced with a new explicit mechanism to set slots. Please refer to https://rasa.com/docs/rasa/domain#slots to learn more.
Processed story blocks: 100%|███| 13/13 [00:00<00:00, 1271.09it/s, # trackers=1]
Processed story blocks: 100%|███| 13/13 [00:00<00:00, 148.73it/s, # trackers=12]
Processed story blocks: 100%|████| 13/13 [00:00<00:00, 21.72it/s, # trackers=50]
Processed story blocks: 100%|████| 13/13 [00:00<00:00, 26.74it/s, # trackers=50]
Processed rules: 100%|███████████| 48/48 [00:00<00:00, 252.89it/s, # trackers=1]
/root/rasa/rasa/utils/train_utils.py:530: UserWarning: constrain_similarities is set to False. It is recommended to set it to True when using cross-entropy loss.
/root/rasa/rasa/shared/utils/io.py:99: UserWarning: 'evaluate_every_number_of_epochs=20' is greater than 'epochs=2'. No evaluation will occur.
Processed trackers: 100%|█████| 512/512 [00:00<00:00, 965.76it/s, # action=1635]
/root/rasa/rasa/utils/tensorflow/model_data_utils.py:384: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  np.array(values), number_of_dimensions=4
/root/rasa/rasa/utils/tensorflow/model_data_utils.py:400: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  MASK: [FeatureArray(np.array(attribute_masks), number_of_dimensions=3)]
2022-02-17 08:55:08.464804: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-02-17 08:55:09.599068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30652 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0
2022-02-17 08:55:09.601503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 30652 MB memory: -> device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0
/root/rasa/rasa/utils/tensorflow/model_data.py:750: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  np.concatenate(np.array(f)),
Epochs:   0%|          | 0/2 [00:00
2022-02-17 08:55:10.764997: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/cond_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/cond_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/cond_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/cond_1_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/cond_1_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/cond_1_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/cond_2_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/cond_2_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/cond_2_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
Epochs:  50%|ā–ˆā–ˆā–Œ       | 1/2 [00:33<00:33, 33.25s/it, t_loss=6, loss=5.72, acc=0.518]
/root/rasa/rasa/utils/tensorflow/model_data.py:750: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  np.concatenate(np.array(f)),
Epochs: 100%|ā–ˆā–ˆ| 2/2 [00:53<00:00, 26.99s/it, t_loss=5.44, loss=4.94, acc=0.921]
Epochs:   0%|          | 0/2 [00:00
WARNING:tensorflow:Using MirroredStrategy eagerly has significant overhead currently. We will be working on improving this in the future, but for now please wrap `call_for_each_replica` or `experimental_run` or `run` inside a tf.function to get the best performance.
2022-02-17 08:56:05.702741: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:695] AUTO sharding policy will apply DATA sharding policy as it failed to apply FILE sharding policy because of the following reason: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_2" op: "FlatMapDataset" input: "TensorDataset/_1" attr { key: "Targuments" value { list { } } } attr { key: "f" value { func { name: "__inference_Dataset_flat_map_flat_map_fn_21967" } } } attr { key: "output_shapes" value { list { shape { dim { size: -1 } dim { size: -1 } dim { size: -1 } } shape { dim { size: -1 } dim { size: -1 } } shape { dim { size: -1 } } shape { dim { size: -1 } } shape { dim { size: -1 } } shape { dim { size: -1 } dim { size: -1 } dim { size: -1 } } shape { dim { size: -1 } dim { size: -1 } } shape { dim { size: -1 } } shape { dim { size: -1 } } shape { dim { size: -1 } dim { size: -1 } dim { size: -1 } } shape { dim { size: -1 } dim { size: -1 } dim { size: -1 } } shape { dim { size: -1 } dim { size: -1 } } shape { dim { size: -1 } } shape { dim { size: -1 } } } } } attr { key: "output_types" value { list { type: DT_FLOAT type: DT_INT64 type: DT_FLOAT type: DT_INT64 type: DT_FLOAT type: DT_FLOAT type: DT_INT64 type: DT_FLOAT type: DT_INT64 type: DT_FLOAT type: DT_FLOAT type: DT_INT64 type: DT_FLOAT type: DT_INT64 } } } . Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/cond_4_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/cond_4_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/cond_4_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/cond_5_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/cond_5_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/cond_5_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/cond_6_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/cond_6_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/cond_6_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/replica_1/cond_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/replica_1/cond_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/replica_1/cond_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/replica_1/cond_1_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/replica_1/cond_1_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/replica_1/cond_1_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/replica_1/cond_2_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/replica_1/cond_2_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/replica_1/cond_2_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
Traceback (most recent call last):
  File "/root/rasa/rasa/engine/graph.py", line 467, in __call__
    output = self._fn(self._component, **run_kwargs)
  File "/root/rasa/rasa/core/policies/ted_policy.py", line 777, in train
    self.run_training(model_data, label_ids)
  File "/root/rasa/rasa/core/policies/ted_policy.py", line 740, in run_training
    shuffle=False,  # we use custom shuffle inside data generator
  File "/root/rasa/rasa/utils/tensorflow/temp_keras_modules.py", line 190, in fit
    tmp_logs = train_function(iterator)
  File "/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3040, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1964, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 3 root error(s) found.
(0) Invalid argument: Dimensions [0,1) of indices[shape=[17,2]] must match dimensions [0,1) of updates[shape=[24,50]]
  [[{{node cond_4/StatefulPartitionedCall/cond_4_20/then/_877/cond_4/ScatterNd}}]]
  [[div_no_nan_1/ReadVariableOp/_892]]
(1) Invalid argument: Dimensions [0,1) of indices[shape=[17,2]] must match dimensions [0,1) of updates[shape=[24,50]]
  [[{{node cond_4/StatefulPartitionedCall/cond_4_20/then/_877/cond_4/ScatterNd}}]]
(2) Invalid argument: Dimensions [0,1) of indices[shape=[17,2]] must match dimensions [0,1) of updates[shape=[24,50]]
  [[{{node cond_4/StatefulPartitionedCall/cond_4_20/then/_877/cond_4/ScatterNd}}]]
  [[update_0/AssignAddVariableOp/_845]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_52312]

Function call stack:
train_function -> train_function -> train_function

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run.py", line 54, in <module>
    run(cmdline_arguments)
  File "/root/rasa/tools/controller.py", line 53, in run
    run_train_core(args)
  File "/root/rasa/tools/training_tools.py", line 90, in run_train_core
    finetuning_epoch_fraction=args.epoch_fraction,
  File "/root/rasa/rasa/model_training.py", line 346, in train_core
    **(additional_arguments or {}),
  File "/root/rasa/rasa/model_training.py", line 242, in _train_graph
    is_finetuning=is_finetuning,
  File "/root/rasa/rasa/engine/training/graph_trainer.py", line 108, in train
    graph_runner.run(inputs={PLACEHOLDER_IMPORTER: importer})
  File "/root/rasa/rasa/engine/runner/dask.py", line 106, in run
    dask_result = dask.get(run_graph, run_targets)
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/local.py", line 558, in get_sync
    **kwargs,
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/local.py", line 496, in get_async
    for key, res_info, failed in queue_get(queue).result():
  File "/root/miniconda3/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/root/miniconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/local.py", line 538, in submit
    fut.set_result(fn(*args, **kwargs))
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/local.py", line 234, in batch_execute_tasks
    return [execute_task(*a) for a in it]
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/local.py", line 234, in <listcomp>
    return [execute_task(*a) for a in it]
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/local.py", line 225, in execute_task
    result = pack_exception(e, dumps)
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/local.py", line 220, in execute_task
    result = _execute_task(task, data)
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/root/rasa/rasa/engine/graph.py", line 476, in __call__
    ) from e
rasa.engine.exceptions.GraphComponentException: Error running graph component for node train_TEDPolicy2.
Epochs:   0%|          | 0/2 [00:23
2022-02-17 08:56:29.372037: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
  [[{{node PyFunc}}]]
```

Definition of Done

I really need help from anybody who is familiar with the TED model and distributed training to fix this bug. Much appreciated!
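For what it's worth, the AUTO-sharding warning in the log spells out its own workaround: switch the dataset's auto-shard policy to DATA (or turn sharding off). A sketch of that suggestion, assuming you can get at the underlying `tf.data.Dataset` (the Rasa data generator may not expose it directly):

```python
import tensorflow as tf

def with_data_sharding(dataset: tf.data.Dataset) -> tf.data.Dataset:
    """Shard by elements (DATA), since the in-memory dataset has no files."""
    options = tf.data.Options()
    options.experimental_distribute.auto_shard_policy = (
        tf.data.experimental.AutoShardPolicy.DATA
    )
    return dataset.with_options(options)

# Hypothetical usage: wrap the dataset before handing it to model.fit.
# dataset = with_data_sharding(dataset)
```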
sara-tagger commented 2 years ago


Thanks for submitting this feature request šŸš€ @koaning will get back to you about it soon! ✨
vietnguyen012 commented 2 years ago


Hello, can anyone help me?

sync-by-unito[bot] commented 1 year ago

āž¤ Maxime Verger commented:

šŸ’” Heads up! We're moving issues to Jira: https://rasa-open-source.atlassian.net/browse/OSS.

From now on, this Jira board is the place where you can browse (without an account) and create issues (you'll need a free Jira account for that). This GitHub issue has already been migrated to Jira and will be closed on January 9th, 2023. Do not forget to subscribe to the corresponding Jira issue!

āž”ļø More information in the forum: https://forum.rasa.com/t/migration-of-rasa-oss-issues-to-jira/56569.