Using rdrop raises a tf.split error. Could someone please take a look?

JayYip / m3tl

BERT for Multitask Learning

Apache License 2.0


The code is from https://github.com/EdwardChan5000/m3tl_run

The error is shown below:

2022-09-21 16:17:06.321 | INFO | m3tl.utils:set_phase:478 - Setting phase to infer
2022-09-21 16:17:06.345 | CRITICAL | m3tl.model_fn:compile:271 - Initial lr: 2e-05
2022-09-21 16:17:06.345 | CRITICAL | m3tl.model_fn:compile:272 - Train steps: 408675
2022-09-21 16:17:06.345 | CRITICAL | m3tl.model_fn:compile:273 - Warmup steps: 40867
2022-09-21 16:17:06.361554: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2022-09-21 16:17:06.361588: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2022-09-21 16:17:06.361613: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1365] Profiler found 1 GPUs
2022-09-21 16:17:06.369724: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcupti.so.11.0
2022-09-21 16:17:06.655982: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2022-09-21 16:17:06.656157: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1487] CUPTI activity buffer flushed
2022-09-21 16:17:07.017 | INFO | m3tl.utils:set_phase:478 - Setting phase to train
WARNING:tensorflow:The parameters output_attentions, output_hidden_states and use_cache cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: config=XConfig.from_pretrained('name', output_attentions=True)).
WARNING:tensorflow:The parameter return_dict cannot be set in graph mode and will always be set to True.
2022-09-21 16:17:19.175 | INFO | m3tl.utils:set_phase:478 - Setting phase to train
Traceback (most recent call last):
  File "m3tl_4room_rdrop.py", line 195, in <module>
    main(args)
  File "m3tl_4room_rdrop.py", line 149, in main
    create_tf_record_only=False, model_dir=model_dir, mirrored_strategy=mirrored_strategy)
  File "/usr/local/lib/python3.6/site-packages/m3tl/run_bert_multitask.py", line 319, in train_bert_multitask
    verbose=verbose
  File "/usr/local/lib/python3.6/site-packages/m3tl/run_bert_multitask.py", line 163, in _train_bert_multitask_keras_model
    validation_steps=validation_steps
  File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 855, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/function.py", line 2943, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/function.py", line 560, in call
    ctx=ctx)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 7) and num_split 2
     [[node BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1 (defined at data/yard/workspace/vega/m3tl_run/custom_top.py:250) ]]
  (1) Invalid argument: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 7) and num_split 2
     [[node BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1 (defined at data/yard/workspace/vega/m3tl_run/custom_top.py:250) ]]
     [[BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1/_44]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_423171]

Errors may have originated from an input operation. Input Source operations connected to node BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1: BertMultiTask/basic_mtl/GatherNd (defined at usr/local/lib/python3.6/site-packages/m3tl/utils.py:412)

Input Source operations connected to node BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1: BertMultiTask/basic_mtl/GatherNd (defined at usr/local/lib/python3.6/site-packages/m3tl/utils.py:412)

Function call stack: train_function -> train_function
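For readers hitting the same message: the log says a tensor whose first dimension is 7 was asked to split into 2 equal parts, which cannot be done. The sketch below is a pure-Python stand-in for that divisibility rule, not the m3tl or TensorFlow implementation. `split_evenly` and `trim_to_multiple` are hypothetical helper names, and whether trimming rows is appropriate for R-Drop in `custom_top.py` is an assumption that has not been checked against that code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_evenly(rows: Sequence[T], num_split: int) -> List[List[T]]:
    """Stand-in for the check applied when splitting into num_split equal parts."""
    size = len(rows)
    if size % num_split != 0:
        # Mirrors the condition reported in the log (size = 7, num_split 2).
        raise ValueError(
            f"cannot split size {size} into {num_split} equal parts"
        )
    step = size // num_split
    return [list(rows[i * step:(i + 1) * step]) for i in range(num_split)]


def trim_to_multiple(rows: Sequence[T], num_split: int) -> List[T]:
    """Hypothetical workaround: drop trailing rows until the size is divisible."""
    usable = (len(rows) // num_split) * num_split
    return list(rows[:usable])


batch = list(range(7))  # same leading dimension as in the log

try:
    split_evenly(batch, 2)
    failed = False
except ValueError:
    failed = True  # an odd-sized batch cannot be halved

# After trimming to 6 rows, the split succeeds.
first, second = split_evenly(trim_to_multiple(batch, 2), 2)
```

Since the log points at `BertMultiTask/basic_mtl/GatherNd` as the input to the split, the odd size may come from the rows selected for one task rather than from the overall batch size, so checking the row count right before the split is probably a more reliable diagnostic than changing the batch size alone.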

OK, I'll take a look next week.

In reply to Edward Chan's message of Thu, Sep 22, 2022, 8:35 PM.



Using rdrop raises a tf.split error. Could someone please take a look? #112