NVIDIA / nvtx-plugins

Python bindings for NVTX
https://docs.nvidia.com/deeplearning/frameworks/nvtx-plugins/user-guide/docs/en/stable/
Apache License 2.0

Register Bypass CPU OPs #11

Open DEKHTIARJonathan opened 4 years ago

DEKHTIARJonathan commented 4 years ago

The NVTX ops should fall back to identity (pass-through) ops when no GPU device is registered.

Use case: being able to run the example scripts on a CPU-only machine to check that the project compiled properly. Currently this fails:

  File "examples/keras_example.py", line 85, in <module>
    callbacks=[nvtx_callback])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 632, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 545, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NvtxStart' used by {{node model/nvtx_start/NvtxStart}}with these attrs: [T=DT_FLOAT]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  device='GPU'; T in [DT_COMPLEX128]
  device='GPU'; T in [DT_COMPLEX64]
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_BFLOAT16]
  device='GPU'; T in [DT_HALF]
  device='GPU'; T in [DT_INT8]
  device='GPU'; T in [DT_UINT8]
  device='GPU'; T in [DT_INT16]
  device='GPU'; T in [DT_UINT16]
  device='GPU'; T in [DT_INT32]
  device='GPU'; T in [DT_INT64]

     [[model/nvtx_start/NvtxStart]] [Op:__inference_distributed_function_1075]
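
The bypass requested above amounts to giving each NVTX op a CPU kernel that simply forwards its input and emits no NVTX range. Below is a minimal plain-Python sketch of that idea. `ToyKernelRegistry`, `identity` and `register_cpu_bypass` are hypothetical names, not part of TensorFlow or nvtx-plugins. In TensorFlow itself this would presumably be a C++ `REGISTER_KERNEL_BUILDER` entry for `DEVICE_CPU`.

```python
from typing import Callable, Dict, Iterable, Tuple

Kernel = Callable[[object], object]


class ToyKernelRegistry:
    """Hypothetical stand-in for per-device kernel registration.

    NOT TensorFlow's API: it only models the lookup that failed in the
    traceback above, where kernels are keyed by (op name, device type).
    """

    def __init__(self) -> None:
        self._kernels: Dict[Tuple[str, str], Kernel] = {}

    def register(self, op: str, device: str, kernel: Kernel) -> None:
        self._kernels[(op, device)] = kernel

    def has_kernel(self, op: str, device: str) -> bool:
        return (op, device) in self._kernels

    def run(self, op: str, device: str, x: object) -> object:
        kernel = self._kernels.get((op, device))
        if kernel is None:
            # Mirrors the failure mode reported in this issue.
            raise LookupError(
                f"No OpKernel was registered to support Op '{op}' on {device}"
            )
        return kernel(x)


def identity(x: object) -> object:
    """Bypass kernel: return the input unchanged, emitting no NVTX range."""
    return x


def register_cpu_bypass(registry: ToyKernelRegistry, ops: Iterable[str]) -> None:
    """Give every listed op an identity CPU kernel, keeping any existing one."""
    for op in ops:
        if not registry.has_kernel(op, "CPU"):
            registry.register(op, "CPU", identity)


registry = ToyKernelRegistry()
# Only a GPU kernel exists, as in the "Registered kernels" list above.
# The lambda is a placeholder for the real NVTX-marking kernel.
registry.register("NvtxStart", "GPU", lambda x: x)
register_cpu_bypass(registry, ["NvtxStart"])
print(registry.run("NvtxStart", "CPU", 1.5))  # prints 1.5
```

The design choice is to register the bypass only where no CPU kernel already exists, so a real CPU tracing kernel (if one is added later) takes precedence over the identity fallback.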
maxhgerlach commented 3 years ago

@DEKHTIARJonathan, is there a specific reason why you suggest adding identity "bypass" CPU ops rather than actual CPU tracing ops, as proposed in PR #25?