NVIDIA / nvtx-plugins

Python bindings for NVTX
https://docs.nvidia.com/deeplearning/frameworks/nvtx-plugins/user-guide/docs/en/stable/
Apache License 2.0

Putting Markers inside Tensorflow estimator based code not showing any output on nvvp #2

Closed mankeyboy closed 5 years ago

mankeyboy commented 5 years ago

Stack:
CUDA: 10.1.243 / NV 418.87
tensorflow-gpu: 1.14.0
nvtx-plugins: 0.1.3

The plugins themselves have been tested with the examples and I can see the nvvp output containing these markers.

Code: https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/image-classification/image_classification.py

As can be seen, this is an Estimator-based run (line 195 onwards). The docs for this plugin say to create the plugin's hook and add it to the list of hooks passed in the estimator call. So, I modified the code like this:

Line 174 onwards:

    # Evaluate model
    if use_synthetic:
        num_records = num_iterations * batch_size
    elif mode == 'validation':
        num_records = get_tfrecords_count(data_files)
    elif mode == 'benchmark':
        num_records = len(data_files)
    else:
        raise ValueError("Mode must be either 'validation' or 'benchmark'")
    logger = LoggerHook(
        display_every=display_every,
        batch_size=batch_size,
        num_records=num_records)
    nvtx_callback = NVTXHook(skip_n_steps=1, name='Inference')   #Addition made here for the hook
    tf_config = tf.ConfigProto()
    tf_config.gpu_options.allow_growth = True
    estimator = tf.estimator.Estimator(
        model_fn=model_fn,
        config=tf.estimator.RunConfig(session_config=tf_config),
        model_dir='model_dir')
    results = {}
    estimator_input_fn = functools.partial(input_fn, model, data_files, batch_size, use_synthetic, mode)
    if mode == 'validation':
        results = estimator.evaluate(estimator_input_fn, steps=num_iterations, hooks=[logger, nvtx_callback])  # Added the callback to the hook here for validation
    elif mode == 'benchmark':
        benchmark_hook = BenchmarkHook(target_duration=target_duration, iteration_limit=num_iterations)
        prediction_results = [p for p in estimator.predict(estimator_input_fn, predict_keys=["classes"],  hooks=[logger, benchmark_hook, nvtx_callback])]   # Added the callback to the hook here for synthetic run
    else:
        raise ValueError("Mode must be either 'validation' or 'benchmark'")
    # Gather additional results

However, I see no markers in the nvvp output after profiling with nvprof. Please tell me what I'm doing wrong, or whether there are other additions I need to make as well.
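One way to narrow down a report like this is to check whether the profiler records any NVTX range at all from a plain Python process, independent of TensorFlow and the hook. The following is a minimal sketch under stated assumptions: the function names `load_nvtx` and `emit_test_range` are hypothetical, and the library file names are guesses about a typical Linux CUDA install. It calls the NVTX C functions `nvtxRangePushA` and `nvtxRangePop` through `ctypes` and simply reports whether the calls could be made.

```python
import ctypes
import ctypes.util


def load_nvtx():
    """Try to load the NVTX shared library; return None if unavailable."""
    # Candidate names are assumptions about a typical Linux CUDA install.
    candidates = [
        ctypes.util.find_library("nvToolsExt"),
        "libnvToolsExt.so.1",
        "libnvToolsExt.so",
    ]
    for name in candidates:
        if not name:
            continue
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def emit_test_range(label=b"nvtx-sanity-check"):
    """Push and pop one NVTX range; return True if the calls were made."""
    lib = load_nvtx()
    if lib is None:
        return False
    try:
        push, pop = lib.nvtxRangePushA, lib.nvtxRangePop
    except AttributeError:
        return False
    push.argtypes = [ctypes.c_char_p]
    push.restype = ctypes.c_int
    pop.argtypes = []
    pop.restype = ctypes.c_int
    push(label)  # open a named range on the calling thread
    pop()        # close it again immediately
    return True


if __name__ == "__main__":
    print("NVTX range emitted:", emit_test_range())
```

If this script is run under the same profiler command as the real workload and its range also fails to show up, the problem is more likely in the profiler or platform than in how the hook was added to the estimator.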

ahmadki commented 5 years ago

I'm unable to reproduce this. I copied your code and ran the classification script with the following command:

nsys profile -d 60 -w true --sample=cpu -t 'nvtx,cuda' -o nsys_output python image_classification.py --model resnet_v1_50 --use_trt --use_synthetic --mode benchmark --num_iterations 100

And the result: Screenshot_20191003_002213

This was done using Nsight Systems, but I got similar results using nvvp (Nsight Systems is the recommended tool, as nvvp will be deprecated in a future CUDA version).

Do you mind sharing the following information:

mankeyboy commented 5 years ago

@ahmadki Listing all the details you asked for here:
tensorflow-gpu: version 1.14.0, installed on a POWER9 system using PowerAI (powerai version 1.6.1) conda packages
nvtx-plugins: compiled from source

Output of the example code inside the nvtx-plugins repository (NVVP is used since NVIDIA doesn't have nsys support for Power systems):

image

Output of the code above, profiled with nvprof and opened in nvvp. Command used:

nvprof -f -o img_class_p9-v100.nvvp \
    python image_classification.py --model resnet_v1_50 --batch_size 128 --data_dir . --use_trt --num_iterations 100 --use_synthetic --precision fp16 --mode benchmark

As can be seen, there is no markers and ranges section at all. Furthermore, on opening this file, nvvp reports that 6531 markers could not be associated with timeline elements.

image
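The two profiler invocations quoted in this thread (nsys on the maintainer's machine, nvprof on POWER9) can be sketched as one small helper that picks the profiler by architecture. This is only an illustration: `profiler_command` and `WORKLOAD` are hypothetical names, the flags are copied from the commands above, and the rule "ppc64 means nvprof" is an assumption taken from the statement in this thread that nsys is not available on Power systems.

```python
import platform
import shlex

# Workload arguments copied from the nvprof run quoted in this thread.
WORKLOAD = [
    "python", "image_classification.py",
    "--model", "resnet_v1_50",
    "--batch_size", "128",
    "--data_dir", ".",
    "--use_trt",
    "--num_iterations", "100",
    "--use_synthetic",
    "--precision", "fp16",
    "--mode", "benchmark",
]


def profiler_command(machine=None, output="profile"):
    """Return the profiler argv for the given machine architecture.

    Assumption (from this thread, not from NVIDIA documentation):
    nsys is unavailable on POWER, so ppc64 machines fall back to nvprof.
    """
    machine = machine or platform.machine()
    if machine.startswith("ppc64"):
        prefix = ["nvprof", "-f", "-o", output + ".nvvp"]
    else:
        prefix = ["nsys", "profile", "-d", "60", "-w", "true",
                  "--sample=cpu", "-t", "nvtx,cuda", "-o", output]
    return prefix + WORKLOAD


if __name__ == "__main__":
    # Print a shell-quoted command line for the current machine.
    print(" ".join(shlex.quote(arg) for arg in profiler_command()))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without a shell, and quoting is applied only when printing for copy and paste.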

YakiT commented 5 years ago

This looks like an nvvp issue on Power9. I will try to find someone from the NVIDIA dev-tools team who could help.

mankeyboy commented 5 years ago

@YakiT Thanks. Should I open a bug elsewhere as well to get this issue sorted out?

YakiT commented 5 years ago

I filed a bug with the nvvp team. However, since nvvp is being deprecated (replaced by Nsight Systems), I cannot commit to a fix date for this bug.

mankeyboy commented 5 years ago

I understand. If NVVP is being deprecated, is there a timeline for Nsight Systems support on POWER systems?

YakiT commented 5 years ago

I cannot commit to a date for this.