apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

TVM on AMD GPU. Errors: LLVM ERROR: Unknown specifier in datalayout string #8286

Closed YongtaoHuang1994 closed 3 years ago

YongtaoHuang1994 commented 3 years ago

We have prepared the AMD GPU and its SDK ROCm.

$ lspci
  02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 550 640SP / RX 560/560X] (rev ff)
$ /opt/rocm/bin/rocminfo | grep Vendor
  Vendor Name:             CPU
  Vendor Name:             AMD
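Before building TVM, it is worth confirming that ROCm itself can enumerate the GPU agent. A quick diagnostic (assuming the standard /opt/rocm install path; output depends on the card):

```shell
# Should print the gfx arch of each GPU agent, e.g. gfx803
/opt/rocm/bin/rocm_agent_enumerator
# Full agent details, including the gfx target name
/opt/rocm/bin/rocminfo | grep -i gfx
```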

Then we enable "set(USE_LLVM llvm-config-9)" and "set(USE_ROCM on)" in the build config and build TVM with ROCm successfully. Next, we set the environment variables:

$ export TVM_HOME=/home/hyongtao/tvm_src/tvm
$ export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
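The environment setup can be sanity-checked from Python before running anything heavier; a minimal sketch using only the standard library (the default path below mirrors this machine's setup and is an assumption elsewhere):

```python
import os

# Mirror the exports above; the fallback path is hypothetical if TVM_HOME is unset
tvm_home = os.environ.get("TVM_HOME", "/home/hyongtao/tvm_src/tvm")
pythonpath = os.environ.get("PYTHONPATH", "")

# TVM's Python bindings live under $TVM_HOME/python
tvm_python_dir = os.path.join(tvm_home, "python")
entries = pythonpath.split(os.pathsep) if pythonpath else []
print("TVM python dir:", tvm_python_dir)
print("on PYTHONPATH:", tvm_python_dir in entries)
```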

Once all the dependencies are ready, we test ResNet in TVM with ROCm. The inference code is as follows:

import os
import numpy as np
import tvm
from tvm import relay, autotvm
import tvm.relay.testing
from tvm.autotvm.tuner import XGBTuner, GATuner, RandomTuner, GridSearchTuner
import tvm.contrib.graph_executor as runtime

def get_network(name, batch_size):
    """Get the symbol definition and random weight of a network"""
    input_shape = (batch_size, 3, 224, 224)
    output_shape = (batch_size, 1000)

    if "resnet" in name:
        n_layer = int(name.split("-")[1])
        mod, params = relay.testing.resnet.get_workload(
            num_layers=n_layer, batch_size=batch_size, dtype=dtype
        )
    else:
        raise ValueError("Unsupported network: " + name)

    return mod, params, input_shape, output_shape

target = tvm.target.rocm()

network = "resnet-18"
log_file = "%s.log" % network
dtype = "float32"

def tune_and_evaluate():
    # extract workloads from relay program
    print("Extract tasks...")
    mod, params, input_shape, out_shape = get_network(network, batch_size=1)

    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build_module.build(mod, target=target, params=params)

    # load parameters
    dev = tvm.device(str(target), 0)
    module = runtime.GraphModule(lib["default"](dev))
    data_tvm = tvm.nd.array((np.random.uniform(size=input_shape)).astype(dtype))
    module.set_input("data", data_tvm)

    # evaluate
    print("Evaluate inference time cost...")
    ftimer = module.module.time_evaluator("run", dev, number=1, repeat=600)
    prof_res = np.array(ftimer().results) * 1000  # convert to millisecond
    print(
        "Mean inference time (std dev): %.2f ms (%.2f ms)"
        % (np.mean(prof_res), np.std(prof_res))
    )

tune_and_evaluate()
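For reference, the mean/std reporting at the end of the script can be reproduced standalone; a minimal sketch with hypothetical per-repeat timings, using only the standard library:

```python
import statistics

# Hypothetical per-repeat timings in seconds, in the shape time_evaluator returns
results_s = [0.0121, 0.0118, 0.0125, 0.0119]

# Convert to milliseconds and summarize, mirroring the script above
prof_res_ms = [t * 1000 for t in results_s]
mean_ms = statistics.mean(prof_res_ms)
std_ms = statistics.stdev(prof_res_ms)
print("Mean inference time (std dev): %.2f ms (%.2f ms)" % (mean_ms, std_ms))
```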

Then we get these errors:

[16:00:20] /home/hyongtao/tvm_src/tvm/src/target/target_kind.cc:182: Warning: Unable to detect ROCm compute arch, default to "-mcpu=gfx900" instead
[16:00:20] /home/hyongtao/tvm_src/tvm/src/target/target_kind.cc:196: Warning: Unable to detect ROCm version, assuming >= 3.5
[16:00:20] /home/hyongtao/tvm_src/tvm/src/target/target_kind.cc:196: Warning: Unable to detect ROCm version, assuming >= 3.5
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
[16:00:27] /home/hyongtao/tvm_src/tvm/src/target/target_kind.cc:196: Warning: Unable to detect ROCm version, assuming >= 3.5
[16:00:27] /home/hyongtao/tvm_src/tvm/src/target/llvm/codegen_amdgpu.cc:54: Warning: Cannot get maximum number of threads for AMD codegen
[16:00:27] /home/hyongtao/tvm_src/tvm/src/target/llvm/codegen_amdgpu.cc:54: Warning: Cannot get maximum number of threads for AMD codegen
LLVM ERROR: Unknown specifier in datalayout string

Could you help me solve this problem? Thanks a lot.

masahi commented 3 years ago

Please open a thread on discuss. The warning message Warning: Unable to detect ROCm compute arch suggests that there is some issue with the ROCm runtime. Make sure you have installed ROCm correctly and that a normal ROCm application works.
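One thing worth trying while debugging: since auto-detection falls back to "-mcpu=gfx900", which does not match a Baffin/Polaris card, the gfx arch can be passed explicitly in the target string instead of relying on detection. A hedged sketch (the gfx803 value is an assumption for an RX 550/560; confirm the real value with rocminfo):

```python
# Assumption: RX 550/560 (Baffin/Polaris) typically reports gfx803 under ROCm;
# verify with: /opt/rocm/bin/rocminfo | grep -i gfx
arch = "gfx803"

# Build an explicit target string so TVM does not guess "-mcpu=gfx900"
target_str = "rocm -mcpu=%s" % arch
print(target_str)

# Then, in the script above (requires a working TVM+ROCm build):
#   target = tvm.target.Target(target_str)
```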