Closed: JustPlay closed this issue 8 months ago.
The responsibility of the host is to execute the script host_tools/python/gpu_cc_tool.py. What problem did you encounter?

root@p133-011-144:~/tdx/scripts/nvidia.d/nvtrust/host_tools/python# python3 gpu_cc_tool.py --query-cc-settings
NVIDIA GPU Tools version 535.86.06
2023-10-08,11:07:09.880 ERROR GPU /sys/bus/pci/devices/0000:0f:00.0 broken: [Errno 1] Operation not permitted
2023-10-08,11:07:09.884 ERROR Config space working True
Traceback (most recent call last):
File "gpu_cc_tool.py", line 127, in find_gpus_sysfs
dev = Gpu(dev_path=dev_path)
File "gpu_cc_tool.py", line 2055, in __init__
self.bar0 = self._map_bar(0)
File "gpu_cc_tool.py", line 1163, in _map_bar
return FileMap("/dev/mem", bar_addr, bar_size)
File "gpu_cc_tool.py", line 239, in __init__
mapped = mmap.mmap(f.fileno(), size, mmap.MAP_SHARED, prot, offset=offset)
PermissionError: [Errno 1] Operation not permitted
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "gpu_cc_tool.py", line 2499, in
The NVIDIA driver does not need to be installed on the host machine.
To debug the failed mmap, the following steps might help:
1. Run dmesg to see if there are any error messages about /dev/mem.
2. Search online for the error; one possible answer: https://stackoverflow.com/questions/8213671/mmap-operation-not-permitted
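For context, the call that fails in the traceback is a mmap of /dev/mem inside a class named FileMap. Here is a rough, hypothetical sketch of what that pattern looks like (the class name comes from the traceback; its exact internals are assumed), demonstrated against a regular temp file so it runs without privileges:

```python
import mmap
import os
import tempfile

class FileMap:
    """Hypothetical sketch of the FileMap seen in the traceback: map a byte
    range of a file into memory. gpu_cc_tool points this at /dev/mem
    (devmem mode) or a sysfs resourceN file (sysfs mode)."""

    def __init__(self, path, offset, size):
        with open(path, "r+b") as f:
            # MAP_SHARED so writes go through to the underlying file/device.
            # On /dev/mem, this mmap is where CONFIG_STRICT_DEVMEM makes the
            # kernel return EPERM ("Operation not permitted").
            self.mapped = mmap.mmap(f.fileno(), size, mmap.MAP_SHARED,
                                    mmap.PROT_READ | mmap.PROT_WRITE,
                                    offset=offset)

    def read32(self, offset):
        # Device registers are conventionally read as 32-bit little-endian words.
        return int.from_bytes(self.mapped[offset:offset + 4], "little")

# Demonstrate against a regular temp file, which needs no privileges.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(range(16)))
    path = f.name

fm = FileMap(path, 0, 16)
print(hex(fm.read32(0)))  # bytes 00 01 02 03 little-endian -> 0x3020100
os.remove(path)
```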
Yes, I found it is due to the kernel config:
CONFIG_STRICT_DEVMEM=y     # Filter access to /dev/mem
CONFIG_IO_STRICT_DEVMEM=y  # Filter I/O access to /dev/mem
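One way to check these options on a running system (assuming the usual distro location for the kernel config, with a fallback to /proc/config.gz when IKCONFIG is enabled) is:

```shell
# Check whether the running kernel filters /dev/mem access.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    grep -E 'CONFIG_(IO_)?STRICT_DEVMEM' "$cfg" || echo "not set in $cfg"
elif [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | grep -E 'CONFIG_(IO_)?STRICT_DEVMEM' || echo "not set"
else
    echo "kernel config not found"
fi
```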
Have you solved this problem? If not, it seems that changing mmio_access_type to sysfs may help.
I have tried mmio_access_type=sysfs, but it did not work:
root@p133-011-144:~/tdx/scripts/nvidia.d/nvtrust/host_tools/python# python3 gpu_cc_tool.py --query-cc-mode
NVIDIA GPU Tools version 535.104.12
file=/sys/bus/pci/devices/0000:0f:00.0/resource0, size=16777216, offset=0
File "gpu_cc_tool.py", line 128, in find_gpus_sysfs
dev = Gpu(dev_path=dev_path)
File "gpu_cc_tool.py", line 2059, in __init__
self.bar0 = self._map_bar(0)
File "gpu_cc_tool.py", line 1165, in _map_bar
return FileMap(os.path.join(self.dev_path, f"resource{self._bar_num_to_sysfs_resource(bar_num)}"), 0, bar_size)
File "gpu_cc_tool.py", line 241, in __init__
mapped = mmap.mmap(f.fileno(), size, mmap.MAP_SHARED, prot, offset=offset)
2023-10-08,13:02:41.287 ERROR GPU /sys/bus/pci/devices/0000:0f:00.0 broken: [Errno 22] Invalid argument
2023-10-08,13:02:41.291 ERROR Config space working True
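For what it's worth, errno 22 (EINVAL) means mmap(2) rejected the mapping arguments. In the sysfs case it is most likely the kernel refusing the resource-file mapping itself, but the same errno family can be reproduced locally, for example with a non-page-aligned offset (this is only an illustration, not the GPU failure itself):

```python
import mmap
import tempfile

# mmap rejects a non-page-aligned offset: either the kernel returns
# EINVAL (OSError: [Errno 22] Invalid argument), or CPython's own
# alignment check raises ValueError first.
rejected = False
with tempfile.NamedTemporaryFile() as f:
    f.write(b"\x00" * 8192)
    f.flush()
    try:
        mmap.mmap(f.fileno(), 4096, mmap.MAP_SHARED, mmap.PROT_READ, offset=1)
    except (OSError, ValueError):
        rejected = True
        print("mmap rejected the unaligned offset")
```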
I added iomem=relaxed to the host kernel cmdline, and gpu_cc_tool.py seems to work in both devmem mode and sysfs mode. (I have not tested every step or verified further, because the machine crashed and is still being recovered.)
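For reference, on a GRUB-based distro the cmdline change above is typically made as follows (paths and commands vary per distro; treat this as a sketch, and keep whatever parameters are already in the quoted value):

```
# /etc/default/grub -- append iomem=relaxed to the existing value:
GRUB_CMDLINE_LINUX_DEFAULT="... iomem=relaxed"

# Then regenerate the GRUB config and reboot:
#   sudo update-grub                                # Debian/Ubuntu
#   sudo grub2-mkconfig -o /boot/grub2/grub.cfg     # RHEL-family

# After reboot, confirm the parameter took effect:
#   grep -o 'iomem=relaxed' /proc/cmdline
```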
CONFIG_STRICT_DEVMEM=y with CONFIG_IO_STRICT_DEVMEM=n may also be OK:
https://elixir.bootlin.com/linux/v6.5.5/source/lib/Kconfig.debug#L1838
If you can recompile the host kernel, setting CONFIG_IO_STRICT_DEVMEM to N may help. I do not have access to an H800, but I can mmap a device address on my Ubuntu machine, which sets CONFIG_IO_STRICT_DEVMEM to N.
You shouldn't need the kernel parameter listed above. When you try to set the GPU mode, can you first show me the output of lspci -vd 10de:?
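(lspci -vd 10de: filters on NVIDIA's PCI vendor ID, 0x10de. If pciutils isn't handy, a rough equivalent can read the sysfs vendor files directly; this helper is hypothetical and not part of gpu_cc_tool:)

```python
import os

NVIDIA_VENDOR = "0x10de"  # the vendor ID behind `lspci -d 10de:`

def find_nvidia_devices(sysfs_base="/sys/bus/pci/devices"):
    """List PCI addresses (domain:bus:device.function) whose sysfs
    `vendor` file reports NVIDIA, similar to `lspci -d 10de:`."""
    found = []
    for bdf in sorted(os.listdir(sysfs_base)):
        try:
            with open(os.path.join(sysfs_base, bdf, "vendor")) as f:
                if f.read().strip() == NVIDIA_VENDOR:
                    found.append(bdf)
        except OSError:
            pass  # skip entries without a readable vendor file
    return found
```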
Closing due to inactivity. The short version is that you do not need to install the driver on the host to toggle CC modes. Please refer to our deployment guide for step-by-step instructions to configure your machine.