I don't actually know what the addresses printed in dmesg are, although it would make sense for them to be physical PCI addresses. However, I really doubt that the kernel logs the regions mapped by ioremap into dmesg. I found these addresses by running an MMIO-trace. If you do that, you can examine all the MAP commands, which correlate to the ioremap calls.
Anyway, I do believe that the hardcoded addresses in this region may be an issue for graphics cards that are not Pascal based. Unfortunately I do not have any other graphics card to test with. Do you have a graphics card that doesn't work with these addresses?
First of all, you seem to have much more knowledge about these things; if you have any good sources for learning more about this stuff, I would appreciate it if you could share them with me.
Secondly, I should probably clarify what I meant above. I tried your vgpu_unlock with a 1070 on an AMD 2950X system.
The vgpu_unlock_hook didn't work; it seemed like the physical memory region was different. When comparing the nvidia-debug-report you posted previously with my own logs, the physical PCI address range on my machine seemed to be different. So I changed
VGPU_UNLOCK_MAGIC_PHYS_BEG (0xf0029624)
to VGPU_UNLOCK_MAGIC_PHYS_BEG (0x4810029624)
(and the same for the key line) and then everything worked.
I don't know how or why this happened, but I think some logic to automatically determine this range would be nice.
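Something roughly like this is what I have in mind (just a sketch, untested; it assumes the hook code can get hold of the GPU's struct pci_dev, and that the 0x29624 offset into BAR 3 is the same on every system):

/*
 * Rough, untested sketch: derive the magic address from the BAR 3 base
 * reported by the kernel instead of hardcoding the full physical address.
 * The 0x29624 offset is just the difference between the current hardcoded
 * 0xf0029624 value and the 0xf0000000 range start; the key address could
 * be derived the same way. "pdev" is assumed to be the GPU's struct pci_dev.
 */
#include <linux/pci.h>

#define VGPU_UNLOCK_MAGIC_OFFSET 0x29624

static resource_size_t vgpu_unlock_magic_phys(struct pci_dev *pdev)
{
    /* BAR 3 is the region that maps the card's VRAM onto the PCI bus. */
    return pci_resource_start(pdev, 3) + VGPU_UNLOCK_MAGIC_OFFSET;
}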
You know what, that's pretty good information. I feel like it should be included in the README so that AMD users, more specifically AMD Zen users, can get help setting this software up on their machines. Great findings!
I might be having the same issue. I'm not sure how to check the logging, but it's not working on my system. Specs:
Looked through dmesg like @arki05 did and found this line, which is similar to his:
pci 0000:c1:00.0: reg 0x1c: [mem 0x18010000000-0x18011ffffff 64bit pref]
After changing the magic to VGPU_UNLOCK_MAGIC_PHYS_BEG (0x1800029624) it unfortunately still doesn't work.
I'm not sure if this is even the issue, but it's the only relevant problem I could find.
The fix only works if you are certain that it's a memory address range issue. Are you trying to emulate an A40? That card seems to be a fair bit different as well, but that shouldn't be a limiting factor, as we have seen with the GTX 1060 running this script.
Do you happen to have an Intel system to try this on? My Intel-based system didn't require an address change. I'm assuming the 10GB VRAM may require a slightly higher memory range, though.
I do have an Intel i3-9100F that I could spin up tomorrow if that's even going to work?
In vgpu_unlock_hooks.c there is a section for enabling logs. You have to change the 0 to a 1
/* Debug logs can be enabled here. */
#if 0
#define LOG(...) printk(__VA_ARGS__)
#else
#define LOG(...)
#endif
Try to enable logs and rebuild & reinstall the DKMS module. Reboot and post your dmesg / logs. That might help find the issue. You can add additional LOG() statements to vgpu_unlock_hooks.c if you are not sure whether some function is executed correctly / with which parameters it gets executed.
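For example, a line like this in whichever hook you suspect (the variable names here are just placeholders, use whatever the hook actually receives):

/* Hypothetical example: log the physical range a hooked remap call is asked to map. */
LOG(KERN_INFO "vgpu_unlock: remap phys=0x%llx size=0x%llx\n",
    (unsigned long long)phys_addr,
    (unsigned long long)size);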
Already enabled the logs, but I'm not sure where to find them. I can post the dmesg here tomorrow for sure
It is likely the same issue. The addresses printed in dmesg are the PCI BARs (Base Address Registers) set up by the kernel; for more information on how that works, see this Wikipedia article: PCI configuration space.
What we are interested in is BAR 3 (documented here), which maps the card's VRAM onto the PCI bus. From my understanding, some code is written into the card's VRAM using this mapping, then the card's Falcon microprocessor is used to execute that code, which generates the magic and key values, and those can then be read back by the driver.
Unfortunately I believe that the different generations of cards have different versions of the Falcon microprocessor, so the code used might not be the same. It is therefore also likely that the offset into BAR 3 will have to be different for the different generations of cards.
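For reference, the two working Pascal values mentioned in this thread differ from their BAR 3 bases by the same offset:

0xf0000000   + 0x29624 = 0xf0029624   (the default in the script)
0x4810000000 + 0x29624 = 0x4810029624 (arki05's Zen system)

So on Pascal, at least, the offset appears to stay constant and only the BAR base moves; other generations may well use a different offset.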
As far as I know vgpu_unlock has only been tested on Pascal (10-series) graphics cards.
If anyone is interested in providing additional log files for analysis, I would like MMIO-traces for the execution of nvidia-smi on different cards. Instructions for generating these logs can be found here. If your system contains sensitive information, you might want to filter out all accesses not related to the GPU PCI devices.
The RTX 30 series has resizable BAR support, so I would assume that this new generation of GPUs uses a greater memory space than previous cards. There should be a way to get a beginning and end value for that space, so would it work if you plugged those values into the script?
It would, if you knew the offset of the magic and key values, and those offsets were constant.
Alright, just ran the MMIO trace and the dmesg. The MMIO trace was apparently really difficult or something, because I don't think I got it to work properly. nvidia-smi just returned "No device found", while it worked normally in Ubuntu. There were some other quirks, as about every step of the guide went a bit differently. I've uploaded it anyway and hopefully it's still helpful.
dmesg_3080.log mmiotrace_3080.log
Will test my Intel system next
Unfortunately it doesn't look like the memory regions that I am interested in were accessed during the recording of that log file. This is likely related to nvidia-smi showing "No device found"; the device should still be listed even if MMIO-trace is running.
We can list NVIDIA devices found by mmiotrace (annotations and formatting added for readability):
$ grep -e "PCIDEV .*10de" mmiotrace_3080.log
PCIDEV c100 10de2206 7d fa000000 1800000000c 0 1801000000c 0 f001 fb000000 1000000 10000000 0 2000000 0 80 80000
PCIDEV c101 10de1aef 7c fb080000 0 0 0 0 0 0 4000 0 0 0 0 0 0 snd_hda_intel
^pciid ^bar0 ^bar1 ^bar2 ^bar3 ^len0 ^len1 ^len2 ^len3
The first device is the RTX 3080 GPU (PCI device id 0x2206), which we are interested in, and the second device is an audio device (probably for sound over HDMI), which is not interesting. We can see that there are three initialized BARs on the GPU: BAR0, BAR1 and BAR3.
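As a sanity check, the low nibble of the raw bar3 value holds the PCI flag bits (0xc here means a 64-bit, prefetchable memory BAR), so masking it off gives the base address:

0x1801000000c & ~0xf = 0x18010000000

which matches the reg 0x1c line from the dmesg quoted earlier.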
We can now look at all mapping commands:
Here we can see that BAR0 is mapped five times (id 1, 2, 3, 36 and 57), but BAR3 is never mapped. Unfortunately it is the values inside BAR3 that I am interested in.
Documentation for MMIO-trace, including the log file format can be found here.
Hmm, I'll give it a try again then. I've also tried running the 3080 on Intel, but no luck there either. The PCI id was 0000:01:00.0 instead of 0000:c1:00.0, but it didn't change a thing. It might not be a PCI address issue after all.
One odd thing I did notice was the GPU temperature and power usage being really high on both systems after applying the mod. The 3080 was about 60C after a while and sucked about 160W. This is definitely not normal and only happens using this script.
Screenshot:
The 3080 is rather new. I know there are implementations for the GA100 chip in drivers 450 and 460, but I don't know if they have added GA102 yet, which is what the RTX A6000 and the 3080 use. It might work in the future, though.
Speaking of 450, have you tried out the 450 driver, or is it no longer available for download from the Nvidia Enterprise portal?
The 460 driver supports both the RTX A6000 and the A40. The 450 doesn't; it only supports the A100. I've tried to install it, but it would just give me an error (which is obvious, I guess).
The 3080 was about 60C after a while and sucked about 160W. This is definitely not normal and only happens using this script.
I've noticed slightly higher idle wattages on mine too, but only 33 watts, which is not much. I believe this script was built mostly around Pascal, with not much testing done on newer generations like Ampere.
Usually if you are seeing much higher wattages, temps, and fan speeds, that likely means that the driver is unable to work properly with the graphics card. Notice that your GPU is sitting in the P0 high-performance power state despite idling. This isn't supposed to happen in normal operation and could mean that the script is preventing the driver from working as intended.
I'm no expert by any means, but I figure a modified version of the script focused on Ampere's far greater memory space usage and other quirks of the Ampere generation could be made, either separately or as part of the same script, one that only activates upon detection of an Ampere card PCI ID.
And one last thing, this is unrelated but @FIFARenderZ do you plan on purchasing a license for vGPU after your trial license expires for realtime usage? Or are you trying out this setup for tinkering purposes?
I have no idea why the GPU usage would be affected by the script. But the MMIO-trace is equally useful whether or not vgpu_unlock is used. So an MMIO-trace with an unmodified driver and Ampere GPU would be interesting.
Usually if you are seeing much higher wattages, temps, and fan speeds, that likely means that the driver is unable to work properly with the graphics card. Notice that your GPU is sitting in the P0 high-performance power state despite idling. This isn't supposed to happen in normal operation and could mean that the script is preventing the driver from working as intended.
That would make sense, although I haven't seen any other power state mentioned in the nvidia-smi output.
And one last thing, this is unrelated but @FIFARenderZ do you plan on purchasing a license for vGPU after your trial license expires for realtime usage? Or are you trying out this setup for tinkering purposes?
For tinkering purposes right now. Maybe we're able to do much more later 😉
I have no idea why the GPU usage would be affected by the script. But the MMIO-trace is equally useful whether or not vgpu_unlock is used. So an MMIO-trace with an unmodified driver and Ampere GPU would be interesting.
For sure, which is what I was trying to do. It didn't work out for some reason and I'll give it another try tomorrow
Tried another round of MMIO-tracing with no success. The driver works normally when booted, but once I start tracing (and disable & re-enable the driver) it just spits out "No device found". I checked the logs and they again contained no info about BAR3. @DualCoder Did you do anything different from the guide? Because I'm kind of at a loss right now.
PS: Maybe I'm doing something wrong, but every time I execute echo nop > /sys/kernel/debug/tracing/current_tracer it returns "Device or resource busy". I've followed both guides completely and tried it in both recovery mode and normal mode.
If anyone wants to join, https://discord.gg/mAz38ZBrjx
I have created one, if anyone wants to join, https://t.me/gpuhacking
We usually use the EEVBLOG forum to discuss this, but their data center caught fire. I joined your Telegram, but it would be nice if we all could have permission to post messages.
Does vgpu_unlock work with the RTX 3090?
In theory it could, but based on @FIFARenderZ's experience with the RTX 3080, it may or may not work out. This script works with older generations like Pascal and Turing, though. Also, the 3090 uses the GA102, which is the same chip as the 3080, so your chances of success are going to be about as high as everyone else's with a 3080...
Has been solved by dualcoder in dualcoder/vgpu_unlock@54d90cde
In the README you wrote "Physical PCI address range 0xf0000000-0xf1000000"; in my case the range turned out to be 0x4810000000-0x4811ffffff.
The address range can be found in dmesg in lines like this: pci 0000:0a:00.0: reg 0x1c: [mem 0x4810000000-0x4811ffffff 64bit pref]
This is probably due to Above 4G Decoding (not 100% sure, but it's my best guess).