Open HeyMeco opened 3 months ago
ubuntu-drivers devices
== /sys/devices/platform/a40000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0 ==
modalias : pci:v000010DEd00002204sv000010DEsd0000147Dbc03sc00i00
vendor : NVIDIA Corporation
model : GA102 [GeForce RTX 3090]
driver : nvidia-driver-525-server - distro non-free
driver : nvidia-driver-535-server-open - distro non-free
driver : nvidia-driver-525 - distro non-free
driver : nvidia-driver-535-open - distro non-free
driver : nvidia-driver-535 - distro non-free recommended
driver : nvidia-driver-550-server-open - distro non-free
driver : nvidia-driver-545-open - distro non-free
driver : nvidia-driver-545 - distro non-free
driver : nvidia-driver-550-open - third-party non-free
driver : nvidia-driver-525-open - distro non-free
driver : nvidia-driver-550-server - distro non-free
driver : nvidia-driver-535-server - distro non-free
driver : nvidia-driver-550 - third-party non-free
driver : xserver-xorg-video-nouveau - distro free builtin
dmesg
dmesg | grep 0000:01:00.0
[ 2.819204] pci 0000:01:00.0: [10de:2204] type 00 class 0x030000 PCIe Legacy Endpoint
[ 2.819246] pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x00ffffff]
[ 2.819280] pci 0000:01:00.0: BAR 1 [mem 0x00000000-0x0fffffff 64bit pref]
[ 2.819313] pci 0000:01:00.0: BAR 3 [mem 0x00000000-0x01ffffff 64bit pref]
[ 2.819334] pci 0000:01:00.0: BAR 5 [io 0x0000-0x007f]
[ 2.819355] pci 0000:01:00.0: ROM [mem 0x00000000-0x0007ffff pref]
[ 2.819610] pci 0000:01:00.0: PME# supported from D0 D3hot
[ 2.819960] pci 0000:01:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:00:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[ 2.820325] pci 0000:01:00.0: vgaarb: setting as boot VGA device
[ 2.820330] pci 0000:01:00.0: vgaarb: bridge control possible
[ 2.820334] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 2.831115] pci 0000:01:00.0: BAR 1 [mem size 0x10000000 64bit pref]: can't assign; no space
[ 2.831120] pci 0000:01:00.0: BAR 1 [mem size 0x10000000 64bit pref]: failed to assign
[ 2.831126] pci 0000:01:00.0: BAR 3 [mem size 0x02000000 64bit pref]: can't assign; no space
[ 2.831131] pci 0000:01:00.0: BAR 3 [mem size 0x02000000 64bit pref]: failed to assign
[ 2.831136] pci 0000:01:00.0: BAR 0 [mem size 0x01000000]: can't assign; no space
[ 2.831141] pci 0000:01:00.0: BAR 0 [mem size 0x01000000]: failed to assign
[ 2.831145] pci 0000:01:00.0: ROM [mem size 0x00080000 pref]: can't assign; no space
[ 2.831150] pci 0000:01:00.0: ROM [mem size 0x00080000 pref]: failed to assign
[ 2.831165] pci 0000:01:00.0: BAR 5 [io 0x100000-0x10007f]: assigned
[ 2.833528] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
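To put a number on the "no space" failures above, here is a minimal sketch that sums the sizes of the regions the kernel could not place. The log lines are copied from the output above with timestamps dropped; the regex and the helper name `unassigned_regions` are illustrative assumptions, not part of any tool. The total ignores alignment padding, so the host bridge window would likely need to be somewhat larger than this figure.

```python
import re

# Lines copied from the dmesg output above (timestamps dropped).
LOG = """\
pci 0000:01:00.0: BAR 1 [mem size 0x10000000 64bit pref]: can't assign; no space
pci 0000:01:00.0: BAR 3 [mem size 0x02000000 64bit pref]: can't assign; no space
pci 0000:01:00.0: BAR 0 [mem size 0x01000000]: can't assign; no space
pci 0000:01:00.0: ROM [mem size 0x00080000 pref]: can't assign; no space
"""

# Hypothetical pattern: region name, then the hex size inside the brackets.
PATTERN = re.compile(r"(BAR \d|ROM) \[mem size (0x[0-9a-f]+)[^\]]*\]: can't assign")


def unassigned_regions(log):
    """Return {region name: size in bytes} for every region reported as unassignable."""
    return {m.group(1): int(m.group(2), 16) for m in PATTERN.finditer(log)}


regions = unassigned_regions(LOG)
total = sum(regions.values())

for name, size in regions.items():
    print(f"{name}: {size / 2**20:g} MiB")
print(f"total: {total / 2**20:g} MiB")
```

With the four lines above this gives 256, 32, 16 and 0.5 MiB, or 304.5 MiB of memory space in total.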
After a patch from @mariobalanica we got the memory addresses assigned, but the driver isn't quite working yet. I do think that's fixable.
dmesg | grep -i 0000:01:00.0
[ 2.808753] pci 0000:01:00.0: [10de:2204] type 00 class 0x030000 PCIe Legacy Endpoint
[ 2.808795] pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x00ffffff]
[ 2.808829] pci 0000:01:00.0: BAR 1 [mem 0x00000000-0x0fffffff 64bit pref]
[ 2.808863] pci 0000:01:00.0: BAR 3 [mem 0x00000000-0x01ffffff 64bit pref]
[ 2.808884] pci 0000:01:00.0: BAR 5 [io 0x0000-0x007f]
[ 2.808904] pci 0000:01:00.0: ROM [mem 0x00000000-0x0007ffff pref]
[ 2.809169] pci 0000:01:00.0: PME# supported from D0 D3hot
[ 2.809520] pci 0000:01:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:00:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[ 2.809868] pci 0000:01:00.0: vgaarb: setting as boot VGA device
[ 2.809873] pci 0000:01:00.0: vgaarb: bridge control possible
[ 2.809877] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 2.817735] pci 0000:01:00.0: BAR 1 [mem 0x900000000-0x90fffffff 64bit pref]: assigned
[ 2.817763] pci 0000:01:00.0: BAR 3 [mem 0x910000000-0x911ffffff 64bit pref]: assigned
[ 2.817791] pci 0000:01:00.0: BAR 0 [mem 0x918000000-0x918ffffff]: assigned
[ 2.817803] pci 0000:01:00.0: ROM [mem 0x919000000-0x91907ffff pref]: assigned
[ 2.817821] pci 0000:01:00.0: BAR 5 [io 0x100000-0x10007f]: assigned
[ 2.820089] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
[ 6.207738] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[ 6.207790] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 6.322937] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[ 18.164815] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1589)
[ 18.164992] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 22.491615] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1589)
[ 22.491797] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 33.703174] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0xffff:1589)
[ 33.703413] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 34.034308] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0xffff:1589)
[ 34.034559] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 34.725057] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0xffff:1589)
[ 34.725143] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 35.042806] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0xffff:1589)
[ 35.043036] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 35.473607] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0xffff:1589)
[ 35.473766] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 35.805467] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0xffff:1589)
[ 35.805559] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 36.358124] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0xffff:1589)
[ 36.358360] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 36.680259] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0xffff:1589)
[ 36.680533] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 37.183943] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0xffff:1589)
[ 37.184094] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 37.503979] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0xffff:1589)
[ 37.504145] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 92.174852] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0xffff:1589)
[ 92.175095] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
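The repeated `RmInitAdapter failed!` lines are easier to compare when grouped by the tuple in parentheses. Below is a small sketch that tallies them. The meaning of the three fields is not explained anywhere in this thread, so the code only groups the values and does not interpret them. The sample is a subset of the log above, and the names `FAILURE` and `tally` are hypothetical.

```python
import re
from collections import Counter

# A subset of the NVRM lines from the log above.
SAMPLE = """\
[ 18.164815] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1589)
[ 22.491615] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1589)
[ 33.703174] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0xffff:1589)
"""

# Captures the three colon-separated fields as strings, without interpreting them.
FAILURE = re.compile(r"RmInitAdapter failed! \((0x[0-9a-f]+):(0x[0-9a-f]+):(\d+)\)")


def tally(log):
    """Count how often each (field1, field2, field3) tuple appears in the log text."""
    return Counter(m.groups() for m in FAILURE.finditer(log))


for fields, count in tally(SAMPLE).items():
    print(":".join(fields), "x", count)
```

Run against the full log, the same grouping makes it visible that the second field changes from `0x65` to `0xffff` after the first two attempts.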
0000:01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation GA102 [GeForce RTX 3090]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 108
Region 0: Memory at 918000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 900000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at 910000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at 100000 [size=128]
Expansion ROM at 919000000 [virtual] [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fe670040 Data: 0000
Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (downgraded), Width x4 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Via message, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [b4] Vendor Specific Information: Len=14 <?>
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [250 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [258 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=255us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=271360ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [128 v1] Power Budgeting <?>
Capabilities: [420 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [bb0 v1] Physical Resizable BAR
BAR 0: current size: 16MB, supported: 16MB
BAR 1: current size: 256MB, supported: 64MB 128MB 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
BAR 3: current size: 32MB, supported: 32MB
Capabilities: [c1c v1] Physical Layer 16.0 GT/s <?>
Capabilities: [d00 v1] Lane Margining at the Receiver <?>
Capabilities: [e00 v1] Data Link Feature <?>
Kernel driver in use: nouveau
Kernel modules: nouveau
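The `LnkCap` and `LnkSta` lines above (16GT/s x16 capable, 8GT/s x4 negotiated) line up with the bandwidth figures in the dmesg output. As a rough worked example, the sketch below reproduces those figures. It assumes 128b/130b encoding (which applies at 8 GT/s and above) and truncating integer division per lane; that rounding is an assumption chosen because it matches the logged values. The function name `link_mbps` is hypothetical.

```python
def link_mbps(gt_per_s, lanes):
    """Usable bandwidth in Mb/s for a 128b/130b-encoded PCIe link.

    Only meaningful for 8 GT/s and faster; slower generations use 8b/10b encoding.
    Per-lane integer division is an assumption that mirrors the rounding in the log.
    """
    per_lane = gt_per_s * 1000 * 128 // 130
    return per_lane * lanes


current = link_mbps(8, 4)     # negotiated link: 8 GT/s, x4
capable = link_mbps(16, 16)   # what the card supports: 16 GT/s, x16

print(f"current: {current / 1000:.3f} Gb/s")
print(f"capable: {capable / 1000:.3f} Gb/s")
print(f"ratio:   {current / capable:.3f}")
```

This yields 31.504 Gb/s and 252.048 Gb/s, the same numbers the kernel prints, so the card is running at roughly one eighth of its maximum link bandwidth on this board.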
[ 2.931919] pci 0000:01:00.0: [10de:2204] type 00 class 0x030000 PCIe Legacy Endpoint
[ 2.931961] pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x00ffffff]
[ 2.931996] pci 0000:01:00.0: BAR 1 [mem 0x00000000-0x0fffffff 64bit pref]
[ 2.932030] pci 0000:01:00.0: BAR 3 [mem 0x00000000-0x01ffffff 64bit pref]
[ 2.932050] pci 0000:01:00.0: BAR 5 [io 0x0000-0x007f]
[ 2.932071] pci 0000:01:00.0: ROM [mem 0x00000000-0x0007ffff pref]
[ 2.932328] pci 0000:01:00.0: PME# supported from D0 D3hot
[ 2.932681] pci 0000:01:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:00:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[ 2.933039] pci 0000:01:00.0: vgaarb: setting as boot VGA device
[ 2.933044] pci 0000:01:00.0: vgaarb: bridge control possible
[ 2.933048] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 2.940877] pci 0000:01:00.0: BAR 1 [mem 0x900000000-0x90fffffff 64bit pref]: assigned
[ 2.940906] pci 0000:01:00.0: BAR 3 [mem 0x910000000-0x911ffffff 64bit pref]: assigned
[ 2.940934] pci 0000:01:00.0: BAR 0 [mem 0x918000000-0x918ffffff]: assigned
[ 2.940947] pci 0000:01:00.0: ROM [mem 0x919000000-0x91907ffff pref]: assigned
[ 2.940964] pci 0000:01:00.0: BAR 5 [io 0x100000-0x10007f]: assigned
[ 2.955213] nouveau 0000:01:00.0: enabling device (0000 -> 0003)
[ 2.955312] nouveau 0000:01:00.0: NVIDIA GA102 (b72000a1)
[ 3.317492] nouveau 0000:01:00.0: bios: version 94.02.4b.00.0b
[ 3.606146] nouveau 0000:01:00.0: bios: M0203E type 0a
[ 3.606160] nouveau 0000:01:00.0: fb: 24576 MiB of unknown memory type
[ 4.468637] nouveau 0000:01:00.0: DRM: VRAM: 24576 MiB
[ 4.468672] nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
[ 4.468680] nouveau 0000:01:00.0: DRM: BIT table 'A' not found
[ 4.468686] nouveau 0000:01:00.0: DRM: BIT table 'L' not found
[ 4.468690] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[ 4.469967] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
[ 4.473861] [drm] Initialized nouveau 1.4.0 20120801 for 0000:01:00.0 on minor 1
[ 6.610314] nouveau 0000:01:00.0: DRM: core notifier timeout
[ 8.611900] nouveau 0000:01:00.0: DRM: core notifier timeout
[ 10.612000] nouveau 0000:01:00.0: DRM: wndw-0: timeout
[ 10.621007] nouveau 0000:01:00.0: [drm] fb0: nouveaudrmfb frame buffer device
[ 11.433062] snd_hda_intel 0000:01:00.1: bound 0000:01:00.0 (ops nouveau_drm_exit [nouveau])
[ 99.239911] nouveau 0000:01:00.0: Xwayland[1902]: failed to idle channel 2 [Xwayland[1902]]
We're now as far as the Raspberry Pi Community:
[ 8.600582] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1468)
[ 8.600674] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 8.600817] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[ 8.601012] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
Full: dmesg | grep -i nv
nvidia: loading out-of-tree module taints kernel.
[ 3.656114] nvidia: module license 'NVIDIA' taints kernel.
[ 3.656125] nvidia: module license taints kernel.
[ 3.686823] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 3.689006] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[ 3.689047] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 4.034784] NVRM: loading NVIDIA UNIX aarch64 Kernel Module 535.161.07 Sat Feb 17 23:29:15 UTC 2024
[ 4.044831] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 535.161.07 Sat Feb 17 22:42:09 UTC 2024
[ 4.046235] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 4.055331] NVRM: Chipset not recognized (vendor ID 0x1d87, device ID 0x3588)
[ 4.055338] The NVIDIA GPU driver for AArch64 has not been qualified on this platform
environment.
[ 6.195981] input: HDA NVidia HDMI/DP,pcm=3 as /devices/platform/a40000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.1/sound/card1/input6
[ 6.196066] input: HDA NVidia HDMI/DP,pcm=7 as /devices/platform/a40000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.1/sound/card1/input7
[ 6.196117] input: HDA NVidia HDMI/DP,pcm=8 as /devices/platform/a40000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.1/sound/card1/input8
[ 6.196172] input: HDA NVidia HDMI/DP,pcm=9 as /devices/platform/a40000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.1/sound/card1/input9
[ 8.600582] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1468)
[ 8.600674] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 8.600817] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[ 8.601012] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
[ 11.114974] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[ 11.120344] nvidia-uvm: Loaded the UVM driver, major device number 511.
@HeyMeco If you manage to get the NVIDIA driver running properly, can you check CUDA, for example with the PyTorch NGC container from NVIDIA, for ML/AI work? It would be a really cool combination, I think.
@HeyMeco I already have a question: does CUDA work in your current setup without DRM?
@serhii-nakon
@HeyMeco I already have a question: does CUDA work in your current setup without DRM?
It doesn't.
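For anyone wanting to repeat that check once the driver initializes, a minimal sketch is given here. It assumes PyTorch is the framework being tested, as suggested in the question; the helper name `cuda_status` is hypothetical. It only reports what PyTorch sees and does not exercise the GPU.

```python
def cuda_status():
    """Report whether PyTorch can see a CUDA device, without failing if torch is absent."""
    try:
        import torch  # assumption: PyTorch is installed in the environment under test
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        # Device 0 is the first GPU that CUDA enumerates.
        return "cuda available: " + torch.cuda.get_device_name(0)
    return "cuda unavailable"


print(cuda_status())
```

In the state shown in the logs above, where `RmInitAdapter` fails, this would be expected to report that CUDA is unavailable.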
Current Status
With Kernel:
6.8.2-edge-rockchip-rk3588
From Image: Armbian_community_24.5.0-trunk.306_Rock-5b_jammy_edge_6.8.2_gnome_desktop
Issues that need to be resolved:
Here are some of the first findings:
lspci -vvvv
iomem