Shidell / OpenCaldera

An open project looking to expand the (Dell) Alienware Graphics Amplifier to support Nvidia RTX 3000 and AMD RX 6000 series GPUs.

[Speculation] 30-series compatibility with NVidia Optimus #4

Open Riebart opened 3 years ago

Riebart commented 3 years ago

I made a post on the Alienware subreddit, but my speculation right now is that the compatibility issue is related to NVidia Optimus.

I don't have any of the new cards myself either, so I too am hoping some folks can contribute some information.

Dropping this here as an issue as it is related to the effort.

Shidell commented 3 years ago

Thanks for the heads-up, and it's an interesting speculation. The difficulty I've seen so far is that when attaching a 3000 series via the AGA, it'll enumerate as another device in the Device Tree, without disabling the dGPU, for example.

If I'm understanding your hypothesis correctly, you're expecting that attaching a properly supported (e.g. Optimus-supported via drivers) 3000 series should actually disable the dGPU and remove it from the Device Tree (while attached)?

Riebart commented 3 years ago

Originally, that was my expectation because it matched my experiences with my 15 R3 with 6820HK, 1070, and an EVGA 1070 in the AGA. Specifically, my guess is that the drivers are failing to load because of Optimus/AMD XConnect (which would be for the RX 6000 series) issues.

Now, however, I have some new information that makes me question some of this. I scraped a bunch more data from 3DMark's results related to secondary GPUs attached to Alienware laptops (I filter based on Alienware motherboards, and mobile Intel CPUs), and it looks really inconsistent.

Specifically, I looked for Alienware laptops with a 2080 Ti (which is guaranteed to be an eGPU), and most of the results have the internal dGPU visible, but not all; about 75% of the results have both the internal dGPU and iGPU visible, and 25% have only one of the internal GPUs listed, which in all cases was the internal iGPU (except for 3 [weird](https://www.3dmark.com/spy/4655586) corner cases).
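For anyone who wants to repeat this kind of tally on their own scraped results, here is a minimal sketch of the bucketing step. It is not the script used for the numbers above. The record shape (a list of GPU name strings per result), the helper names `classify` and `summarize`, the "name contains Intel means iGPU" rule, and the sample rows are all assumptions made for illustration; the sample rows are invented placeholders, not real 3DMark data.

```python
from collections import Counter

EGPU_NAME = "RTX 2080 Ti"  # assumed label for the card sitting in the AGA


def classify(gpus, egpu=EGPU_NAME):
    """Bucket one result by which internal GPUs show up next to the eGPU."""
    internal = [g for g in gpus if egpu not in g]
    # Assumption: the iGPU is the only adapter with "Intel" in its name.
    has_igpu = any("Intel" in g for g in internal)
    has_dgpu = any("Intel" not in g for g in internal)
    if has_igpu and has_dgpu:
        return "iGPU+dGPU"
    if has_igpu:
        return "iGPU only"
    if has_dgpu:
        return "dGPU only"
    return "eGPU only"


def summarize(results, egpu=EGPU_NAME):
    """Return the fraction of results that fall into each bucket."""
    counts = Counter(classify(r, egpu) for r in results)
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}


# Invented placeholder rows, only here to exercise the functions.
sample = [
    ["RTX 2080 Ti", "Intel UHD Graphics 630", "GTX 1070"],
    ["RTX 2080 Ti", "Intel UHD Graphics 630", "RTX 2070"],
    ["RTX 2080 Ti", "Intel HD Graphics 530", "GTX 1060"],
    ["RTX 2080 Ti", "Intel UHD Graphics 630"],
]

if __name__ == "__main__":
    for bucket, share in summarize(sample).items():
        print(f"{bucket}: {share:.0%}")
```

Keeping a "dGPU only" bucket separate makes the corner cases easy to spot, since those are the results where the iGPU is missing from the listing.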

There could be a few reasons for this, but I need more data. My GTX1070 is on loan to a friend, and I don't have any other GPUs in the house to test in the AGA. Hopefully I can collect one this weekend and test more.

Shidell commented 3 years ago

Yeah, this seems to mimic the results I've seen—primarily that any GPU loaded in the AGA is just another device in the Device Tree.

I do have some info that suggests it could be Nvidia's drivers to blame, and I'm working on aggregating that into an update for the main page of the project.

I really wish I could get my hands on an RX 6000 series to test with, because if they work, then I'd point squarely at Nvidia as the culprit.

Riebart commented 3 years ago

Yeah. I'm going to grab a friend's 1050Ti to test with.

The 15R3 I have has a mux switch, which lets you completely disable the iGPU. This will let me test to see how reproducible I can make the device configurations, by design, as opposed to hoping the AGA figures it out on its own.

No iGPU was the device configuration that was visible in that one Reddit post of the guy getting the 3070 working (he has a 2080 Super Max Q and the 3070, no iGPU on the device tree).

Shidell commented 3 years ago

Awesome, the mux switch is an interesting idea and I can't test that on my m15.

Any chance you're near Milwaukee?

Riebart commented 3 years ago

Nope, Canada. Sorry. 😁

I'll let you know how it goes in the next 24 hours or so.

Shidell commented 3 years ago

Haha, sounds good.

Riebart commented 3 years ago

For what it's worth, I pulled the same dataset for GTX 1070 attached to Alienware laptops, and it's the same story: Almost everything lines up, except there's always a few outliers. I'll need a card to put into the AGA to start testing this (e.g. maybe the AGA disables whatever the mux switch does?)

Shidell commented 3 years ago

It's totally possible that the firmware/setup on those models (disabling the dGPU) has something to do with it.

I'm speculating that it's associated; the driver install is what fails, citing not enough resources, so if Nvidia is doing something under the hood in their drivers to handle Optimus solutions, that would make some sense.

I really need to source an RX 6000 series, though, because if those work just fine, then I blame Nvidia's drivers.

Riebart commented 3 years ago

This link about the Code 31 related to Razer eGPUs has some good stuff.

Looks like in their case they couldn't hot-plug the TB3 enclosure while the Razer RGB programs were running. That is, the RGB controller was flaking out but the issue manifested as the GPU not coming online.

Shidell commented 3 years ago

I was able to purchase an RX 6800, and it's working fine in the AGA. I made a big update to the pinned post on the Alienware subreddit with my thoughts, just wanted to post here so you get an update.

https://www.reddit.com/r/Alienware/comments/jxvnga/project_opencaldera_exploring_adding_unofficial/

Riebart commented 3 years ago

This is awesome news! Thanks for the update. 👍

Shidell commented 3 years ago

Sure thing. :)

Riebart commented 3 years ago

So, for what it's worth, I was able to test my theories on how the Intel graphics work with the AGA. Note that I don't have a GSync display (or the 300Hz display on the new R3 models), which are different again, but I've got a mux switch.

[Screenshot: Annotation 2020-11-29 224822]

So, not sure what's going on, but documenting it here regardless.

Shidell commented 3 years ago

I think it'll be interesting to see how this works with an RTX 3000 series. I don't have a MUX switch, so right now I can't test whether toggling it changes how Nvidia's driver install views resources and lets it complete as expected.

Riebart commented 3 years ago

One last data point: I talked to that guy who had the working 3070, and he has the 300Hz display, which means no iGPU, and therefore no Optimus.

I'm pretty confident now that the issue is Optimus support, with a good chance that once mobile 30-series parts get closer to launch, we'll see a new driver update (probably from both Intel and NVidia) that fixes this issue.

Shidell commented 3 years ago

I do believe Nvidia's drivers are most likely at fault, but whether or not it's Optimus-related (or something else) I can't say for sure. In the Alienware post on reddit, I did cite some examples of Nvidia having issues with Alienware/AGA previously, which they later corrected.

Regardless of where the fault lies (Nvidia, Alienware, or both), I can't imagine it being a difficult issue to correct.