bayasdev / envycontrol

Easy GPU switching for Nvidia Optimus laptops under Linux
https://bayas.dev/envycontrol
MIT License
1.18k stars 60 forks source link

Is switching from integrated to Nvidia feasible? #79

Closed adhamsalama closed 5 months ago

adhamsalama commented 1 year ago

First of all, THANK YOU!

I am using Fedora and couldn't find anything on Fedora that can switch the GPU so I would have jumped to another distro if this package didn't exist, so thank you again!

I am opening this issue only to ask if it's feasible to support switching from integrated to Nvidia directly, or is that simply not possible at all?

bayasdev commented 1 year ago

Yes, it's possible to change directly from integrated to Nvidia [1], however I changed the way we generate the Xorg config to include the PCI bus ID of the GPU due to compatibility problems on certain systems (AMD if I reckon).

With this approach we can't switch directly since the GPU is no longer on the PCI bus.

[1] https://docs.fedoraproject.org/en-US/quick-docs/how-to-set-nvidia-as-primary-gpu-on-optimus-based-laptops/

adhamsalama commented 1 year ago

Alright. Thank you for the answer, and for your work! :heart:

klmcwhirter commented 5 months ago

@bayasdev I have an idea to cache the values to make this possible.

It will assume that the machine is in Optimus Hybrid mode the first time it is run with --switch - i.e., the machine is in a state such that lspci lists both IDs.

I have a design page draft started that will eventually need review: 79_Design.

I hope I have captured the important parts of our discussion from #146.

I'll wait for review before proceeding.

As I mentioned, I will get the dev env setup and dive deeper tomorrow (2023/02/22).

Thanks for the help.

bayasdev commented 5 months ago

@bayasdev I have an idea to cache the values to make this possible.

It will assume that the machine is in Optimus Hybrid mode the first time it is run with --switch - i.e., the machine is in a state such that lspci lists both IDs.

I have a design page draft started that will eventually need review: 79_Design.

I hope I have captured the important parts of our discussion from #146.

I'll wait for review before proceeding.

As I mentioned, I will get the dev env setup and dive deeper tomorrow (2023/02/22).

Thanks for the help.

Thanks for your interest, I really appreciate it!

Currently at line 279 we have:

nvidia_gpu_pci_bus = get_nvidia_gpu_pci_bus()

If we already have the cached Nvidia GPU bus ID we can use that, otherwise we should always gracefully fallback (if it doesn't exist nor the ID is not valid) to our current strategy which is dynamically detecting it from the lspci output.

I think caching other values aside from the Nvidia GPU bus ID is kinda pointless since those can always be retrieved during runtime.

Using the path /var/cache/envycontrol/ for cache file seems reasonable as switching between graphics mode requires root permissions to succeed.

Also the cache should always be generated when switching modes so users have an easy way to regenerate their cache if they manually edit it and mess things up.

Don't get pressured and keep things simple ;)

bayasdev commented 5 months ago

Following up my previous comment you can just reuse the get_nvidia_gpu_pci_bus() function and save its output to a plain text file when switching between modes (if Nvidia GPU was not accessible at that time don't create or overwrite the already existing cache).

Then if switching to Nvidia just validate its content (maybe using a Regex).

I hope this makes sense.

klmcwhirter commented 5 months ago

No pressure. I am thinking about the same things.

Just trying to do a thorough job that does not generate support issues. Every caching strategy I have employed over the years was dicey.

Whatever we decide I should have it done tomorrow. I would prefer to be thorough rather than easy.

Let's iterate over the PR. Looking forward to it.

Thanks for the input!

BTW, I didn't mention this before because I do not like calling it out, but I am an accomplished Python developer and have decades of experience. Trust me, I got this.

I will look to you to help trim the scope. After all, it is your project.

klmcwhirter commented 5 months ago

@bayasdev I tried to send you an email via the address on your website but it bounced from yahoo.com. FYI

I have the impl done - klmcwhirter/envycontrol/#tests - take a look at the tests branch.

I still want to do some more testing though.

I'll squash the commits before merging to main so the PR will have just a single commit. FYI

Also, please see the updated design page 79_Design.

Especially please comment on the items in the section 'Current questions ...'.

It still needs work. I'll try to finish the updates today. I did update the cache file sample with real app generated data though.

Your feedback is welcome.

Is version 3.3.2 what I should be targeting? Or do you have something else in mind? For example, do you have versioning automated in your CI/CD pipeline?

bayasdev commented 5 months ago

@bayasdev I tried to send you an email via the address on your website but it bounced from yahoo.com. FYI

I have the impl done - klmcwhirter/envycontrol/#tests - take a look at the tests branch.

I still want to do some more testing though.

I'll squash the commits before merging to main so the PR will have just a single commit. FYI

Also, please see the updated design page 79_Design.

Especially please comment on the items in the section 'Current questions ...'.

It still needs work. I'll try to finish the updates today. I did update the cache file sample with real app generated data though.

Your feedback is welcome.

Is version 3.3.2 what I should be targeting? Or do you have something else in mind? For example, do you have versioning automated in your CI/CD pipeline?

Those are some big changes and seem out of the scope for this issue (you even included a whole test suite).

@klmcwhirter do you mind opening a new issue for each of your proposed enhancements?

klmcwhirter commented 5 months ago

@bayasdev there really is only one enhancement = the CachedConfig implementation and its usage.

And that is all that changed.

The 3 new command line options are for stability / troubleshooting. I would not be comfortable excluding them. At some point someone (probably me) is going to need them.

I was not going to merge the tests in. They are very incomplete; just enough integration tests for me to verify what I am doing.

I am happy excluding the rebuild_initramfs() call durinig --reset. If you see it the same way then it really is a defect that needs its own issue. So I understand that one.

I have created a temporary branch so you can see what you can expect in a PR as things stand right now. https://github.com/klmcwhirter/envycontrol/tree/sample_merge_contents

What exactly did you have in mind? Please guide me as to next steps.

Oh, and before I forget - THANK YOU for this project! It seems to have extended my battery life as I had hoped when in integrated mode.

BTW - this is where I learned about it: https://youtu.be/Pn2iUgW3l6w

bayasdev commented 5 months ago

@bayasdev there really is only one enhancement = the CachedConfig and its usage.

I don't get why do you think it's necessary to cache more parameters than the Nvidia GPU PCI bus ID in order to fix this very specific issue, which is having a way seamlessly transition from integrated to nvidia modes without rebooting into hybrid mode first.

All we need is having the bus ID stored somewhere else on disk.

The 3 new command line options are for stability / troubleshooting. I would not be comfortable excluding them. At some point someone is going to need them.

Fair enough though the idea is that the cached bus ID should be recreated on each run unless it's not possible to retrieve it.

The purpose of the "cache" in this use case is not to speed up the process (like a database query on a web application or a static asset which can be cached for X time to reduce load on the origin server) but to support seamless mode switching.

I was not going to merge the tests in. They are very incomplete' just enough integration tests for me to verify what I am doing.

Adding test is something I've been having on my mind at some point but we could add those later.

I am happy excluding the rebuild_initramfs() call durinig --reset. If you see it the same way then it really is a defect that needs its own issue. So I understand that one.

You're right, the reset operation should also rebuild the initramfs. This alone should be a single PR.

What exactly did you have in mind? Please guide me as to next steps.

Try to keep your changes atomic so it's easier for me to review them.

And again thanks for your help!

klmcwhirter commented 5 months ago

My changes are atomic. Point blank.

The metadata in the cache file has been invaluable to me. I don't see why you fight so hard to exclude them. It is a 'design for the future' approach that I have followed as an architect for several decades.

They pave the way for future enhancements like profile files that can be selected on the cmdline (once that support is added).

I saw at least one issue in your backlog that could use that.

There is a request to allow customization for lightdm config. This could be seen as a baby step in that direction.

But it is not a big deal. I just need to delete a handful of lines in CachedConfig.create_cache_obj.

But there is no downside to leaving it there.

How many times have you wished that info was there when an issue was reported?

You could just change your GitHub template to expect output from python -m envycontrol --cache-query.

I'll be happy to back out these lines if you are sure that is what you want. It just seems short-sided to me. But this will be my last plea for their inclusion. https://github.com/klmcwhirter/envycontrol/blob/sample_merge_contents/envycontrol.py#L624-L632

FYI - I'll be keeping them in my copy. If you change your mind ... they will be available either way.

It is up to you.

Did you have a recommendation for nvidia driver install?

If not I'll start with npmfusion. I have not seen / heard a preference from anyone.

After this project I hope to never deal with nvidia again. I got a chuckle out of your reference to the famous Linus Torvalds video in your README. Now I truly understand...

I would be happy to help with starting a test bed. Just keep in mind that testing produces refactoring as an output. That is a normal consequence of adding unit or integration tests.

Much of the current code is not easily tested. Some refactoring will be needed to wrestle it under test.

Also, a single file script is not going to be very maintainable once that starts. I see the she-bang at the top of the file. I understand the utility of a single file script. But is that a hard project goal? To keep a single file script as opposed to a full package that has things decoupled and organized in separate modules or even sub-packages if needed?

But if you are OK with pytest, I do have lots of experience with it. And would be happy to help.

Please create an enhancement request and assign it to me.

Initiatives like this are best done in phases. Please think about a list of priorities and share that list with me somehow. I'll help you refine it.

Thanks again.

bayasdev commented 5 months ago

Thanks, I'm testing your sample_merge_contents branch and the switch from integrated to nvidia works as expected.

image

Did you have a recommendation for nvidia driver install?

If not I'll start with npmfusion. I have not seen / heard a preference from anyone.

I recommend using the package provided by your distro to avoid problems when upgrading to a major version (eg: Ubuntu 22.04 to 24.04)

After this project I hope to never deal with nvidia again. I got a chuckle out of your reference to the famous Linus Torvalds video in your README. Now I truly understand...

Yeah, I can feel your pain hehe.

I would be happy to help with starting a test bed. Just keep in mind that testing produces refactoring as an output. That is a normal consequence of adding unit or integration tests.

Much of the current code is not easily tested. Some refactoring will be needed to wrestle it under test.

Also, a single file script is not going to be very maintainable once that starts. I see the she-bang at the top of the file. I understand the utility of a single file script. But is that a hard project goal? To keep a single file script as opposed to a full package that has things decoupled and organized in separate modules or even sub-packages if needed?

But if you are OK with pytest, I do have lots of experience with it. And would be happy to help.

Please create an enhancement request and assign it to me.

Initiatives like this are best done in phases. Please think about a list of priorities and share that list with me somehow. I'll help you refine it.

TBH I'm not really good at Python but it was a fun weekend project to make a simple script that allowed me to turn off and on my Nvidia GPU. I never expected EnvyControl to become this popular but thanks to the community it has evolved into a very handy tool for optimus laptop owners.

If you'd like to start a complete rewrite for version 4.0 of EnvyControl feel free to open an issue for tracking its progress I'm all ears.

Once we feel that it's ready we can launch a beta version so people can leave their feedback and detect issues before pushing it to main.

Thanks again @klmcwhirter!

Edit: Currently the project doesn't have a CI/CD pipeline so each time a new release happens I have to manually build the DEB and AUR packages (as well as waiting for the COPR maintainer to catch up).

klmcwhirter commented 5 months ago

What version # do you want me to use ? Or will you handle that after the PR merge?

Do you want to keep the metadata now that you have seen the result of using my my code?

I'll take this last input and put the PR together.

Just keep in mind that I am still testing. Don't merge it until you get a green light from me. Will be mid-late next week probably.

I have been using Python since it was first released in early 90s. I have done lot's of projects - some very large. My last enterprise-wide effort at the company from which I retired was to get Python approved as a principle language for teams that use less skilled offshore resources. It was widely successful. Python is a wonderful language for enabling people get work done.

Your weekend project is a perfect case study ;) Good for you!

bayasdev commented 5 months ago

What version # do you want me to use ? Or will you handle that after the PR merge?

Following SEMVER, if we keep the user facing API intact it could fit into 3.X.X but 4.0.0 seems more appropriate to indicate it's a major change of the underlying codebase.

Do you want to keep the metadata now that you have seen the result of using my my code?

Yes, it's really useful for troubleshooting. Cached data can get stale after user runs it with different parameters but if you figure out a better solution it's always welcome.

I'll take this last input and put the PR together.

Just keep in mind that I am still testing. Don't merge it until you get a green light from me. Will be mid-late next week probably.

Take all the time you need ;)

I have been using Python since it was first released in early 90s. I have done lot's of projects - some very large. My last enterprise-wide effort at the company from which I retired was to get Python approved as a principle language for teams that use less skilled offshore resources. It was widely successful. Python is a wonderful language for enabling people get work done.

Your weekend project is a perfect case study ;) Good for you!

Kudos 🙌🏼

klmcwhirter commented 5 months ago

Please give me a version #. I'll use it. To counter your comment - how about 3.4.0. It's not a major change in terms of user experience. And to your other point, 4.0.0 might best be reserved for a rewrite.

Thumbs up for the metadata. I'll keep it in. And if you notice the impl of "args" is dynamic. As new cmdline opts are added they automatically get included. Yeah, I thought about that - Opened / Closed principle. "Opened for extension, but closed for change".

I'll get the PR created so you can see a diff. Hopefully that will put your mind at ease. But as I mentioned, please don't merge it until I let you know. I am working on tests for the new cmdline options now. I have a lot more testing to do. I have not spent much time thinking about edge cases yet because I wanted to get a working POC to you as soon as I could.

Watch for the PR please. I'll put it together tomorrow.

klmcwhirter commented 5 months ago

PR Created but not marked fixed yet - #155

The README changes can be viewed in these 2 locations:

klmcwhirter commented 5 months ago

@adhamsalama FYI we are very close to a working solution for this issue if you still need it.

Not sure about the timing. But watch for version 3.4.0 soon!

adhamsalama commented 3 weeks ago

Thank you both for your work! 🙏🚀❤️