ArchangeGabriel commented 11 years ago

Some logs from our discussion about this: https://bumblebee.etherpad.mozilla.org/8

Work is on the common-primus branch.

I will list the new features soon.

Work remaining:

revert my last commit on the three
fix issues in primus

Some questions: 1) Based on this https://github.com/Bumblebee-Project/Bumblebee/commit/7fb2a8250aeca5fae805c185152fdf3c10875d2f, this https://github.com/Bumblebee-Project/bumblebee-ppa/blob/primus-proposed/precise/primus/debian/patches/ubuntu_libgl_path.patch and this https://github.com/amonakov/primus/blob/master/primusrun, how should we package primus? Should we hardcode some default paths at compile time, as is done right now, plus possibly some things in the primusrun script, or move everything to the daemon side? (I think even the primusrun script is useless in that case, since everything is set by the daemon, which can even make the final call, right?) I personally prefer the second option.

2) How do we currently handle the fact that primus only starts X on a GL call?

ArchangeGabriel commented 11 years ago

See this related issue too: #241

Lekensteyn commented 11 years ago

Hmm, isn't this already done with 3.1? Regarding (2), when libGL.so is loaded, primus is activated and then asks the daemon to start X.

ArchangeGabriel commented 11 years ago

Indeed, most of the work is done; this is just a matter of inverting the default and packaging primus. We don't need to install primusrun, right, just the libs?

amonakov commented 11 years ago

At the moment primusrun is needed, but changing optirun to not need it is easy.

Lekensteyn commented 11 years ago

There is still a difference between optirun and primusrun: primus starts X only when OpenGL is actually needed (i.e. when libGL.so is loaded). optirun OTOH starts X first, and if X fails to start, it does not start the program either.

(optirun doesn't need primusrun indeed)

ArchangeGabriel commented 11 years ago

@Lekensteyn: That was the kind of thing I was asking about above. So currently we rely on the primus package too, but that could (and IMO should) be changed. However, optirun does not behave like primus in this case. We need to decide whether this is a feature or not.

For some programs like Firefox, I think it's good for power usage that the server only comes up when accelerated content is running.

However, is it a problem that the program starts even though the server may fail to start when needed, because of a configuration problem or something else? Could we make it crash at that time?

Lekensteyn commented 11 years ago

I'm fine with staying dependent on primus. The primus package itself is architecture-independent and can depend on both primus-libs-ia32 and primus-libs-amd64 (or whatever it is called).

primus starts X when the libGL is loaded. I have just tried primusrun firefox and it looks like Firefox loads libGL.so immediately. When the daemon is not available, Firefox does not start. I guess that answers your question?

amonakov commented 11 years ago

I have just tried primusrun firefox and it looks like Firefox loads libGL.so immediately. When the daemon is not available, Firefox does not start.

That's not what I'm seeing. Here, firefox does start (and works!) if the daemon is not available, but the WebGL support is hosed (obviously).

Also, it appears that it loads libGL once at startup in a separate process to probe OpenGL capabilities, so that secondary X is started and then shut down immediately.

I think we can improve the current approach by LD_PRELOAD'ing a library that will poke the daemon immediately at startup, terminate if it's not available, and retrieve configuration values otherwise (but not start the secondary X server yet). Then primus can use that library to get configuration values and start the secondary server.

Such a library could also be used in hybrid-screenclone.
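
To make the "poke the daemon at startup" idea concrete, here is a minimal sketch in Python rather than as an actual LD_PRELOAD library. It only shows the control flow: connect to the daemon socket, give up early if that fails, and do not request the secondary X server yet. The socket path and the function names are assumptions made for illustration; the configuration query itself is left out because the thread does not specify the protocol.

```python
import socket
import sys

# Assumed default; a real client would take the path from its configuration.
DEFAULT_SOCKET_PATH = "/var/run/bumblebee.socket"


def poke_daemon(socket_path=DEFAULT_SOCKET_PATH, timeout=2.0):
    """Return a connected socket, or None if the daemon is unreachable.

    This only connects; it does not ask the daemon to start the
    secondary X server.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(socket_path)
    except OSError as err:
        sock.close()
        print(f"failure contacting Bumblebee daemon: {err}", file=sys.stderr)
        return None
    return sock


def require_daemon(socket_path=DEFAULT_SOCKET_PATH):
    """Terminate the process early when the daemon is unavailable."""
    sock = poke_daemon(socket_path)
    if sock is None:
        sys.exit(1)
    return sock
```

A caller would keep the returned socket around and only later use it to request the secondary server, which is the split between "check availability" and "start X" that the proposal describes.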

ArchangeGabriel commented 11 years ago

Is all this going to be solved with #363?

Lekensteyn commented 11 years ago

Heh, I forgot to reply after my machine locked up. lockdep+nvidia does not play well...

When primus is loaded, it exits the program right away when the daemon is unavailable.

connect: No such file or directory
primus: fatal: failure contacting Bumblebee daemon

When the daemon is available, but the display could not be opened (or if the Xserver did not start), it also exits.

ArchangeGabriel commented 11 years ago

Ok, so about this, #363, and primus packaging, this is what could be done:

make primusrun retrieve settings from the daemon and only package it once, but still package it for the reasons @amonakov mentioned in #363.
make optirun able to work without primusrun

For the last point, we may have two options:

we can make it work just like primus does, so the card is only turned on when needed, which saves power.
or we can leave it as it is now, for people who don't want X stopping and starting each time they need it while using a particular program, and provide a third "bridge", primusrun, that uses the script of the same name.

In the first case, there are still some changes to make before merging #363; in the second case, it's just about adding a third bridge and packaging.

RalfJung commented 11 years ago

IMHO adding this "lazy" start to optirun is independent of the pull request. Current behaviour is to always start the 2nd X when using optirun (with any bridge), and this isn't changed by my patch. Of course, I could add an additional change implementing, for example, a "primus-lazy" bridge which shares the detection function, and performs the X startup before calling run_primus - but that's a new feature independent from no longer depending on primusrun.

RalfJung commented 11 years ago

Uhm, the other way around of course... primus-lazy does not start X. Something like this: https://github.com/ralfjung-e/Bumblebee/tree/primus-lazy Of course, this would need updating the documentation, it's just a draft.
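
The eager versus lazy distinction can be summarised in a few lines. This is only a sketch of the decision, not code from the draft branch: the set names and the function are hypothetical, and "primus-lazy" is simply the bridge name used in the draft.

```python
# Hypothetical grouping of bridge names, for illustration only.
EAGER_BRIDGES = {"virtualgl", "primus"}  # X is started before the program
LAZY_BRIDGES = {"primus-lazy"}           # primus starts X when libGL is loaded


def starts_x_before_program(bridge):
    """Return True if optirun should start the secondary X server up front."""
    if bridge in EAGER_BRIDGES:
        return True
    if bridge in LAZY_BRIDGES:
        return False
    raise ValueError(f"unknown bridge: {bridge!r}")
```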

ArchangeGabriel commented 11 years ago

Indeed, #363 could be merged, since it does not change how the X server is started.

For primus-lazy, @amonakov suggested a clean approach:

I think we can improve the current approach by LD_PRELOAD'ing a library that will poke the daemon immediately at startup, terminate if it's not available, and retrieve configuration values otherwise (but not start the secondary X server yet). Then primus can use that library to get configuration values and start the secondary server.

Otherwise, we could just use primusrun and not have bumblebee start X directly.

Lekensteyn commented 10 years ago

As discussed yesterday with @Vincent-C I would like to make a release soon. Outstanding commits:

The following commits can safely be picked:

2073f85 Fix devices with a bus larger than 9 (GH-573)
25387e9 Ignore error on X shutdown

The following changes the default configuration and needs a special release note:

5acbb38 Make primus the default again on develop.

This one needs an additional udev rule:

1ada79f module: use "modprobe -r" instead of rmmod: GH-565

These are trivial changes:

327ddfc Suggest "AllowEmptyInitialConfiguration" (#373)
8244297 scripts/systemd: remove scheduling policy (closes GH-445)

Overall, these are not big changes. The most noticeable is probably defaulting to primus, followed by modprobe -r. I propose version number 3.3. Any other outstanding issues? Documentation is severely out of date too.

ArchangeGabriel commented 10 years ago

Sorry for having been away so long…

I have at least two other minor changes I never took the time to commit (I have been very busy, and even switched to a Dell computer and to Arch during that time). The first one is for systemd, issue #464, and the second one is a little more risky: a change in the xorg.conf file.

I cannot find the issue where it was proposed, so I can’t give the exact reasons for it, but I remember that it was supposed to fix an issue and is a bit cleaner in general. We could consider changing “AutoAddDevices "false"” to the following:

Section "InputClass"
    Identifier  "IgnoreDevices"
    MatchDevicePath "/dev/input/event*|/dev/input/mouse*|/dev/input/js*|/dev/input/mice"
    Option      "Ignore" "true"
EndSection

This can either be put in both xorg.conf files, or replace the 10-dummy.conf file (which would be better, but needs to be widely tested first, since I don’t want to face the same issue we fixed in 3.2.1).

Also, I would propose renaming xorg.conf.n* to n*.xorg.conf, so that editors and similar tools properly detect them as configuration files (which is not the case currently, for example when editing with vim).
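
To check which device nodes such a MatchDevicePath value would cover, the matching can be approximated in Python. This is a rough sketch: it assumes that "|" separates alternative shell-style patterns, as described in the xorg.conf documentation, and it uses Python's fnmatch, which differs from the X server's matcher in corner cases (for example in how "*" treats "/"). The function name is made up for illustration.

```python
from fnmatch import fnmatchcase

MATCH_DEVICE_PATH = (
    "/dev/input/event*|/dev/input/mouse*|/dev/input/js*|/dev/input/mice"
)


def is_ignored(device_path, match=MATCH_DEVICE_PATH):
    """Approximate MatchDevicePath: any '|'-separated pattern may match."""
    return any(fnmatchcase(device_path, pattern) for pattern in match.split("|"))


for path in ("/dev/input/event5", "/dev/input/mice", "/dev/ttyS0"):
    print(path, is_ignored(path))
```

The two /dev/input paths are covered by the pattern, while /dev/ttyS0 is not, so only input devices would be ignored by the secondary server.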

ArchangeGabriel commented 10 years ago

And indeed, the documentation everywhere (Ubuntu/Debian and probably even the Arch wiki, the GitHub wiki, the docs files) must be updated; I will try to take a look at that.

ArchangeGabriel commented 10 years ago

We may also want to look at whether a Screen section is needed in the nvidia xorg.conf file, see here: https://github.com/Bumblebee-Project/Bumblebee/issues/580#issuecomment-44201413

ArchangeGabriel commented 10 years ago

Is #604 going to be fixed by the modprobe changes?

Lekensteyn commented 10 years ago

The xorg conf snippet would need some testing to avoid some headache, for a quick release I would skip it unless it fixes a real issue.

For the xorg.conf file, the filetype is not conf, but xf86conf. Only files such as xorg.conf and */xorg.conf.d/*.conf are known as such. You could add this to your vimrc:

au BufNewFile,BufRead /etc/bumblebee/xorg.conf.* setf xf86conf

The system config dir should not be loaded if a custom conf dir is given, this is at least the behavior on Arch.

#604 (and possibly others) should indeed be fixed by the modprobe change.

amonakov commented 10 years ago

How is modprobe change going to help with #604?

I suggest not rushing the modprobe change into a release if there's confusion and lack of testing and discussion. The other changes look far safer.

Vincent-C commented 10 years ago

Do we know if the suggested xorg.conf snippet in #580 actually fixes the problem reported in that bug? I recall seeing that issue discussed on IRC a few times with some users reporting that it didn't work for them.

ArchangeGabriel commented 9 years ago

I’ve started cleaning up and sorting the issue tracker again, and hope to finish that today. Before releasing, I would like my xorg.conf change above to be tested (I will ask for that on the French forum), since we don’t seem to be in a rush to release anymore.

Also, I would like to update the documentation; I’m going to use this thread to add notes about things to do.

To be added in documentation:

ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140724/nsarguments-95)

ACPI Error: Field [TMPB] at 282624 exceeds Buffer [ROM1] size 262144 (bits) (20140724/dsopcode-236)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.PEG0.PEGP._ROM] (Node ffff88041f07db40), AE_AML_BUFFER_LIMIT (20140724/psparse-536)

ArchangeGabriel commented 9 years ago

I’ve been looking at setting "AllowEmptyInitialConfiguration" by default, but it makes NVIDIA detect the CRT/DFP link again when one exists, and I remember that we switched fully to "UseDisplayDevice" "none" because it makes the secondary X server start a bit faster, and configuring a display is useless in this case. So we should indeed leave this as it is currently and add documentation about it to the online support pages.

I’m still waiting for more feedback on the xorg.conf ignore-device change, but I haven’t received any negative reports as of today.

ArchangeGabriel commented 9 years ago

OK, so except for documentation updates, the only thing I’m not sure about is nvidia-uvm handling. I need to recheck what we planned to do and verify it will work everywhere.

ArchangeGabriel commented 8 years ago

We have a few more issues to look into before releasing (all tagged with Milestone 4.0). I will open a pull request with the xorg.conf snippet and ask users to try it. I’m opening an issue about nvidia module handling.

ArchangeGabriel commented 8 years ago

Do we want to keep the bumblebee-bugreport tool? We need at least to check which parts of it are still relevant.

What about glsanity? Should we suggest that distros package it separately? Or do we just link to it in the bug reporting docs?

Lekensteyn commented 8 years ago

bumblebee-bugreport has many Debian/Ubuntu-specific things like update-alternatives, but the library path configuration (broken or not) is still universal (maybe glsanity checks that too, I have not looked at it).

With modern machines there is an issue with bbswitch that I am trying to address in https://github.com/Bumblebee-Project/bbswitch/tree/acpi-pr3 (basically, Windows 8 and newer decided to disable/enable power resources on the parent PCIe device, so support for the _DSM calls might become unreliable). A fix that is forward compatible with kernel 4.7 (in development) and newer requires bbswitch to act as a PCI driver so that it can use runtime PM, but this approach needs some changes to bumblebeed for reliability:

If the PMMethod is bbswitch and the loaded driver is "bbswitch", then assume it is turned off (and do not try to unload the bbswitch module!)
Instead of using the /proc/acpi/bbswitch interface, use the unbind/bind method described in https://github.com/Bumblebee-Project/bbswitch/commit/daa6411911426246aef804ec5ad8a44a07a15a66. This part is something I am not sure of. The problem with using the /proc/acpi/bbswitch interface is that it cannot really probe one specific device. What can be done is to trigger the probe for a device, but if other drivers are also available, they might bind with the device instead of bbswitch. With the unbind/new_id/unbind approach (only stable as userspace interface), the outcome is more deterministic. Hmm, while bumblebeed is used, there should be no other driver loaded when bbswitch is asked to turn it OFF, so maybe this potential issue can be ignored.
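
The first rule could be sketched as follows, in Python rather than in bumblebeed's own code. It relies on the standard sysfs layout, where a bound PCI device has a "driver" symlink whose target ends in the driver name. The function names and the sysfs_root parameter (there only so the logic can be tried against a fake tree) are assumptions made for illustration.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys"):
    """Return the name of the driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_address, "driver")
    try:
        target = os.readlink(link)
    except OSError:
        return None  # no driver bound, or the device does not exist
    return os.path.basename(target)


def card_assumed_off(pm_method, pci_address, sysfs_root="/sys"):
    """Treat the card as off when bbswitch itself is the bound PCI driver."""
    return (
        pm_method == "bbswitch"
        and bound_driver(pci_address, sysfs_root) == "bbswitch"
    )
```

For example, card_assumed_off("bbswitch", "0000:01:00.0") would be true only when the device at that (example) address is currently bound to a driver named bbswitch, in which case the module should not be unloaded.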

Do you think it is worth holding off BB 4.0 until bbswitch is fixed for the newer machines? That should solve issues like "memory corruption on Lenovos" and "battery dies quickly in suspend".

ArchangeGabriel commented 8 years ago

Yes, definitely worth it. I thought at one point that Bumblebee 4.0 should be a companion of bbswitch 1.0, but I wasn’t sure where you were on that one. But since you’ve asked, yes, we should delay BB 4.0 until bbswitch 1.0 is ready too. ;) Anyway, I still have a lot of documentation updates to do for BB 4.0, so that should give you some time to get things done. :)

bluca commented 8 years ago

We backport the good new stuff in Debian and in the Ubuntu PPA anyway, so users have a way to get new bug fixes and features until this is finished too.

ArchangeGabriel commented 8 years ago

So, in the end we decided to go ahead without waiting for bbswitch 1.0. We still agree on removing nouveau support, right? This will need a point in the release notes (for example, for people wanting to switch dynamically to nouveau: stop the bumblebeed daemon and modprobe -r bbswitch, then modprobe nouveau and run DRI_PRIME=1 <program> instead of optirun, which requires DRI3, but I’ll suppose any system with bumblebee 4.0 will use DRI3 anyway; plus instructions for switching back to bumblebee mode).

I’ll go through the open issues and tasks ASAP, and I propose next Sunday (30th October) as a tentative release date.
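
For the release note, the switch to nouveau could be written down as an ordered plan. The sketch below only builds the command lists and the environment; it does not execute anything. The function names are hypothetical, and the systemctl call is an assumption about how the daemon is managed on a given system.

```python
import os


def prime_offload_env(base_env=None):
    """Copy the environment and set DRI_PRIME=1 for the launched program."""
    env = dict(os.environ if base_env is None else base_env)
    env["DRI_PRIME"] = "1"
    return env


def switch_to_nouveau_plan(stop_daemon=("systemctl", "stop", "bumblebeed")):
    """Commands to run as root, in order; nothing is executed here."""
    return [
        list(stop_daemon),               # assumption: daemon runs under systemd
        ["modprobe", "-r", "bbswitch"],  # unload bbswitch first
        ["modprobe", "nouveau"],         # then load nouveau
    ]
```

A program would then be started with the environment from prime_offload_env(), for instance through subprocess.run(command, env=prime_offload_env()), instead of going through optirun.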

Lekensteyn commented 8 years ago

Yes, drop nouveau support from Bumblebee and write/point to some documentation on using nouveau properly in an Optimus setup (nouveau wiki?).

FYI, I will be less available in the next 2-3 weeks due to exams and delayed project work ;)

ArchangeGabriel commented 8 years ago

Sadly, I had (and still have) to postpone this on my side too. I’ve been overly busy and sick for the past 10 days, and I’ve just seen that I won’t have any free time before December at the earliest…

Lekensteyn commented 8 years ago

Ok, no problem, maybe I can do some additional QA now that I can test the nvidia blob again. Still in exam period, but it is an idea for the moments thereafter :)

Bumblebee-Project / Bumblebee

Primus support & Bumblebee 4.0 #319