genodelabs / genode

Genode OS Framework
https://genode.org/

Sculpt OS 24.04 #5174

Closed nfeske closed 6 months ago

nfeske commented 7 months ago

This issue is intended for changes on account of the forthcoming Sculpt OS version 24.04.

nfeske commented 7 months ago

Commit https://github.com/genodelabs/genode/commit/797d2d47d1fcd30906c2df796b11b9b3f06912c6 bumps the version to 24.04. This means that there will be no ABI changes until the release.

nfeske commented 7 months ago

Commit https://github.com/genodelabs/genode/commit/d17aa7d7219181836e4724f7c1482356d25351cd makes the driver selection a little bit more flexible. The PS/2 and Intel drivers can now explicitly be suppressed by using the sculpt-manager config attributes ps2="no" and intel_gpu="no". The latter is useful to let the system fall back to VESA or boot-fb.

alex-ab commented 7 months ago

Currently boot_fb is not working:


[init -> runtime]   provides service Block
[init -> runtime] child "runtime_view" requests resources: cap_quota=9
[init -> runtime] Warning: boot_fb: no route to service "ROM" (label="boot_fb -> platform_info")
[init -> runtime -> boot_fb] Error: ROM-session creation failed (label="platform_info", ram_quota=6K, cap_quota=3, )
[init -> runtime -> boot_fb] Error: could not open ROM session for "platform_info"
[init -> runtime -> boot_fb] Error: Uncaught exception of type 'Genode::Rom_connection::Rom_connection_failed'
[init -> runtime -> boot_fb] Warning: abort called - thread: ep
[init -> runtime] child "usb" announces service "Usb"
nfeske commented 7 months ago

@alex-ab sorry about the boot_fb issue and thanks for the fix. I presumably tested the boot-fb variant on Monday after my latest driver-policy tweaks. But when using image/uefi with the sculpt_test.run script, Qemu happened to boot the (older) Sculpt OS version present on the ahci image, not the current disk image. The test worked well. ;-) I merged your fix to staging just now.

alex-ab commented 7 months ago

@nfeske: all fine, that is what the testing phase is for ;)

I published almost all of my packages. Unfortunately, the download of packages into the ram_fs, when used as depot source, reliably gets stuck with large packets, so I can't really test whether the packages work properly for now.

chelmuth commented 7 months ago

Unfortunately, the download of packages into the ram_fs, when used as depot source, reliably gets stuck with large packets, so I can't really test whether the packages work properly for now.

@alex-ab Is this limited to the ram_fs or does it also get stuck with rump?

alex-ab commented 7 months ago

@alex-ab Is this limited to the ram_fs or does it also get stuck with rump?

With rump_fs the download finishes. With ram_fs, it finally gets stuck after several "request resources" messages.

cproc commented 7 months ago

I got the following error on the current staging branch:

[init -> runtime -> boot_fb] using boot framebuffer: 800x600x32 @ 0x4000000000 type=1 pitch=3200
Error: I/O memory [0000004000000000,0000004010000000) not available
Error: Local MMIO mapping failed!
[init -> drivers -> platform_drv] Error: IO_MEM-session creation failed (label="", ram_quota=6K, cap_quota=3, base=0x4000000000, size=0x10000000, wc=yes)
[init -> drivers -> platform_drv] Error: Uncaught exception of type 'Genode::Service_denied'
[init -> drivers -> platform_drv] Warning: abort called - thread: ep
chelmuth commented 7 months ago

The question is: Why does platform_drv also request the framebuffer IO memory from core? boot_fb does not use the platform_drv so it must be another component.

jschlatow commented 7 months ago

@cproc On what hardware have you seen this? Could you please post or email me the /report/drivers/devices (running the last working version you have at hand)?

cproc commented 7 months ago

The hardware is an Intel NUC 12 (Alder Lake-P). According to /report/drivers/devices the IO memory range belongs to the Intel GPU (@jschlatow: do you still need more information from the report?).

jschlatow commented 7 months ago

Thanks. That's what I suspected.

What's puzzling me is that intel_gpu_drv and boot_fb are both started alongside. The sculpt manager actually considers them mutually exclusive.

nfeske commented 7 months ago

the download of packages into the ram_fs, when used as depot source, reliably gets stuck with large packets

I guess that commit https://github.com/genodelabs/genode/commit/c35c71a6615578efe6ce19bdf511908fa9192441 is the troublemaker here. I should definitely add a warning once the limit is reached and relax the limit for ram_fs. @alex-ab do you have an intuition what a sensible limit could be?

cproc commented 7 months ago

What's puzzling me is that intel_gpu_drv and boot_fb are both started alongside. The sculpt manager actually considers them mutually exclusive.

I instrumented Sculpt::Fb_driver::update() now and saw 3 calls in the log. On the first call use_intel was false and on the later two calls use_intel was true.

nfeske commented 7 months ago

@alex-ab could you give e424fc59d4947ed4b34527330d72acde759df44f a try?

alex-ab commented 7 months ago

@alex-ab could you give e424fc5 a try?

Works, splendid :+1:

alex-ab commented 7 months ago

What's puzzling me is that intel_gpu_drv and boot_fb are both started alongside. The sculpt manager actually considers them mutually exclusive.

Even though it seems to be a configuration issue, would it be sensible to catch such attempts in the platform driver so that it survives? It may not provide the service to the specific driver, but it would stay alive for all others in the system.

jschlatow commented 7 months ago

Even though it seems to be a configuration issue, would it be sensible to catch such attempts in the platform driver so that it survives? It may not provide the service to the specific driver, but it would stay alive for all others in the system.

That's a reasonable suggestion. I'll have a look.

Regarding the sculpt-manager issue: It seems there is some sort of a race. The sculpt manager seems to get to work before the devices ROM is complete, i.e., on the first call of Sculpt::Fb_driver::update() the Intel display controller is not present and therefore boot_fb is started. It seems though that the sculpt manager does not (yet) implement a handover from boot_fb to intel_fb since the start node for boot_fb is never removed once it has been constructed. @nfeske What is the rationale for not using _boot_fb.conditional(use_boot_fb, ...) in sculpt_manager/driver/fb.h?

nfeske commented 7 months ago

What is the rationale for not using _boot_fb.conditional(use_boot_fb, ...) in sculpt_manager/driver/fb.h?

It's just for the mundane reason that the Boot_info::with_mode function performs XML parsing, which I wanted to avoid on every single call of update (executed each time any device-related change occurs). Maybe this is overzealous. I have no strong opinion about keeping it that way.

However, one thing makes me wonder: The original driver manager also did not perform any handover from one driver to another. So something else must have changed.

Could we consider having the platform driver defer the reporting of the boot_fb device until the point where PCI info is also reported? This would avoid the need for swapping out drivers, which is a new behavior that is possibly best avoided for the time being.

jschlatow commented 7 months ago

Could we consider having the platform driver defer the reporting of the boot_fb device until the point where PCI info is also reported? This would avoid the need for swapping out drivers, which is a new behavior that is possibly best avoided for the time being.

If I'm not mistaken, the boot_fb is not part of the devices ROM but is announced in the platform_info ROM. @cproc Could you have a look at what the content of the devices ROM is when Sculpt::Board_info::Detected::from_xml() is called the first time? I'm wondering whether the devices ROM is empty (which would be a good indicator to skip further evaluation and wait for the next update).

cproc commented 7 months ago

@cproc Could you have a look what the content of the devices ROM is when Sculpt::Board_info::Detected::from_xml() is called the first time? I'm wondering whether the devices ROM is empty (which would be a good indicator to skip further evaluation and wait for the next update).

Yes, the devices ROM is empty on the first call. The driver manager had a check for that:

https://github.com/genodelabs/genode/blob/9d19410f12e3e4b851e46d56c30749bf84598c6e/repos/gems/src/app/driver_manager/main.cc#L457-L463

jschlatow commented 7 months ago

Even though it seems to be a configuration issue, would it be sensible to catch such attempts in the platform driver so that it survives? It may not provide the service to the specific driver, but it would stay alive for all others in the system.

That's a reasonable suggestion. I'll have a look.

@cproc Can you give 6bba9a4 a try and see whether the platform driver survives the failure to create an IO_MEM session?

nfeske commented 7 months ago

Thanks @cproc for investigating. It is probably best to add the check to the sculpt manager then (along with a comment on why it is needed).

alex-ab commented 7 months ago

Up to now, using boot_fb together with intel_gpu was not supported: either you use the Intel GPU + display driver or boot_fb. Of course, it need not stay that way; it is just untested/never used.

cproc commented 7 months ago

@cproc Can you give 6bba9a4 a try and see whether the platform driver survives the failure to create an IO_MEM session?

Yes, it survives.
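
The behaviour confirmed here can be illustrated with a small, self-contained sketch. It is not the content of commit 6bba9a4. `Core`, `Platform_driver`, `Io_mem_range`, and the local `Service_denied` type are hypothetical stand-ins, assuming only what the log above shows: a second request for an I/O memory range that is already in use gets denied. The point is that the denial is caught per request instead of escaping as an uncaught exception.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the exception seen in the log above.
struct Service_denied : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

struct Io_mem_range { std::uint64_t base, size; };

// Hypothetical stand-in for core: every I/O memory range can be claimed once.
class Core
{
    std::vector<Io_mem_range> _claimed { };

public:

    void claim(Io_mem_range r)
    {
        for (auto const &c : _claimed)
            if (r.base < c.base + c.size && c.base < r.base + r.size)
                throw Service_denied("I/O memory not available");

        _claimed.push_back(r);
    }
};

// Hypothetical platform driver that survives a denied IO_MEM request.
class Platform_driver
{
    Core &_core;

public:

    unsigned denied = 0;   // number of requests that could not be served

    explicit Platform_driver(Core &core) : _core(core) { }

    // Returns false for the affected client instead of aborting the driver.
    bool acquire(Io_mem_range r)
    {
        try {
            _core.claim(r);
            return true;
        } catch (Service_denied const &) {
            denied++;
            return false;
        }
    }
};
```

With this structure, a client whose range is already held elsewhere (for example by boot_fb) is refused, while later requests from other clients are still served.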

nfeske commented 7 months ago

@cnuke just pointed out to me that the prepare step upon the selection of a used partition is not always executed. Just noting that I am able to reproduce it. I'm on it.

nfeske commented 7 months ago

used partition is not always executed [...]

Fixed by https://github.com/genodelabs/genode/commit/75b5b8a37ee1a2cb4088022cb0c87baa73c15d5f

cproc commented 7 months ago

Thanks @cproc for investigating. It is probably best to add the check to the sculpt manager then (along with a comment on why it is needed).

I'm thinking about adding the check in Sculpt::Drivers::Instance::_handle_devices(Xml_node const &devices). Do ARM platforms get a valid devices ROM too?

nfeske commented 7 months ago

Do ARM platforms get a valid devices ROM too?

They should.

However, since this test is for the specific reason of reliably detecting boot_fb vs intel, wouldn't it be sensible to test the condition in fb.h? For the other devices (storage, usb) such ambiguities do not exist.

cnuke commented 7 months ago

used partition is not always executed [...]

Fixed by 75b5b8a

Thanks for the fix, works nicely :+1:.

cproc commented 7 months ago

However, since this test is for the specific reason of reliably detecting boot_fb vs intel, wouldn't it be sensible to test the condition in fb.h? For the other devices (storage, usb) such ambiguities do not exist.

fb.h currently only knows Board_info and the platform ROM. Without changing the interface, the check could be added to Sculpt::Board_info::Detected::from_xml() and skip the boot_fb detection if the devices ROM is empty or it could be added to Sculpt::Drivers::Instance::_handle_devices(Xml_node const &devices) and skip the _fb_driver update if the devices ROM is empty.
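
The second option could look roughly like the following sketch. It does not use Genode's `Xml_node` API. `Devices_rom`, `Fb_driver`, and `Drivers` are simplified, hypothetical stand-ins, and the device name `"intel_gpu"` is an assumption made for illustration only.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the parsed devices ROM (not Genode's Xml_node).
struct Devices_rom
{
    std::vector<std::string> device_names { };   // empty until devices are reported

    bool empty() const { return device_names.empty(); }

    bool has(std::string const &name) const
    {
        for (auto const &n : device_names)
            if (n == name)
                return true;
        return false;
    }
};

// Hypothetical stand-in for the framebuffer-driver selection.
struct Fb_driver
{
    bool boot_fb_started  = false;
    bool intel_fb_started = false;

    void update(bool use_intel, bool boot_fb_available)
    {
        if (use_intel)
            intel_fb_started = true;
        else if (boot_fb_available)
            boot_fb_started = true;   // never removed once started, as described above
    }
};

struct Drivers
{
    Fb_driver fb { };
    unsigned  skipped_updates = 0;

    void handle_devices(Devices_rom const &devices, bool boot_fb_available)
    {
        /*
         * An empty devices ROM means that no device has been reported yet.
         * Deciding now would start boot_fb even if an Intel GPU appears in
         * the next update, so the decision is deferred.
         */
        if (devices.empty()) {
            skipped_updates++;
            return;
        }

        fb.update(devices.has("intel_gpu"), boot_fb_available);
    }
};
```

Without the early return, the first call with an empty ROM would start boot_fb and a later call would start the Intel driver alongside it, which matches the symptom reported earlier in this thread.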

nfeske commented 7 months ago

@cproc sorry for stirring up confusion. Your very last suggestion sounds best to me.

cproc commented 7 months ago

Commit 8410bc0 adds the devices ROM check.

nfeske commented 7 months ago

@cproc Perfect! Merged to staging.

mewmew commented 7 months ago

Perhaps this is working as intended. But I noticed a difference in how the config file system is handled between Sculpt 23.10 and 24.04.

In particular, using 23.10, it was possible to create a persistent config directory on e.g. the used USB drive, such that those configuration files would be loaded when the Genode partition of the USB drive was used.

Personally, I've used this to add new launchers to Sculpt without having to edit e.g. the default-pc.sculpt file. Those launchers would be placed in /usb-1.3.3/config/23.10/launcher/xxx.

I tried to use the same setup for Sculpt 24.04 (using /usb-1.3.3/config/24.04/launcher/xxx at revision b9bd93847baa1d25416138c8dd93b09440583a18), but it doesn't seem to pick up the changes. I expected the new launchers to be made available through the new + -> Options menu, but they don't seem to be there. If I manually copy the files from /usb-1.3.3/config/24.04/launcher/xxx to /config/launcher/xxx then the new launchers appear under + -> Options and work correctly.

Is this the intended behaviour? Can I still use persistent config files (and launchers) on Sculpt 24.04? Perhaps in some other way?

Cheers, Robin

Edit: note that /config/VERSION contains 24.04.

nfeske commented 7 months ago

But I noticed a difference in how the config file system is handled between Sculpt 23.10 and 24.04.

This is not intended and is most likely related to the symptom mentioned at comment https://github.com/genodelabs/genode/issues/5174#issuecomment-2049826341. The original behavior should hopefully be restored by commit https://github.com/genodelabs/genode/commit/75b5b8a37ee1a2cb4088022cb0c87baa73c15d5f (on staging).

nfeske commented 7 months ago

@mewmew it now occurred to me that you were already on the latest staging yesterday. Let me investigate...

nfeske commented 7 months ago

@mewmew unfortunately I'm not able to reproduce the persistent-config problem. I tested the mechanism with a partition on an NVMe device and also a USB stick. My custom launcher and custom nic_router config are imported from the respective config/24.04/ directory just fine. Do you see the "prepare" component in the graph after selecting your Genode partition to "Use"? (you may also look at /report/log to see if the prepare step was executed)

mewmew commented 7 months ago

@mewmew unfortunately I'm not able to reproduce the persistent-config problem. I tested the mechanism with a partition on an NVMe device and also a USB stick. My custom launcher and custom nic_router config are imported from the respective config/24.04/ directory just fine. Do you see the "prepare" component in the graph after selecting your Genode partition to "Use"? (you may also look at /report/log to see if the prepare step was executed)

It seems to work now! A few errors appear in the log related to the prepare component, but they can probably be ignored as it does succeed in populating the /config/launcher files.

Sorry for the confusion, I am guessing the behaviour I observed earlier was prior to your 75b5b8a37ee1a2cb4088022cb0c87baa73c15d5f commit. I must have mixed up the revisions when running the test.

Thanks for the quick solution!

(Just for reference, the errors in the log related to the prepare component are as follows:)

runtime -> prepare -> bash   Error: no plugin found for fcntl(3)
...
runtime -> prepare -> bash   Error: no plugin found for fcntl(19)
nfeske commented 7 months ago

@mewmew the bash-related error messages in the log are normal. They admittedly spoil the aesthetics of the log but are not critical. As I'm planning to simplify the prepare step (using fs_tool instead of bash) down the road anyway, those ugly lines will vanish sometime later.

cnuke commented 7 months ago

Commits 5ef8105 and ceb1286 allow for driving a UHD display.

However, to interact with the leitzentrale - the runtime_view in particular - commit 0e6c126 is necessary. Although the runtime_view eventually gets upgraded, it takes multiple attempts, and between the various upgrade attempts the whole leitzentrale is sometimes frozen and gets restarted. The quota of 52M is the value that is required to use font-size large (w/o any upgrade).

nfeske commented 7 months ago

Thank you @cnuke for the high-resolution adjustments. I agree with the changes and find the increase of the runtime_view quota sensible. It hosts all dialogs after all.

To counter the total increase of the leitzentrale quotas a bit, we could consider reducing the quota assigned to the drivers subsystem as a subsequent step. With the move of most driver components to the runtime, it got a much smaller footprint.

nfeske commented 6 months ago

@cproc while testing your latest build of falkon-jemalloc, I noticed that the log gets spammed with messages like this:

falkon-jemalloc -> falkon] Error: fcntl(): command 0xc not supported - vfs

Can you reproduce this? Could those messages be silenced?

nfeske commented 6 months ago

@cnuke, the mixer launcher should be improved a little:

nfeske commented 6 months ago

I've updated the falkon preset for the use of the mixer in https://github.com/genodelabs/genode/commit/c422c8cfddba225eb47c31f23269bf5531ae9726. The audio driver is not started by default. So one has to enable the 'audio' option.

@cproc @cnuke the falkon pkg by cproc lacks the oss attributes that improve the audio stability. When using the falkon browser for watching a youtube video, there are currently quite irritating garbling sounds. @cproc could you give the preset a spin and possibly update the pkg with the adjusted oss settings?

nfeske commented 6 months ago

@chelmuth reported the following keyboard focus issue: Unless an inspect view is open, one cannot edit text via the text editor in the files tab. I'm on this one.

nfeske commented 6 months ago

Commit https://github.com/genodelabs/genode/commit/9d2e389358472de1084eebab707dbba30773981d should fix the keyboard-focus issue.

cproc commented 6 months ago

@cproc while testing your latest build of falkon-jemalloc, I noticed that the log gets spammed with messages like this:

falkon-jemalloc -> falkon] Error: fcntl(): command 0xc not supported - vfs

Can you reproduce this? Could those messages be silenced?

Command 0xc is F_SETLK. There are multiple places in the Qt source code where this command appears. Should we add the command to the libc VFS plugin as a dummy that always succeeds, to silence the messages?
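
A minimal sketch of such a dummy, assuming a hypothetical handler function `vfs_fcntl_sketch` (the real libc VFS plugin's interface is not shown in this thread and may differ). Note that the numeric value of `F_SETLK` depends on the libc headers in use, so the sketch relies on the symbolic constant rather than on 0xc.

```cpp
#include <fcntl.h>
#include <cerrno>
#include <cstdio>

// Hypothetical handler, for illustration only.
int vfs_fcntl_sketch(int cmd)
{
    switch (cmd) {

    case F_SETLK:
        // Dummy: report success without actually taking a lock.
        return 0;

    default:
        std::fprintf(stderr, "fcntl(): command 0x%x not supported - vfs\n",
                     static_cast<unsigned>(cmd));
        errno = EINVAL;
        return -1;
    }
}
```

The trade-off is that callers are told a lock was acquired although no mutual exclusion takes place, which is only acceptable if nothing relies on the lock for correctness.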

nfeske commented 6 months ago

@m-stein commit https://github.com/genodelabs/genode/commit/f0709f3f53c38d1b82f9d6255c4dcb409aebb89f fixes the keyboard-layout selection issue you reported offline.

nfeske commented 6 months ago

Should we add the command to the libc VFS plugin as dummy which always succeeds to silence the messages?

Or could we print the error just once and suppress subsequent messages referring to the same argument as the one last printed? So the log would reveal the values not handled but large sequences of the same diagnostic messages get somewhat compressed into one.
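
This alternative can be sketched as follows. `Unsupported_cmd_log` is a hypothetical helper and not part of Genode's libc. It collects the messages in a vector instead of printing them, purely to keep the example self-contained.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: emits a diagnostic only if the argument differs
// from the one that was reported last.
class Unsupported_cmd_log
{
    bool _valid    = false;   // true once a first message has been emitted
    int  _last_cmd = 0;

    std::vector<std::string> _lines { };

public:

    // Returns true if a message was emitted, false if it was suppressed.
    bool report(int cmd)
    {
        if (_valid && _last_cmd == cmd)
            return false;

        _valid    = true;
        _last_cmd = cmd;

        char buf[64];
        std::snprintf(buf, sizeof(buf), "fcntl(): command 0x%x not supported",
                      static_cast<unsigned>(cmd));
        _lines.emplace_back(buf);
        return true;
    }

    std::vector<std::string> const &lines() const { return _lines; }
};
```

A sequence of calls with the arguments 0xc, 0xc, 0xc, 0x7, 0xc yields three messages: the run of identical arguments collapses into one line, while each change of the argument remains visible in the log.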