There is a first version ready for review (based on current staging, tested with the 23.05 tool chain): https://github.com/m-stein/genode/commit/b8a56f9312eb813a5c7b00be1353c464175cfb59
I've moved all headers to a local path prefixed with "tresor", thereby preventing Tresor from exposing public headers, then applied a major clean-up to types.h and rebased to current staging:
d1a0ea2bf5 fixup "file_vault: version 23.05" (fix "arithmetic between different enumeration types")
091eb557f2 fixup "file_vault: version 23.05" (remove Number_of_leafs)
e43ac9e568 fixup "file_vault: version 23.05" (rename Type_1/2_node_index & Node_index, no auto in ft_resizing.cc)
261c169bdd fixup "file_vault: version 23.05" (remove Type_1 node_blocks_index)
c9de03160a fixup "file_vault: version 23.05" (rename Type_1/2_node_block_index)
b4b6d173cf fixup "file_vault: version 23.05" (rename Snapshots_index)
3833e44488 fixup "file_vault: version 23.05" (rename Superblocks_index)
837513f257 fixup "file_vault: version 23.05" (style fixes in types.h)
4b2c57047e fixup "file_vault: version 23.05" (further clean-up (2) of types.h)
ca06453bfb fixup "file_vault: version 23.05" (further clean-up (1) of types.h)
147f340b77 fixup "file_vault: version 23.05" (remove Type_1_node_unpadded)
6fc7d9132b fixup "file_vault: version 23.05" (rename Hash_new)
c39d645eba fixup "file_vault: version 23.05" (remove some unused types)
573ee22e27 fixup "file_vault: version 23.05" (fix some redundancies in types.h)
a10eda8520 fixup "file_vault: version 23.05" (re-arrange types.h)
dd3ffe8f0f fixup "file_vault: version 23.05" (no using namespace in a namespace)
fdeb554f28 fixup "file_vault: version 23.05" (remove old recipes)
050b143dee fixup "file_vault: version 23.05" (strict warnings)
edab319bd7 fixup "file_vault: version 23.05" (first chelmuth-review changes)
99dddda318 fixup "file_vault: version 23.05" (local include dir, remove "dump" & README remnants)
285df1a85e file_vault: version 23.05
This is a version with all fix-ups merged in: 3c97831fb8 file_vault: version 23.05
@m-stein could you please clean your commit from unrelated HASH changes? Thanks.
@m-stein seems I pulled too many commits from your branch yesterday. Please have a look at staging and tell me how to proceed.
@chelmuth I'm very sorry for the unrelated hashes - a rebasing mistake during the preparations for staging. From the current staging state, I'd merely remove the top-most commit a12d3f03e2, which is an unfinished but compilable and tested version of importing namespace Genode into namespace Tresor.
@chelmuth This https://github.com/m-stein/genode/commit/2c13f7198597d77150eff09bac20f3f21d2559aa is a fix-up suggested by @nfeske.
@chelmuth As you suggested: https://github.com/m-stein/genode/commit/f46edfc22a2c0d52f95f5df8102a234d2da3bf99.
@chelmuth I've added a fix-up (c7bd47f02ba15c689cbf57a308cdf0b894f5c89b) for the problems with 32 bit.
Thanks, merged to staging.
@chelmuth These commits (feb9fa227dfaab94f4f017ce70453a62438f2acd, e9ed0f3886ec31dd30a61bcdf18da51e49a67749) fix a minor inconvenience and a problem that caused run/tresor_tester to fail on targets that use the ram-fs vfs plugin.
@m-stein repos/gems/recipes/pkg/sculpt_distribution-pc/archives and repos/gems/run/sculpt/index reference pkg/file_vault. What is the current package that should be used in Sculpt?
@chelmuth It should be file_vault_menu_view (90b028a7c1ee17ab7d98cc0689dafefc563ecf74). I tried to build Sculpt with current staging in order to test the package once again. However, I ran into the following problem that I'm still investigating (same with sculpt_distribution-pc):
genode/build/x86_64$ VERBOSE= make run/sculpt_test BOARD=pc KERNEL=nova
...
Program noux-pkg/vim-minimal/vim-minimal
...
checking --with-tlib argument... ncurses
checking for linking with ncurses library... configure: error: FAILED
make[7]: *** [/media/lypo/138e8670-1dd2-11b2-88a4-3bab3642e0c11/depot/local/api/noux/2023-02-26/mk/gnu_build.mk:160: Makefile] Error 1
make[6]: *** [var/libdeps:139: vim-minimal.prg] Error 2
make[5]: *** [Makefile:336: gen_deps_and_build_targets] Error 2
make[4]: *** [/home/lypo/genodelabs/genode/tool/depot/mk/build_bin_archive:208: /media/lypo/138e8670-1dd2-11b2-88a4-3bab3642e0c11/depot/local/bin/x86_64/vim-minimal/2023-05-25.build/bin] Error 1
make[3]: *** [/media/lypo/138e8670-1dd2-11b2-88a4-3bab3642e0c11/depot/var/build.mk:308: local/bin/x86_64/vim-minimal/2023-05-25] Error 2
make[2]: *** [/home/lypo/genodelabs/genode/tool/depot/build:136: execute_generated_build_mk_file] Error 2
make[1]: *** [/home/lypo/genodelabs/genode/tool/depot/create:59: build] Error 2
make[1]: Leaving directory '/home/lypo/genodelabs/genode/build/x86_64'
> @chelmuth It should be file_vault_menu_view (90b028a).
On second thought (and because you yourself forgot to adapt sculpt/index), I feel certain pkg/file_vault should not be renamed and should keep its known name. Please revert these changes. While doing this, I'd be glad if you'd rename file_vault_cfg_report to file_vault_config_report, because we shun unnecessary abbreviations (remember blk vs. block and see #4420 regarding drv).
> I tried to build Sculpt with current staging in order to test the package once again. However, I ran into the following problem that I'm still investigating (same with sculpt_distribution-pc): […]
This issue is fixed by rebuilding the tool chain with current staging (see @cproc's mail from Tue, 23 May 2023).
@m-stein depot_autopilot test-file_vault_config_report is broken on nova x86_32 and x86_64.
[2023-05-25 03:11:36] [init -> depot_autopilot] --- Run "test-file_vault_cfg_report" (max 240 sec) ---
[2023-05-25 03:11:36] [init -> depot_autopilot]
[2023-05-25 03:11:36] [init -> depot_autopilot] 0.165 [init -> file_vault] Error: failed to deliver Report session with label fs_query -> listing
[2023-05-25 03:11:36] [init -> depot_autopilot] 0.178 [init -> report_rom] report 'file_vault -> ui_report'
[2023-05-25 03:11:36] [init -> depot_autopilot] 0.187 [init -> report_rom] <ui_report version="step_1_wait" state="uninitialized"/>
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.149 [init -> report_rom] report 'file_vault -> ui_report'
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.158 [init -> report_rom] <ui_report version="step_2_init" state="initializing"/>
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.240 [init -> file_vault] child "truncate_file" exited with exit value 0
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.310 [init -> file_vault] child "tresor_init_trust_anchor" exited with exit value 0
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.405 [init -> file_vault] child "tresor_init" exited with exit value 0
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.520 [init -> file_vault] child "sync_to_tresor_vfs_init" exited with exit value 0
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.609 [init -> file_vault -> mke2fs] mke2fs 1.46.5 (30-Dec-2021)
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.640 [init -> file_vault -> mke2fs] Error: no plugin found for socket()
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.653 [init -> file_vault -> mke2fs] Error: no plugin found for socket()
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.663 [init -> file_vault -> mke2fs] Creating filesystem with 1024 1k blocks and 128 inodes
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.675 [init -> file_vault -> mke2fs]
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.682 [init -> file_vault -> mke2fs] Allocating group tables: 0/1 done
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.713 [init -> file_vault -> mke2fs] Writing inode tables: 0/1 done
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.776 [init -> file_vault -> mke2fs] Writing superblocks and filesystem accounting information: 0/1 done
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.790 [init -> file_vault -> mke2fs]
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.802 [init -> file_vault] child "mke2fs" exited with exit value 0
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.846 [init -> file_vault -> client_fs_fs_query] Error: failed to watch '/'
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.857 [init -> file_vault -> client_fs_fs_query] Error: failed to watch '/data'
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.869 [init -> report_rom] report 'file_vault -> ui_report'
[2023-05-25 03:11:41] [init -> depot_autopilot] 5.878 [init -> report_rom] <ui_report version="step_2_init" state="unlocked"/>
[2023-05-25 03:11:46] [init -> depot_autopilot] 10.149 [init -> report_rom] report 'file_vault -> ui_report'
[2023-05-25 03:11:46] [init -> depot_autopilot] 10.158 [init -> report_rom] <ui_report version="step_3_lock" state="locking"/>
[2023-05-25 03:11:46] [init -> depot_autopilot] 10.210 [init -> file_vault] child "lock_fs_tool" exited with exit value 0
[2023-05-25 03:11:46] [init -> depot_autopilot] 10.247 [init -> file_vault -> lock_fs_query] Error: failed to watch '/tresor/control'
[2023-05-25 03:15:36] [init -> depot_autopilot]
[2023-05-25 03:15:36] [init -> depot_autopilot] test-file_vault_cfg_report failed 239.983 timeout 240 sec
@chelmuth With my merge_to_staging branch rebased onto current staging, I cannot reproduce this issue on either of the platforms. Shall we wait another night? However, I have applied your suggestions:

c08c0bf0dc fixup "file_vault: version 23.05" (file_vault_cfg_report -> file_vault_config_report)
60efb1e38f fixup "file_vault: version 23.05" (file_vault_menu_view -> file_vault)
That's strange as it seems to have failed several times...
> find autopilot.checker.2023-05-2*/ -name \*depot_autopilot.log | xargs grep -la file_vault_cfg_report.*failed
autopilot.checker.2023-05-23-00-11/qemu/x86_64.pc.sel4.depot_autopilot.log
autopilot.checker.2023-05-23-00-11/qemu/x86_64.pc.foc.depot_autopilot.log
autopilot.checker.2023-05-23-00-11/qemu/arm_v8a.rpi3.foc.depot_autopilot.log
autopilot.checker.2023-05-23-00-11/x86_64/x86_64.pc.nova.depot_autopilot.log
autopilot.checker.2023-05-24-00-15/qemu/x86_64.pc.sel4.depot_autopilot.log
autopilot.checker.2023-05-24-00-15/qemu/x86_64.pc.foc.depot_autopilot.log
autopilot.checker.2023-05-24-00-15/qemu/arm_v8a.rpi3.foc.depot_autopilot.log
autopilot.checker.2023-05-24-00-15/x86_64/x86_64.pc.nova.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/x86_32/x86_32.pc.nova.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/qemu/arm_v7a.pbxa9.hw.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/qemu/x86_32.pc.fiasco.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/qemu/x86_64.pc.sel4.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/qemu/x86_32.pc.sel4.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/qemu/x86_64.pc.foc.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/qemu/arm_v8a.rpi3.foc.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/qemu/arm_v7a.zynq_qemu.hw.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/qemu/x86_32.pc.foc.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/qemu/arm_v7a.pbxa9.foc.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/qemu/x86_32.pc.pistachio.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/qemu/x86_32.pc.okl4.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/imx7d_sabre/arm_v7a.imx7d_sabre.sel4.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/imx7d_sabre/arm_v7a.imx7d_sabre.foc.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/imx7d_sabre/arm_v7a.imx7d_sabre.hw.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/imx6q_sabrelite/arm_v7a.imx6q_sabrelite.sel4.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/x86_64/x86_64.pc.nova.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/imx53_qsb/arm_v7a.imx53_qsb.hw.depot_autopilot.log
autopilot.checker.2023-05-25-00-11/imx53_qsb/arm_v7a.imx53_qsb_tz.hw.depot_autopilot.log
Ok, I'll investigate further.
@chelmuth With the help of @cnuke, I created two fix-ups (332bbb5cff6c2969828893afb615d80d6c592854, 0a4f1a47d753160beb568766e2a96daddb6da317), of which the latter should fix the failing tests on hardware.
Merged 798e3ee306cc58b4b6efe265f3299ad9acc99cf3 as a standalone commit after mastering current staging.
@m-stein I cherry-picked aeb65d6 and 798e3ee on my sculpt_23.04 branch to give it a spin. Unfortunately, I'm getting errors when using it with an already populated fs and when using it with a clean ram fs.
Running the new file vault on a fresh ram fs triggers the following messages:
[runtime] child "file_vault"
[runtime] RAM quota: 225032K
[runtime] cap quota: 2166
[runtime] ELF binary: init
[runtime] priority: 3
[runtime] provides service File_system
[runtime -> file_vault -> file_vault] Error: failed to deliver Report session with label fs_query -> listing
[runtime -> file_vault -> file_vault] Error: failed to deliver Report session with label menu_view -> hover
[runtime -> file_vault -> file_vault] child "menu_view" requests resources: ram_quota=151628
[runtime -> file_vault -> file_vault] child "truncate_file" exited with exit value 0
[runtime -> file_vault -> file_vault] child "tresor_init_trust_anchor" exited with exit value 0
[runtime -> file_vault] Warning: file_vault: no route to service "File_system" (label="file_vault -> tresor_init -> ")
[runtime -> file_vault -> file_vault -> tresor_init] Error: File_system-session creation failed (label="", ram_quota=1088K, cap_quota=12, root="", writeable=1, tx_buf_size=1048576)
[runtime -> file_vault -> file_vault -> tresor_init] Error: failed to create <fs> VFS nodeor_or_or_
[runtime -> file_vault -> file_vault -> tresor_init] Error: buffer_size="1M"
[runtime -> file_vault -> file_vault -> tresor_init] Error: failed to open file /tresor.img
[runtime -> file_vault -> file_vault -> tresor_init] Error: Uncaught exception of type 'vfs_open(Vfs::Env&, Genode::String<128ul>, Vfs::Directory_service::Open_mode)::Failed'
[runtime -> file_vault -> file_vault -> tresor_init] Warning: abort called - thread: ep
When running it on my old file_vault's fs, I'm getting the following errors:
[runtime] child "file_vault"
[runtime] RAM quota: 225032K
[runtime] cap quota: 2166
[runtime] ELF binary: init
[runtime] priority: 3
[runtime] provides service File_system
[runtime -> file_vault -> file_vault] Error: failed to deliver Report session with label fs_query -> listing
[runtime -> file_vault -> file_vault] Error: Uncaught exception of type 'File_vault::Main::_state_from_string(Genode::String<64ul> const&)::Invalid_state_string'
[runtime -> file_vault -> file_vault] Warning: abort called - thread: ep
Am I missing something?
@jschlatow Thanks for the report! I cannot see a problem with your setup so far. I'm investigating it.
@jschlatow I've created a fix-up d0e3a9e76014613192000afc760cbc1a14a19855 that should solve the routing problem. With this, File Vault is running fine on my Sculpt on a fresh encrypted container.
As to the legacy containers: I already noticed that some days ago. There are multiple problems with legacy containers, of which I have already solved a bunch. However, I was not able to finish the work yet because of more important issues. I'm on it and will inform you as soon as I have fixed all related problems.
While looking into the CI results I noticed that the removed cbe_tester was still referenced. Commit 1a8a5b8 replaces it with the tresor_tester.
@cnuke Thanks for fixing that!
I've created a fix-up d0e3a9e that should solve the routing problem. With this, File Vault is running fine on my Sculpt on a fresh encrypted container.
@m-stein Thanks for the fix. Works like a charm. The good news is that I can now start the thunderbird@seoul VM from @alex-ab using the new file vault. However, for some reason, changes to the appliance's vdi image seem to be discarded, i.e. when I restart the VM, all changes I previously made are lost. I first suspected vfs_import of overwriting the vdi, but I found the following lines in the log, which contradict this:
[runtime -> thunderbird@seoul -> vdi_block] --- Starting VDI driver ---
[runtime -> thunderbird@seoul -> vdi_block] Warning: skipping copy of file /block_ext4.vdi, OPEN_ERR_EXISTS
[runtime -> thunderbird@seoul -> vdi_block] Provide '/block_ext4.vdi' as block device, writeable: yes
Strangely, the VM operates fine when using ram_fs instead of the file_vault. Do you have any idea?
@jschlatow I've added a commit (https://github.com/m-stein/genode/commit/dda9cff7acf6a2a12bcc79a1d2bdc79e799064ae) that should fix the problems with existing File Vault containers. It affects only details in the File Vault itself and not the underlying Tresor library.
@jschlatow Am I getting you right that you restart both the VM and the File Vault? In this case, a possible explanation would be that we fail to secure the superblock. However, if you lock the File Vault before restarting, the superblock should get secured. I'll try to reproduce your error case.
I'm only restarting the VM.
> @jschlatow I've added a commit (m-stein@dda9cff) that should fix the problems with existing File Vault containers. It affects only details in the File Vault itself and not the underlying Tresor library.
Thanks for the fix. I now see the file-vault window and can open the container. However, I left the container in an unclean state with the old file vault, which also triggers an error with the new file vault:
[runtime] child "file_vault"
[runtime] RAM quota: 225032K
[runtime] cap quota: 2166
[runtime] ELF binary: init
[runtime] priority: 3
[runtime] provides service File_system
[runtime -> file_vault -> file_vault] Error: failed to deliver Report session with label fs_query -> listing
[runtime -> file_vault -> file_vault] assuming version 21.05
[runtime -> file_vault -> file_vault] Error: failed to deliver Report session with label menu_view -> hover
[runtime -> file_vault -> file_vault] child "tresor_init_trust_anchor" exited with exit value 0
[runtime -> file_vault -> file_vault] child "sync_to_tresor_vfs_init" exited with exit value 0
[runtime -> file_vault -> file_vault -> client_fs_fs_query] Error: failed to watch '/'
[runtime -> file_vault -> file_vault -> client_fs_fs_query] Error: failed to watch '/data'
[runtime -> file_vault -> file_vault -> rump_vfs] rump: /genode: file system not clean; please fsck(8)
[runtime -> file_vault -> file_vault -> tresor_vfs] Error: Uncaught exception of type 'Tresor::Meta_tree::execute(bool&)::Exception_1'
[runtime -> file_vault -> file_vault -> tresor_vfs] Warning: abort called - thread: ep
Do you think there is a way to make it more robust or even recover from such a state?
@jschlatow It seems that you have the wrong commit because the "assuming version 21.05" output was only for debugging purposes and is not part of my merge_to_staging state.
Regarding the error with the old container: you're having a hash mismatch somewhere in your "Meta" tree. In a way, the behavior you're seeing is there for robustness. The Tresor checks the hash of everything it touches before making any modifications and refrains from going on whenever it encounters a bad hash. However, you're perfectly right that the current reaction is uncomfortable. I'll replace all hash-mismatch exceptions with marking the corresponding request as failed, so that the block-encryption stack doesn't crash (see the sketch below).
Furthermore, recovery from bad hashes is a topic that is not on the roadmap yet. I'll discuss this with the others.
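To illustrate the intended change, here is a minimal C++ sketch. The types and function names are hypothetical, not the actual Tresor interfaces; it only shows the pattern of turning a detected hash mismatch into a failed request instead of an uncaught exception.

```cpp
/*
 * Illustrative sketch only (hypothetical types, not the actual Tresor code):
 * a detected hash mismatch marks the request as failed instead of throwing,
 * so the caller can observe and report the error without the component
 * aborting.
 */
#include <cstdio>

struct Request
{
	bool complete = false;  /* request processing finished         */
	bool success  = false;  /* result to be reported to the client */
};

/* placeholder for the real comparison of a node's stored vs. computed hash */
static bool node_hash_matches() { return false; }

static void execute_update(Request &req)
{
	if (!node_hash_matches()) {
		/* formerly: throw ...;  ->  uncaught exception, component abort */
		req.complete = true;
		req.success  = false;
		return;
	}
	/* ...apply the modification... */
	req.complete = true;
	req.success  = true;
}

int main()
{
	Request req { };
	execute_update(req);
	if (req.complete && !req.success)
		std::printf("update request failed, reason: \"node hash mismatch\"\n");
	return 0;
}
```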
I've fixed the uncaught exceptions on hash mismatches (https://github.com/m-stein/genode/commit/aabb2d51e842312e93f81d24511bd1b572332cd8). I'm still investigating the loss-of-state issue.
depot_autopilot failed last night as follows.
> find autopilot.checker.2023-06-01-00-12/ -name \*.depot_autopilot.log | xargs grep -am1 'file_vault_config_report .*failed' | nl -w2
1 autopilot.checker.2023-06-01-00-12/qemu/arm_v7a.pbxa9.hw.depot_autopilot.log:[2023-06-01 06:27:15] [init -> depot_autopilot] test-file_vault_config_report failed 239.997 timeout 240 sec
2 autopilot.checker.2023-06-01-00-12/qemu/x86_32.pc.fiasco.depot_autopilot.log:[2023-06-01 02:26:16] [init -> depot_autopilot] test-file_vault_config_report failed 239.989 timeout 240 sec
3 autopilot.checker.2023-06-01-00-12/qemu/x86_64.pc.sel4.depot_autopilot.log:[2023-06-01 03:48:45] [init -> depot_autopilot] test-file_vault_config_report failed 240.001 timeout 240 sec
4 autopilot.checker.2023-06-01-00-12/qemu/x86_64.pc.foc.depot_autopilot.log:[2023-06-01 03:11:45] [init -> depot_autopilot] test-file_vault_config_report failed 239.999 timeout 240 sec
5 autopilot.checker.2023-06-01-00-12/qemu/arm_v8a.rpi3.foc.depot_autopilot.log:[2023-06-01 06:11:49] [init -> depot_autopilot] test-file_vault_config_report failed 239.998 timeout 240 sec
6 autopilot.checker.2023-06-01-00-12/qemu/arm_v7a.zynq_qemu.hw.depot_autopilot.log:[2023-06-01 06:39:45] [init -> depot_autopilot] test-file_vault_config_report failed 239.998 timeout 240 sec
7 autopilot.checker.2023-06-01-00-12/qemu/x86_32.pc.foc.depot_autopilot.log:[2023-06-01 02:41:39] [init -> depot_autopilot] test-file_vault_config_report failed 239.997 timeout 240 sec
8 autopilot.checker.2023-06-01-00-12/qemu/arm_v7a.pbxa9.foc.depot_autopilot.log:[2023-06-01 06:09:43] [init -> depot_autopilot] test-file_vault_config_report failed 239.996 timeout 240 sec
9 autopilot.checker.2023-06-01-00-12/qemu/x86_32.pc.pistachio.depot_autopilot.log:[2023-06-01 03:22:59] [init -> depot_autopilot] test-file_vault_config_report failed 239.993 timeout 240 sec
10 autopilot.checker.2023-06-01-00-12/qemu/x86_32.pc.okl4.depot_autopilot.log:[2023-06-01 03:08:45] [init -> depot_autopilot] test-file_vault_config_report failed 239.998 timeout 240 sec
11 autopilot.checker.2023-06-01-00-12/imx7d_sabre/arm_v7a.imx7d_sabre.foc.depot_autopilot.log:[2023-06-01 07:20:23] [init -> depot_autopilot] test-file_vault_config_report failed 239.991 timeout 240 sec
12 autopilot.checker.2023-06-01-00-12/imx7d_sabre/arm_v7a.imx7d_sabre.hw.depot_autopilot.log:[2023-06-01 07:35:18] [init -> depot_autopilot] test-file_vault_config_report failed 239.983 timeout 240 sec
13 autopilot.checker.2023-06-01-00-12/imx6q_sabrelite/arm_v7a.imx6q_sabrelite.sel4.depot_autopilot.log:[2023-06-01 08:17:29] [init -> depot_autopilot] test-file_vault_config_report failed 270.000 reboot
14 autopilot.checker.2023-06-01-00-12/imx6q_sabrelite/arm_v7a.imx6q_sabrelite.foc.depot_autopilot.log:[2023-06-01 07:46:13] [init -> depot_autopilot] test-file_vault_config_report failed 270.000 reboot
15 autopilot.checker.2023-06-01-00-12/imx6q_sabrelite/arm_v7a.imx6q_sabrelite.hw.depot_autopilot.log:[2023-06-01 07:51:35] [init -> depot_autopilot] test-file_vault_config_report failed 239.984 timeout 240 sec
16 autopilot.checker.2023-06-01-00-12/imx53_qsb/arm_v7a.imx53_qsb.hw.depot_autopilot.log:[2023-06-01 06:46:55] [init -> depot_autopilot] test-file_vault_config_report failed 268.361 timeout 240 sec
17 autopilot.checker.2023-06-01-00-12/imx53_qsb/arm_v7a.imx53_qsb_tz.hw.depot_autopilot.log:[2023-06-01 06:55:41] [init -> depot_autopilot] test-file_vault_config_report failed 280.243 timeout 240 sec
run/tresor_tester failed in 17 scenarios last night with diverse errors, for example:
rpi: trust_anchor_fs: configured RAM exceeds available RAM, proceed with 13869568
zynq: Uncaught exception of type 'Trust_anchor::_execute_init()::Bad_private_key_io_buffer_size'
pbxa9: Uncaught exception of type 'Trust_anchor::_execute_init()::Bad_private_key_io_buffer_size'
imx7/foc: KERNEL0: alignment error at 010620b5 (PC: 01016bc8, SP: 401ffe00, FSR: 90000261, PSR: 20000110)
plain hang + timeout
While testing commit d48cba0a2d45c53814ce7aa725aab7738bb19dcc I got the feeling tresor_tester requires too much time. Could we change the test to reduce the runtime?
> I've fixed the uncaught exceptions on hash mismatches (m-stein@aabb2d5).
Thanks for the quick fixup. I'm still getting an uncaught exception though:
[runtime] child "file_vault"
[runtime] RAM quota: 225032K
[runtime] cap quota: 2166
[runtime] ELF binary: init
[runtime] priority: 3
[runtime] provides service File_system
[runtime -> file_vault -> file_vault] Error: failed to deliver Report session with label fs_query -> listing
[runtime -> file_vault -> file_vault] Error: failed to deliver Report session with label menu_view -> hover
[runtime -> file_vault -> file_vault] child "tresor_init_trust_anchor" exited with exit value 0
[runtime -> file_vault -> file_vault] child "sync_to_tresor_vfs_init" exited with exit value 0
[runtime -> file_vault -> file_vault -> client_fs_fs_query] Error: failed to watch '/'
[runtime -> file_vault -> file_vault -> client_fs_fs_query] Error: failed to watch '/data'
[runtime -> file_vault -> file_vault -> rump_vfs] rump: /genode: file system not clean; please fsck(8)
[runtime -> file_vault -> file_vault -> tresor_vfs] Error: update request failed, reason: "node hash mismatch"
[runtime -> file_vault -> file_vault -> tresor_vfs] Error: Uncaught exception of type 'Tresor::Free_tree::generated_request_complete(Tresor::Module_request&)::Exception_6'
[runtime -> file_vault -> file_vault -> tresor_vfs] Warning: abort called - thread: ep
@chelmuth These commits (ad059c3025fbb7889087ff76108e2f4716f8db35, ff82480bd0d9f92a038865555fc4e7e7b3a9ee35) should help with the time issue. In the former, we can also skip the benchmarks on more platforms if you like; I thought it would be nice to have at least one x86_64 and one aarch64 target as reference. I've also added this general clean-up: 3a91597ec678918eb783a925b2fe13d127f2c964.
@chelmuth I'll have a look at the failed tests.
Merged your branch to staging, looking forward to tomorrow's results.
It seems that on PBXA9 and Zynq_qemu the problem with file_vault_config_report is that the performance counter doesn't work. This causes the jitterentropy init to fail, which in turn causes the read on the jitterentropy VFS plugin by tresor_trust_anchor to fail. I checked that we set the PMUSEREN register (which enables user-level access to the performance counters) to 1 on each CPU, but even in the kernel the performance counter yields 0.
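For reference, a minimal sketch of what enabling the user performance counter amounts to, assuming ARMv7-A and privileged (PL1) execution; this is not Genode's actual kernel code, and the register encodings are taken from the ARMv7-A manual. The open question above is why PMCCNTR still reads as 0 on these boards even with such a setup.

```cpp
/*
 * Sketch only, assuming ARMv7-A and PL1 (kernel) mode; not Genode's actual
 * kernel code. Enables the cycle counter and grants user-level access, then
 * reads PMCCNTR (the counter jitterentropy depends on, as described above).
 */
inline void enable_user_cycle_counter()
{
	unsigned v;

	/* PMCR: set E (bit 0, enable counters) and C (bit 2, reset cycle counter) */
	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(v));
	v |= 0b101;
	asm volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(v));

	/* PMCNTENSET: bit 31 enables the cycle counter */
	asm volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(1u << 31));

	/* PMUSERENR: bit 0 allows user-level (PL0) access to the counters */
	asm volatile("mcr p15, 0, %0, c9, c14, 0" :: "r"(1u));
}

inline unsigned read_cycle_counter()
{
	unsigned v;

	/* PMCCNTR: cycle count register */
	asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(v));
	return v;
}
```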
> I'm still investigating the loss-of-state issue.
I've added a vfs using vfs_audit in between the file vault and thunderbird@seoul: It does not show any writes. I believe this suggests that the writes get stuck in the vdi_block of thunderbird@seoul. However, what I'm puzzled over is the fact that if I route my vfs auditing component to one of my ext2 partitions instead of the file vault, I see write accesses from thunderbird@seoul all the time. @alex-ab, @m-stein: Do you have any idea what might cause the difference between these supposedly similar scenarios?
This (11bf3766d29560bf024725b789eb3b182b4b2f38) should solve the issues with file_vault_config_report on pistachio and improve the robustness of the tests in general. This (0b69af7f3d76155d4f0f5ed03c7abafe68a86328) was a small fix I discovered while debugging zynq.
> I'm still investigating the loss-of-state issue.
> I've added a vfs using vfs_audit in between the file vault and thunderbird@seoul: It does not show any writes. I believe this suggests that the writes get stuck in the vdi_block of thunderbird@seoul. However, what I'm puzzled over is the fact that if I route my vfs auditing component to one of my ext2 partitions instead of the file vault, I see write accesses from thunderbird@seoul all the time. @alex-ab, @m-stein: Do you have any idea what might cause the difference between these supposedly similar scenarios?
Curious, I have no immediate idea.
@jschlatow I'm not sure. My first guess would be that the Tresor VFS plugins behave subtly differently from other Block/FS plugins. However, AFAICT, this effect would have to be visible through the Rump plugin in order to explain the scenario you describe. I have the feeling that we should dive into the workings of the VDI block plugin.
@chelmuth These commits can be merged and should reduce the nightly test failures:

ce714efa0e vfs/tresor_trust_anchor: remove debug mode
8c695ddd0a tresor_tester.run: reduce test timeout
c852f96d19 tresor tests: use jitterentropy only if supported
a2cf477565 tresor_tester.run: reduce ram consumption
e7dfa2c606 tresor_tester.run: no implicit routes/resources
a90d08c89b tool: add qemu run opts for zynq_qemu
@chelmuth How do you like this fix-up?
5666be1007 fixup "tresor tests: use jitterentropy only if supported"
> I've added a vfs using vfs_audit in between the file vault and thunderbird@seoul: It does not show any writes. I believe this suggests that the writes get stuck in the vdi_block of thunderbird@seoul. However, what I'm puzzled over is the fact that if I route my vfs auditing component to one of my ext2 partitions instead of the file vault, I see write accesses from thunderbird@seoul all the time. @alex-ab, @m-stein: Do you have any idea what might cause the difference between these supposedly similar scenarios?
I could reproduce the issue now. Inside the VM, mounting is denied because the file system is not recognized. So no changes are written via vdi_block, since it is simply not used at all. Using dd on /dev/sdb, it turns out that solely zeros are returned. Instrumenting vdi_block for the good case (ram_fs or rump_fs) and the bad case (file_vault) reveals that indeed zeros are read for some of the crucial blocks.
So, I created a simple test package (available via my depot 23.04 index), which just imports the initial empty vdi (with ext4 prepared, around 34M) and does nothing else. I tried this with ram_fs, rump_fs, and file_vault. Afterwards, I copied it out with a shell, imported it into a VM, and created and checked the hash. It turns out that the files read back via ram_fs and rump_fs match the originally imported one, while the one from file_vault does not.
@alex-ab Thanks for your investigation and extensive feedback! I'll have a further look into the scenario tomorrow!
> @chelmuth How do you like this fix-up?
> 5666be1 fixup "tresor tests: use jitterentropy only if supported"
I must admit I'm torn. On the one hand, I liked the concise section about platform-specific configuration of skipped tests. On the other hand, jitterentropy looks like a cross-platform concern. As I still gravitate towards the concise section, which is easier on my eye as a maintainer: may we just stay with the list of skip configurations in this run script (with just a simple skip_test_if tool)?
@chelmuth I've also excluded imx53_qsb_tz for now: 2368eb693c fixup "tresor_tester.run: circumvent alignment fault"
The new version of the file vault is planned to contain the following improvements: