Closed lmbarros closed 5 months ago
Details about this instance of the error:
2023-12-11-raspios-bookworm-arm64-lite.img.gz
).raspberrypi3-64
.--no-dt-check
and --no-nwmgr-check
Complete logs follow.
stage1.log
:
2024-02-28 13:54:29 INFO Detected OS Architecture is ARM64
2024-02-28 13:54:29 INFO Disabling all device type-related checks due to no-dt-check option
2024-02-28 13:54:29 INFO Detected device type: Dummy,
2024-02-28 13:54:29 INFO Configured for application id: 2034527
2024-02-28 13:54:29 INFO connection to api: api.balena-cloud.com:80 is ok
2024-02-28 13:54:29 INFO connection to vpn: cloudlink.balena-cloud.com:443 is ok
2024-02-28 13:54:29 INFO config.json is for device type raspberrypi3-64
2024-02-28 13:54:29 WARN [takeover::stage1::image_retrieval] OS download is not supported for device type 'raspberrypi3-64', proceeding due to no-dt-check option
2024-02-28 13:54:29 DEBUG [takeover::stage1::api_calls] get_os_versions: request_url: 'https://api.balena-cloud.com/v6/release?$select=raw_version&$filter=(is_final%20eq%20true)%20and%20(is_passing_tests%20eq%20true)%20and%20(is_invalidated%20eq%20false)%20and%20(status%20eq%20%27success%27)%20and%20(belongs_to__application/any(bta:((bta/is_host%20eq%20true)%20and%20(bta/is_for__device_type/any(iodt:iodt/slug%20eq%20%27raspberrypi3-64%27)))%20and%20(not(bta/application_tag/any(at:at/tag_key%20eq%20%27release-policy%27))%20or%20(bta/application_tag/any(at:(at/tag_key%20eq%20%27release-policy%27)%20and%20(at/value%20eq%20%27default%27))))))&$orderby=created_at%20desc'
2024-02-28 13:54:30 DEBUG [reqwest::connect] starting new connection: https://api.balena-cloud.com/
2024-02-28 13:54:30 DEBUG [takeover::stage1::api_calls] Result = Response { url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.balena-cloud.com")), port: None, path: "/v6/release", query: Some("$select=raw_version&$filter=(is_final%20eq%20true)%20and%20(is_passing_tests%20eq%20true)%20and%20(is_invalidated%20eq%20false)%20and%20(status%20eq%20%27success%27)%20and%20(belongs_to__application/any(bta:((bta/is_host%20eq%20true)%20and%20(bta/is_for__device_type/any(iodt:iodt/slug%20eq%20%27raspberrypi3-64%27)))%20and%20(not(bta/application_tag/any(at:at/tag_key%20eq%20%27release-policy%27))%20or%20(bta/application_tag/any(at:(at/tag_key%20eq%20%27release-policy%27)%20and%20(at/value%20eq%20%27default%27))))))&$orderby=created_at%20desc"), fragment: None }, status: 200, headers: {"date": "Wed, 28 Feb 2024 13:54:30 GMT", "content-type": "application/json; charset=utf-8", "transfer-encoding": "chunked", "connection": "keep-alive", "cache-control": "no-cache", "etag": "W/\"5b7-3SqWAv4pwRF8/cK1Z98jFzBk1JA\"", "vary": "Accept-Encoding", "cf-cache-status": "DYNAMIC", "strict-transport-security": "max-age=15552000", "server": "cloudflare", "cf-ray": "85c928eabb70bcc3-POA", "alt-svc": "h3=\":443\"; ma=86400"} }
2024-02-28 13:54:30 DEBUG [takeover::stage1::image_retrieval] Looking at version 5.0.8
2024-02-28 13:54:30 INFO Selected default version (5.0.8) for download
2024-02-28 13:54:30 INFO Downloading Balena OS image, selected version is: '5.0.8'
2024-02-28 13:54:30 DEBUG [takeover::stage1::api_calls] get_os_image: request_url: 'https://api.balena-cloud.com/download'
2024-02-28 13:54:30 DEBUG [takeover::stage1::api_calls] get_os_image: data: 'ImageRequestData { device_type: "raspberrypi3-64", version: "5.0.8", file_type: ".gz" }'
2024-02-28 13:54:30 DEBUG [reqwest::connect] starting new connection: https://api.balena-cloud.com/
2024-02-28 13:54:32 DEBUG [takeover::stage1::api_calls] Result = Response { url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.balena-cloud.com")), port: None, path: "/download", query: None, fragment: None }, status: 200, headers: {"date": "Wed, 28 Feb 2024 13:54:32 GMT", "content-type": "application/octet-stream", "content-length": "180411412", "connection": "keep-alive", "vary": "X-HTTP-Method-Override", "access-control-allow-origin": "*", "access-control-allow-methods": "GET, PUT, POST, PATCH, DELETE, OPTIONS, HEAD", "access-control-allow-headers": "Content-Type, Authorization, Application-Record-Count, MaxDataServiceVersion, X-Requested-With", "access-control-allow-credentials": "true", "access-control-max-age": "86400", "content-disposition": "attachment; filename=\"raspberrypi3-64-5.0.8-v15.0.4.img.gz\"", "x-transfer-length": "180411412", "cache-control": "no-store, no-transform", "cf-cache-status": "DYNAMIC", "strict-transport-security": "max-age=15552000", "server": "cloudflare", "cf-ray": "85c928ef2d05bccf-POA", "alt-svc": "h3=\":443\"; ma=86400"} }
2024-02-28 13:54:32 DEBUG [takeover::stage1::image_retrieval] Downloading file '/home/lmb/balena-cloud-raspberrypi3-64-5.0.8.img.gz'
2024-02-28 13:54:44 INFO 132 MiB read in 11 seconds @12 MiB/sec
2024-02-28 13:54:47 INFO 172 MiB read in 14 seconds @12 MiB/sec
2024-02-28 13:54:47 INFO The balena OS image was successfully written to '/home/lmb/balena-cloud-raspberrypi3-64-5.0.8.img.gz'
2024-02-28 13:54:47 DEBUG [takeover::stage1::migrate_info] image path: '/home/lmb/balena-cloud-raspberrypi3-64-5.0.8.img.gz'
2024-02-28 13:54:47 DEBUG [takeover::common] call: output: Output { status: ExitStatus(unix_wait_status(0)), stdout: "537\n", stderr: "" }
2024-02-28 13:54:47 WARN [takeover::stage1::migrate_info] No Network manager files were found, the device might not be able to come online
2024-02-28 13:54:47 INFO Writing hostname to config.json: 'raspberrypi'
2024-02-28 13:54:47 DEBUG [takeover::common] parse_file: line: 'PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"'
2024-02-28 13:54:52 INFO Preparing for takeover..
2024-02-28 13:54:52 DEBUG [takeover::common] call: output: Output { status: ExitStatus(unix_wait_status(0)), stdout: "", stderr: "" }
2024-02-28 13:54:52 DEBUG [takeover::common] call: output: Output { status: ExitStatus(unix_wait_status(0)), stdout: "\tlinux-vdso.so.1 (0x0000007fba1f6000)\n\tlibc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007fb9fd0000)\n\t/lib/ld-linux-aarch64.so.1 (0x0000007fba1b9000)\n", stderr: "" }
2024-02-28 13:54:52 DEBUG [takeover::stage1::exe_copy] get_libs: no path for linux-vdso.so.1
2024-02-28 13:54:52 DEBUG [takeover::common] call: output: Output { status: ExitStatus(unix_wait_status(256)), stdout: "", stderr: "\tnot a dynamic executable\n" }
2024-02-28 13:54:52 DEBUG [takeover::common] call: output: Output { status: ExitStatus(unix_wait_status(0)), stdout: "\tstatically linked\n", stderr: "" }
2024-02-28 13:54:52 DEBUG [takeover::stage1::exe_copy] get_libs: lib is statically linked: '/lib/ld-linux-aarch64.so.1'
2024-02-28 13:54:52 DEBUG [takeover::common] call: output: Output { status: ExitStatus(unix_wait_status(0)), stdout: "\t/lib/ld-linux-aarch64.so.1 (0x0000007fae965000)\n\tlinux-vdso.so.1 (0x0000007fae9a2000)\n", stderr: "" }
2024-02-28 13:54:52 DEBUG [takeover::stage1::exe_copy] get_libs: no path for linux-vdso.so.1
2024-02-28 13:54:52 DEBUG [takeover::stage1] Space required for commands: 11 MiB
2024-02-28 13:54:52 INFO Found 908 MiB total, 232 MiB free memory
2024-02-28 13:54:52 DEBUG [takeover::common::system] mkdir: '/balena-takeover'
2024-02-28 13:54:52 INFO Using '/balena-takeover' as takeover directory
2024-02-28 13:54:52 INFO Mounted tmpfs file system on '/balena-takeover'
2024-02-28 13:54:52 INFO Created mtab in '/balena-takeover/etc/mtab'
2024-02-28 13:54:52 INFO Mounted proc file system on '/balena-takeover/proc'
2024-02-28 13:54:52 INFO Mounted tmpfs file system on '/balena-takeover/tmp'
2024-02-28 13:54:52 INFO Mounted sys file system on '/balena-takeover/sys'
2024-02-28 13:54:52 INFO Mounted dev file system on '/balena-takeover/dev'
2024-02-28 13:54:52 DEBUG [takeover::common::debug] After dev mount
2024-02-28 13:54:52 DEBUG [takeover::common::debug] check if /dev/loop-control exists in new /dev: true
2024-02-28 13:54:52 DEBUG [takeover::common::debug] open /dev/loop-control succeeded
2024-02-28 13:54:52 INFO Mounted devpts file system on '/balena-takeover/dev/pts'
2024-02-28 13:54:52 INFO Created directory '/balena-takeover/mnt/old_root'
2024-02-28 13:54:52 DEBUG [takeover::stage1::exe_copy] copy_file: copying '/lib/ld-linux-aarch64.so.1' to '/balena-takeover/lib/ld-linux-aarch64.so.1'
2024-02-28 13:54:52 DEBUG [takeover::stage1::exe_copy] copy_file: copying '/lib/aarch64-linux-gnu/libc.so.6' to '/balena-takeover/lib/aarch64-linux-gnu/libc.so.6'
2024-02-28 13:54:52 INFO Copied '/bin/dd' to '/balena-takeover/bin/dd'
2024-02-28 13:54:53 INFO Copied '/home/lmb/takeover' to '/balena-takeover/bin/takeover'
2024-02-28 13:54:53 INFO Copied config.json to '/home/lmb/config.Kpj6ZR7Hdf.json'
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::mount] from_mtab: processing mount Mount { mountpoint: "/", fs_type: "ext4" }
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::mount] from_mtab: processing mount Mount { mountpoint: "/boot/firmware", fs_type: "vfat" }
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::mount] from_mtab: processing mount Mount { mountpoint: "/mnt/pen", fs_type: "vfat" }
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info] new: Root device number is: 179:2
2024-02-28 13:54:53 DEBUG [takeover::common] call: output: Output { status: ExitStatus(unix_wait_status(0)), stdout: "/dev/mmcblk0p1: LABEL_FATBOOT=\"bootfs\" LABEL=\"bootfs\" UUID=\"5DF9-E225\" BLOCK_SIZE=\"512\" TYPE=\"vfat\" PARTUUID=\"d0eeea92-01\"\n", stderr: "" }
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/mmcblk0p1 got param name: LABEL_FATBOOT, value: bootfs
2024-02-28 13:54:53 WARN [takeover::stage1::block_device_info::partition] unexpected parameter name found: 'LABEL_FATBOOT'
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/mmcblk0p1 got param name: LABEL, value: bootfs
2024-02-28 13:54:53 WARN [takeover::stage1::block_device_info::partition] unexpected parameter name found: 'LABEL'
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/mmcblk0p1 got param name: UUID, value: 5DF9-E225
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/mmcblk0p1 got param name: BLOCK_SIZE, value: 512
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/mmcblk0p1 got param name: TYPE, value: vfat
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/mmcblk0p1 got param name: PARTUUID, value: d0eeea92-01
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: for /dev/mmcblk0p1 got PartitionInfo { uuid: Some("5DF9-E225"), block_size: Some(512), fs_type: Some("vfat"), label: None, part_uuid: Some("d0eeea92-01") }
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info] found partition '"mmcblk0p1"' in '/sys/block/mmcblk0/mmcblk0p1'
2024-02-28 13:54:53 DEBUG [takeover::common] call: output: Output { status: ExitStatus(unix_wait_status(0)), stdout: "/dev/mmcblk0p2: LABEL=\"rootfs\" UUID=\"3b614a3f-4a65-4480-876a-8a998e01ac9b\" BLOCK_SIZE=\"4096\" TYPE=\"ext4\" PARTUUID=\"d0eeea92-02\"\n", stderr: "" }
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/mmcblk0p2 got param name: LABEL, value: rootfs
2024-02-28 13:54:53 WARN [takeover::stage1::block_device_info::partition] unexpected parameter name found: 'LABEL'
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/mmcblk0p2 got param name: UUID, value: 3b614a3f-4a65-4480-876a-8a998e01ac9b
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/mmcblk0p2 got param name: BLOCK_SIZE, value: 4096
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/mmcblk0p2 got param name: TYPE, value: ext4
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/mmcblk0p2 got param name: PARTUUID, value: d0eeea92-02
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: for /dev/mmcblk0p2 got PartitionInfo { uuid: Some("3b614a3f-4a65-4480-876a-8a998e01ac9b"), block_size: Some(4096), fs_type: Some("ext4"), label: None, part_uuid: Some("d0eeea92-02") }
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info] found partition '"mmcblk0p2"' in '/sys/block/mmcblk0/mmcblk0p2'
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info] new: got device: BlockDevice { name: "mmcblk0", device_num: DeviceNum { major: 179, minor: 0 }, mounted: None, parent: "None" }
2024-02-28 13:54:53 DEBUG [takeover::common] call: output: Output { status: ExitStatus(unix_wait_status(0)), stdout: "/dev/sda1: UUID=\"7607-9B6E\" BLOCK_SIZE=\"512\" TYPE=\"vfat\" PARTUUID=\"4e639091-01\"\n", stderr: "" }
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/sda1 got param name: UUID, value: 7607-9B6E
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/sda1 got param name: BLOCK_SIZE, value: 512
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/sda1 got param name: TYPE, value: vfat
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: /dev/sda1 got param name: PARTUUID, value: 4e639091-01
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info::partition] PartitionInfo::new: for /dev/sda1 got PartitionInfo { uuid: Some("7607-9B6E"), block_size: Some(512), fs_type: Some("vfat"), label: None, part_uuid: Some("4e639091-01") }
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info] found partition '"sda1"' in '/sys/block/sda/sda1'
2024-02-28 13:54:53 DEBUG [takeover::stage1::block_device_info] new: got device: BlockDevice { name: "sda", device_num: DeviceNum { major: 8, minor: 0 }, mounted: None, parent: "None" }
2024-02-28 13:54:53 DEBUG [takeover::stage1] Stage 2 config:
---
log_dev:
dev_name: /dev/sda1
fs_type: vfat
log_level: DEBUG
flash_dev: /dev/mmcblk0
pretend: false
umount_parts:
- dev_name: /dev/mmcblk0p1
mountpoint: /boot/firmware
fs_type: vfat
- dev_name: /dev/mmcblk0p2
mountpoint: /
fs_type: ext4
work_dir: /home/lmb
image_path: /home/lmb/balena-cloud-raspberrypi3-64-5.0.8.img.gz
config_path: /home/lmb/config.Kpj6ZR7Hdf.json
backup_path: ~
tty: /dev/pts/0
2024-02-28 13:54:53 INFO Wrote stage2 config to '/balena-takeover/stage2-config.yml'
2024-02-28 13:54:53 DEBUG [takeover::stage1] Found telinit at '/usr/sbin/telinit'
2024-02-28 13:54:53 DEBUG [takeover::stage1] telinit is a symlink with canonical path '/usr/bin/systemctl'
2024-02-28 13:54:53 INFO Copied '/usr/bin/systemctl' to '/balena-takeover/bin/telinit'
2024-02-28 13:54:53 INFO Bind-mounted new init as '/balena-takeover/bin/takeover'
2024-02-28 13:54:53 DEBUG [takeover::stage1] calling '/balena-takeover/bin/telinit u'
2024-02-28 13:54:55 DEBUG [takeover::common] call: output: Output { status: ExitStatus(unix_wait_status(0)), stdout: "", stderr: "" }
2024-02-28 13:54:55 INFO Restarted init
2024-02-28 13:54:55 INFO Takeover initiated successfully, please wait for the device to be reflashed and reboot
stage2-init.log
:
2024-02-28 13:54:55 INFO [takeover::init] Init entered
2024-02-28 13:54:55 INFO [takeover::init] Init check pid success!
2024-02-28 13:54:55 INFO [takeover::init] Stage 2 config was read successfully
2024-02-28 13:54:55 INFO [takeover::init] Stage 2 closed 44 fd's
2024-02-28 13:54:55 WARN [takeover::init] Failed to unmount log device '/dev/sda1' from '/mnt/pen', error: EBUSY
2024-02-28 13:54:55 INFO [takeover::init] Now logging to '/balena-takeover/mnt/log/stage2-init.log' on '/dev/sda1'
2024-02-28 13:54:55 INFO [takeover::init] Stage 2 setup_log success!, ext_log: true
2024-02-28 13:54:57 DEBUG [takeover::common] call: output: Output { status: ExitStatus(unix_wait_status(0)), stdout: "", stderr: "" }
2024-02-28 13:54:57 DEBUG [takeover::common] call: output: Output { status: ExitStatus(unix_wait_status(0)), stdout: "", stderr: "" }
2024-02-28 13:54:57 INFO [takeover::init] Stage 2 migrate worker spawned
stage2.log
:
2024-02-28 13:54:57 INFO [takeover::stage2] Stage 2 migrate_worker entered
2024-02-28 13:54:57 INFO [takeover::stage2] Stage 2 config was read successfully
2024-02-28 13:54:57 INFO [takeover::stage2] Set logfile to /mnt/log/stage2.log
2024-02-28 13:54:58 DEBUG [takeover::stage2] active processes:
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1 name: systemd executable: /mnt/old_root/usr/lib/systemd/systemd ppid: 0
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 2 name: '[kthreadd]' executable: - ppid: 0
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 3 name: '[rcu_gp]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 4 name: '[rcu_par_gp]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 5 name: '[slub_flushwq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 6 name: '[netns]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 9 name: '[kworker/u8:0-flush-8:0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 10 name: '[mm_percpu_wq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 11 name: '[rcu_tasks_kthread]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 12 name: '[rcu_tasks_rude_kthread]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 13 name: '[rcu_tasks_trace_kthread]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 14 name: '[ksoftirqd/0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 15 name: '[rcu_preempt]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 16 name: '[migration/0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 17 name: '[cpuhp/0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 18 name: '[cpuhp/1]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 19 name: '[migration/1]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 20 name: '[ksoftirqd/1]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 23 name: '[cpuhp/2]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 24 name: '[migration/2]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 25 name: '[ksoftirqd/2]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 28 name: '[cpuhp/3]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 29 name: '[migration/3]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 30 name: '[ksoftirqd/3]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 31 name: '[kworker/3:0-mm_percpu_wq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 33 name: '[kdevtmpfs]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 34 name: '[inet_frag_wq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 36 name: '[kauditd]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 37 name: '[kworker/0:2-mm_percpu_wq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 38 name: '[khungtaskd]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 40 name: '[oom_reaper]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 41 name: '[writeback]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 42 name: '[kcompactd0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 43 name: '[kintegrityd]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 44 name: '[kblockd]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 45 name: '[blkcg_punt_bio]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 46 name: '[watchdogd]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 47 name: '[kworker/u8:2-events_unbound]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 49 name: '[kworker/1:1H-mmc_complete]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 50 name: '[rpciod]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 51 name: '[xprtiod]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 53 name: '[kworker/2:1-events]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 55 name: '[kswapd0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 56 name: '[nfsiod]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 57 name: '[kthrotld]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 62 name: '[hwrng]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 63 name: '[iscsi_conn_clea]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 64 name: '[nvme-wq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 65 name: '[nvme-reset-wq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 66 name: '[nvme-delete-wq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 67 name: '[dwc_otg]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 68 name: '[DWC Notificatio]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 69 name: '[uas]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 70 name: '[vchiq-slot/0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 71 name: '[vchiq-recy/0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 72 name: '[vchiq-sync/0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 79 name: '[zswap-shrink]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 104 name: '[mmc_complete]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 107 name: '[kworker/2:1H-kblockd]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 127 name: '[kworker/3:1H-mmc_complete]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 139 name: '[kworker/0:1H-kblockd]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 140 name: '[scsi_eh_0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 141 name: '[scsi_tmf_0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 142 name: '[usb-storage]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 154 name: '[kworker/1:2H]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 173 name: '[kworker/2:2H]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 174 name: '[kworker/1:3-events]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 180 name: '[irq/184-usb-001:006:01]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 183 name: '[kworker/0:2H-kblockd]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 186 name: '[jbd2/mmcblk0p2-8]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 187 name: '[ext4-rsv-conver]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 202 name: '[mld]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 203 name: '[ipv6_addrconf]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 246 name: systemd-journal executable: /mnt/old_root/usr/lib/systemd/systemd-journald ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 252 name: '[ext4lazyinit]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 253 name: '[kworker/2:3-events_freezable_power_]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 268 name: systemd-udevd executable: /usr/bin/udevadm ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 309 name: '[vchiq-keep/0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 310 name: '[SMIO]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 359 name: '[mmal-vchiq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 360 name: '[mmal-vchiq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 361 name: '[mmal-vchiq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 362 name: '[mmal-vchiq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 366 name: '[mmal-vchiq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 367 name: '[mmal-vchiq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 368 name: '[mmal-vchiq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 390 name: systemd-timesyn executable: /usr/lib/systemd/systemd-timesyncd ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 419 name: '[cfg80211]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 436 name: '[brcmf_wq/mmc1:0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 437 name: '[brcmf_wdog/mmc1:0001:1]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 443 name: '[kworker/u9:2-brcmf_wq/mmc1:0001:1]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 454 name: avahi-daemon executable: /mnt/old_root/usr/sbin/avahi-daemon ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 455 name: cron executable: /mnt/old_root/usr/sbin/cron ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 456 name: dbus-daemon executable: /mnt/old_root/usr/bin/dbus-daemon ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 465 name: avahi-daemon executable: /mnt/old_root/usr/sbin/avahi-daemon ppid: 454
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 472 name: polkitd executable: /mnt/old_root/usr/lib/polkit-1/polkitd ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 495 name: '[cec-vc4-hdmi]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 496 name: '[irq/82-vc4 hdmi cec]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 501 name: systemd-logind executable: /usr/lib/systemd/systemd-logind ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 506 name: '[card0-crtc0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 507 name: thd executable: /mnt/old_root/usr/sbin/thd ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 508 name: '[card0-crtc1]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 509 name: udisksd executable: /mnt/old_root/usr/libexec/udisks2/udisksd ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 510 name: '[card0-crtc2]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 511 name: '[card0-crtc3]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 535 name: ModemManager executable: /usr/sbin/ModemManager ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 537 name: NetworkManager executable: /usr/sbin/NetworkManager ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 538 name: wpa_supplicant executable: /mnt/old_root/usr/sbin/wpa_supplicant ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 661 name: '[kworker/1:4-mm_percpu_wq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 690 name: bluetoothd executable: /usr/libexec/bluetooth/bluetoothd ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 736 name: '[kworker/u9:3-hci0]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1081 name: '[kworker/0:1-events]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1082 name: '[kworker/3:0H]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1083 name: '[kworker/1:0-cgroup_destroy]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1084 name: '[kworker/3:2-events]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1085 name: '[kworker/0:0H-kblockd]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1209 name: '[kworker/u8:1-events_unbound]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1210 name: '[kworker/u8:3-ext4-rsv-conversion]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1282 name: login executable: /mnt/old_root/usr/bin/login ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1335 name: '[kworker/1:1-mm_percpu_wq]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1336 name: '[kworker/2:0-cgroup_destroy]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1342 name: systemd executable: /mnt/old_root/usr/lib/systemd/systemd ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1343 name: '[kworker/2:2-events]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1344 name: (sd-pam) executable: /mnt/old_root/usr/lib/systemd/systemd ppid: 1342
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1359 name: bash executable: /mnt/old_root/usr/bin/bash ppid: 1282
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1375 name: '[kworker/2:0H]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1411 name: '[kworker/1:0H-kblockd]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1416 name: sudo executable: /mnt/old_root/usr/bin/sudo ppid: 1359
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1417 name: sudo executable: /mnt/old_root/usr/bin/sudo ppid: 1416
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1418 name: bash executable: /mnt/old_root/usr/bin/bash ppid: 1417
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1420 name: takeover executable: /mnt/old_root/home/lmb/takeover ppid: 1418
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1421 name: tee executable: /mnt/old_root/usr/bin/tee ppid: 1420
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1426 name: '[kworker/3:1-events]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1427 name: '[kworker/3:3-events]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1429 name: '[kworker/3:2H]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1459 name: '[kworker/0:0-events]' executable: - ppid: 2
2024-02-28 13:54:58 DEBUG [takeover::stage2] pid: 1463 name: takeover executable: /bin/takeover ppid: 1
2024-02-28 13:54:58 DEBUG [takeover::common::system] looking at pid: 1, fd 0, file: '/proc/1/fd/0' -> 'pipe:[16241]'
2024-02-28 13:54:58 DEBUG [takeover::common::system] looking at pid: 1, fd 1, file: '/proc/1/fd/1' -> '/stdout.log'
2024-02-28 13:54:58 DEBUG [takeover::common::system] looking at pid: 1, fd 2, file: '/proc/1/fd/2' -> '/stderr.log'
2024-02-28 13:54:58 DEBUG [takeover::common::system] looking at pid: 1, fd 3, file: '/proc/1/fd/3' -> '/mnt/log/stage2-init.log'
2024-02-28 13:54:58 DEBUG [takeover::common::system] looking at pid: 1, fd 4, file: '/proc/1/fd/4' -> 'pipe:[16241]'
2024-02-28 13:54:58 DEBUG [takeover::common::system] looking at pid: 246, fd 0, file: '/proc/246/fd/0' -> '/mnt/old_root/dev/null'
2024-02-28 13:54:58 DEBUG [takeover::common::system] sending signal 15 to 246
2024-02-28 13:54:58 DEBUG [takeover::common::system] looking at pid: 268, fd 0, file: '/proc/268/fd/0' -> '/mnt/old_root/dev/null'
2024-02-28 13:54:58 DEBUG [takeover::common::system] sending signal 15 to 268
2024-02-28 13:54:58 DEBUG [takeover::common::system] looking at pid: 390, fd 0, file: '/proc/390/fd/0' -> '/mnt/old_root/dev/null'
2024-02-28 13:54:58 DEBUG [takeover::common::system] sending signal 15 to 390
2024-02-28 13:54:58 DEBUG [takeover::common::system] looking at pid: 454, fd 0, file: '/proc/454/fd/0' -> '/mnt/old_root/dev/null'
2024-02-28 13:54:58 DEBUG [takeover::common::system] sending signal 15 to 454
2024-02-28 13:54:58 DEBUG [takeover::common::system] looking at pid: 455, fd 0, file: '/proc/455/fd/0' -> '/mnt/old_root/dev/null'
2024-02-28 13:54:58 DEBUG [takeover::common::system] sending signal 15 to 455
2024-02-28 13:54:58 DEBUG [takeover::common::system] looking at pid: 456, fd 0, file: '/proc/456/fd/0' -> '/mnt/old_root/dev/null'
2024-02-28 13:54:58 DEBUG [takeover::common::system] sending signal 15 to 456
2024-02-28 13:54:58 DEBUG [takeover::common::system] looking at pid: 465, fd 0, file: '/proc/465/fd/0' -> '/mnt/old_root/dev/null'
2024-02-28 13:54:58 DEBUG [takeover::common::system] sending signal 15 to 465
2024-02-28 13:54:58 ERROR [takeover::stage2] kill_procs failed, error An error occurred in an upstream function, context: Failed to read link '/proc/472/fd/0'
caused by: No such file or directory (os error 2)
Maybe a race condition? The process in question had the file descriptor (FD) open when we enumerated the list of open FDs, but it's no longer open when we try to access or use it?
If so, I believe this would be harmless (in the sense that it doesn't damage the system in any way, and a future migration attempt will probably work fine). But, yeah, annoying.
I encountered this today as well on a balenaOS to balenaOS test migration
2024-03-21 12:59:03 INFO [takeover::stage2] Stage 2 migrate_worker entered
2024-03-21 12:59:03 INFO [takeover::stage2] Stage 2 config was read successfully
2024-03-21 12:59:03 INFO [takeover::stage2] Set logfile to /mnt/log/stage2.log
2024-03-21 12:59:03 TRACE [takeover::stage2] kill_procs: entered
2024-03-21 12:59:03 DEBUG [takeover::stage2] active processes:
2024-03-21 12:59:03 ERROR [takeover::stage2] kill_procs failed, error An error occurred in an upstream function, context: Failed to read link to executable: '/proc/7067/exe'
caused by: No such file or directory (os error 2)
2024-03-21 12:59:03 TRACE [takeover::stage2] reboot entered
I have been looking into this, and unfortunately there are several race conditions around the fuser()
function, all of which can trigger errors at different points. Here are the ones I identified:
read_dir(&fd_dir)
to get the list of files opened by a given process, the process exists. But when we call lstat(curr_path.as_path())
this process has already died. This will trigger a "proper" lstat()
error, which may be what Ken originally saw. This is currently handled by our code that ignores lstat
errors (continue after lstat error
).read_dir(&fd_dir)
, the process has a given file open, but when we call lstat(curr_path.as_path())
, the process has closed that file. That would be another possible cause for a "proper" lstat
error, like the one Ken originally saw.lstat
. But then, the file is closed (or the whole process is killed) just after this lstat
call, which then causes the read_link
call to fail. This seems to be what happened to me when I first opened this issue.And now for something completely different. The function get_process_infos
(indirectly called just before we call fuser
also has a race:
ProcessIterator::new()
(which boils down to a read_dir
on /proc
), then iterates over the results calling get_process_info_for
for each entry. If some process is killed before its entry is processed, get_process_info_for
will call read_link
on a path that doesn't exist anymore and fail. This appears to be what Alexandru saw.For future reference, here's a patch I applied on top of current master (1ac8dd08c12369a184c953e34ac5e0883f634652), to reproduce the case 1 from my list above.
Now, what do we do about all that? Need to think a bit more, but currently what I have in mind is this:
/proc
changed between two "critical" points. Specifically, some entries under /proc
were removed to reflect changes to the system that happened after the read_dir()
.lstat()
errors isn't ideal, because it may end up ignoring some other error that is not related to the race....which makes me think of the opposite kind of races that go on undetected: what if processes are created or existing processes open files between the critical points?
As far as I understand, all these racy process iteration we do is to put the system on a state that is as idle as possible (no unnecessary processes running, no unnecessary files open) in order to minimize the chances of flashing errors. But generally speaking, I don't think that a surviving process would cause the migration to fail, so I want to believe we can live with that.
(Still, I said "generally speaking". I can imagine that a surviving process that is allocating huge amounts of memory, or maybe also trying to overwrite the storage device could cause problems. We never saw anything like that in testing, and as far as I know we never had reports of this kind. I'd expect these cases to be beyond rare.)
Case 4 (the one reported by Alexandru) is actually puzzling me. The error message he saw was
2024-03-21 12:59:03 ERROR [takeover::stage2] kill_procs failed, error An error occurred in an upstream function, context: Failed to read link to executable: '/proc/7067/exe'
caused by: No such file or directory (os error 2)
AFAICT, the only place this can possible originate from is here, and because read_link()
returned ENOENT
(AKA "os error 2"). However, if look at the code, this case seems to be handled already (for the /proc/PID/exe
case, not for /proc/PID/root
).
The relevant code hasn't changed in a while, so it seems I am looking at the same code that Alexandru ran. I don't understand how he he could possibly see that error message.
[UPDATE] 23 hours later...
Now I understand. The original code had copy-and-pasted ("copied-and-pasted"?) the error handling for the "exe" case into the "root" case. So, the error reporting was wrong and confused me. It was the "root" case failing for Alexandru all the time.
Knots being disentangled!
And I found another race condition:
get_process_infos()
will use ProcessIterator::new()
to get the list of running processes, and then (via parse_status()
) call get_process_info_for()
for each entry in. This will in turn read /proc/PID/status
, which may be vanished by then.hey @lmbarros this was handled by #75 right?
Solved by #75
Stage2 logs end like this:
And a little earlier we can see PID 472 refers to
polkitd
: