chenxiaolong / avbroot

Sign (and root) Android A/B OTAs with custom keys while preserving Android Verified Boot
GNU General Public License v3.0

update_engine fails to flash system partition that increased in size #306

Closed: chenxiaolong closed this issue 3 months ago

chenxiaolong commented 3 months ago

On my Pixel 8 Pro, when I modify system.img in a way that increases its size (1282695168 -> 1286025216 bytes, approx. +3.2 MiB), update_engine fails to flash the partition.

I don't think the "No space left on COW device" error is an actual disk space issue. (I originally said that these system images are much smaller than the stock OS system images. EDIT: Not true. GrapheneOS' system.img is actually larger.) There might be a payload.bin manifest field that avbroot is failing to update.

2024-06-21 18:27:28.588016 -0400 I/update_engine( 1217): [INFO:partition_writer_factory_android.cc(36)] Virtual AB Compression Enabled, using VABC Partition Writer for `system`
2024-06-21 18:27:28.592103 -0400 I/update_engine( 1217): [INFO:vabc_partition_writer.cc(96)] Partition `system` has 0 copy blocks
2024-06-21 18:27:28.593274 -0400 I/update_engine( 1217): [INFO:vabc_partition_writer.cc(214)] Device supports Virtual AB compression with XOR, but OTA package does not.
2024-06-21 18:27:28.605472 -0400 I/update_engine( 1217): [INFO:snapshot.cpp(2551)] Successfully unmapped snapshot system_b
2024-06-21 18:27:28.648232 -0400 I/update_engine( 1217): [INFO:fs_mgr_dm_linear.cpp(247)] [libfs_mgr] Created logical partition system_b-base on device /dev/block/dm-8
2024-06-21 18:27:28.659178 -0400 I/update_engine( 1217): [INFO:snapshot.cpp(2622)] Mapped COW device for system_b at /dev/block/dm-9
2024-06-21 18:27:28.664249 -0400 I/update_engine( 1217): [INFO:writer_base.cpp(83)] COW image /dev/block/dm-9 has size 837435392
2024-06-21 18:27:28.693624 -0400 I/update_engine( 1217): [INFO:writer_v2.cpp(177)] Batch writes: enabled
2024-06-21 18:27:28.706067 -0400 I/update_engine( 1217): [INFO:writer_v2.cpp(182)] Not creating new threads for compression.
2024-06-21 18:27:44.949698 -0400 I/update_engine( 1217): [INFO:delta_performer.cc(108)] Completed 508/1692 operations (30%), 396787712/1271494038 bytes downloaded (31%), overall progress 30%
2024-06-21 18:28:15.000048 -0400 I/update_engine( 1217): [INFO:delta_performer.cc(108)] Completed 634/1692 operations (37%), 487735296/1271494038 bytes downloaded (38%), overall progress 37%
2024-06-21 18:28:25.092334 -0400 I/update_engine( 1217): [INFO:delta_performer.cc(108)] Completed 677/1692 operations (40%), 519847936/1271494038 bytes downloaded (40%), overall progress 40%
2024-06-21 18:28:55.362396 -0400 I/update_engine( 1217): [INFO:delta_performer.cc(108)] Completed 781/1692 operations (46%), 618414080/1271494038 bytes downloaded (48%), overall progress 47%
2024-06-21 18:29:08.961554 -0400 I/update_engine( 1217): [INFO:delta_performer.cc(108)] Completed 833/1692 operations (49%), 661192704/1271494038 bytes downloaded (52%), overall progress 50%
2024-06-21 18:29:39.165788 -0400 I/update_engine( 1217): [INFO:delta_performer.cc(108)] Completed 963/1692 operations (56%), 745242624/1271494038 bytes downloaded (58%), overall progress 57%
2024-06-21 18:29:52.619500 -0400 I/update_engine( 1217): [INFO:delta_performer.cc(108)] Completed 1016/1692 operations (60%), 785121280/1271494038 bytes downloaded (61%), overall progress 60%
2024-06-21 18:29:59.819830 -0400 E/update_engine( 1217): [ERROR:writer_v2.cpp(601)] No space left on COW device. Required: 837438955, available: 837435392
2024-06-21 18:29:59.824215 -0400 E/update_engine( 1217): [ERROR:writer_v2.cpp(428)] AddRawBlocks: write failed: No space left on device
2024-06-21 18:29:59.825746 -0400 E/update_engine( 1217): [ERROR:block_extent_writer.cc(94)] WriteExtent(313344, 512) failed.
2024-06-21 18:29:59.829543 -0400 E/update_engine( 1217): [ERROR:block_extent_writer.cc(116)] bytes_written > 0 failed.
2024-06-21 18:29:59.830779 -0400 E/update_engine( 1217): [ERROR:xz_extent_writer.cc(100)] underlying_writer_->Write(output_buffer.data(), request.out_pos) failed.
2024-06-21 18:29:59.831650 -0400 E/update_engine( 1217): [ERROR:install_operation_executor.cc(192)] writer->Write(data, operation.data_length()) failed.
2024-06-21 18:29:59.833430 -0400 E/update_engine( 1217): [ERROR:delta_performer.cc(960)] partition_writer_->PerformReplaceOperation( operation, buffer_.data(), buffer_.size()) failed.
2024-06-21 18:29:59.834815 -0400 E/update_engine( 1217): [ERROR:delta_performer.cc(194)] Failed to perform REPLACE_XZ operation 1050, which is the operation 612 in partition "system"
2024-06-21 18:29:59.835827 -0400 E/update_engine( 1217): [ERROR:delta_performer.cc(506)] unable to process operation: 28
2024-06-21 18:29:59.836738 -0400 E/update_engine( 1217): [ERROR:download_action.cc(227)] Error ErrorCode::kDownloadOperationExecutionError (28) in DeltaPerformer's Write method when processing the received payload -- Terminating processing
2024-06-21 18:29:59.943275 -0400 I/update_engine( 1217): [INFO:vabc_partition_writer.cc(416)] Finalizing system COW image
2024-06-21 18:30:00.397848 -0400 I/update_engine( 1217): [INFO:delta_performer.cc(213)] Discarding 2097312 unused downloaded bytes
2024-06-21 18:30:00.422467 -0400 I/update_engine( 1217): [INFO:multi_range_http_fetcher.cc(175)] Received transfer terminated.
2024-06-21 18:30:00.425526 -0400 I/update_engine( 1217): [INFO:multi_range_http_fetcher.cc(127)] TransferEnded w/ code 200
2024-06-21 18:30:00.426952 -0400 I/update_engine( 1217): [INFO:multi_range_http_fetcher.cc(129)] Terminating.
2024-06-21 18:30:00.428152 -0400 I/update_engine( 1217): [INFO:action_processor.cc(116)] ActionProcessor: finished DownloadAction with code ErrorCode::kDownloadOperationExecutionError
2024-06-21 18:30:00.429310 -0400 I/update_engine( 1217): [INFO:action_processor.cc(121)] ActionProcessor: Aborting processing due to failure.
2024-06-21 18:30:00.433801 -0400 I/update_engine( 1217): [INFO:update_attempter_android.cc(694)] Processing Done.
2024-06-21 18:30:00.557161 -0400 I/update_engine( 1217): [INFO:snapshot.cpp(2551)] Successfully unmapped snapshot product_b
2024-06-21 18:30:00.625301 -0400 I/update_engine( 1217): [INFO:snapshot.cpp(2551)] Successfully unmapped snapshot system_b
2024-06-21 18:30:00.628896 -0400 I/update_engine( 1217): [INFO:snapshot.cpp(2551)] Successfully unmapped snapshot system_dlkm_b
2024-06-21 18:30:00.630525 -0400 I/update_engine( 1217): [INFO:snapshot.cpp(2551)] Successfully unmapped snapshot vendor_dlkm_b
2024-06-21 18:30:00.634941 -0400 I/update_engine( 1217): [INFO:snapshot.cpp(2551)] Successfully unmapped snapshot system_ext_b
2024-06-21 18:30:00.637131 -0400 I/update_engine( 1217): [INFO:snapshot.cpp(2551)] Successfully unmapped snapshot vendor_b
2024-06-21 18:30:00.645086 -0400 I/update_engine( 1217): [INFO:metrics_reporter_android.cc(159)] Current update attempt downloads 783 bytes data
chenxiaolong commented 3 months ago

avbroot does not update PartitionUpdate.estimate_cow_size. It seems like update_engine uses this estimate, plus a small buffer, as the hard limit on the CoW device's size.

From update_engine logs:

2024-06-21 18:29:59.819830 -0400 E/update_engine( 1217): [ERROR:writer_v2.cpp(601)] No space left on COW device. Required: 837438955, available: 837435392

From payload.bin's header entry for the system partition:

                estimate_cow_size: Some(
                    835335997,
                ),
chenxiaolong commented 3 months ago

Yep, that's exactly what it is: https://android.googlesource.com/platform/system/core/+/refs/tags/android-14.0.0_r51/fs_mgr/libsnapshot/partition_cow_creator.cpp#165
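As a sanity check against the numbers above, here is a minimal Rust sketch. It assumes that the linked libsnapshot code adds a fixed 2 MiB of slack to `estimate_cow_size` and then rounds the result up to a 4096-byte block; both constants come from a reading of that code and are assumptions here, and `cow_device_size` is a hypothetical helper, not an AOSP or avbroot function.

```rust
// Assumed constants (from a reading of libsnapshot, not authoritative):
// a fixed 2 MiB of slack, then rounding up to a 4096-byte block.
const SLACK: u64 = 2 * 1024 * 1024;
const BLOCK_SIZE: u64 = 4096;

/// Hypothetical helper: size of the CoW device that would be created for a
/// partition whose payload.bin entry carries `estimate_cow_size`.
fn cow_device_size(estimate_cow_size: u64) -> u64 {
    let padded = estimate_cow_size + SLACK;
    // BLOCK_SIZE is a power of two, so adding BLOCK_SIZE - 1 and masking off
    // the low bits rounds up to the next block boundary.
    (padded + BLOCK_SIZE - 1) & !(BLOCK_SIZE - 1)
}
```

Under those assumptions, `cow_device_size(835335997)` returns `837435392`, which is exactly the "COW image /dev/block/dm-9 has size 837435392" value from the log, and the failed write (Required: 837438955) overshot it by 3563 bytes.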

Just need to figure out how AOSP computes this value when generating payload.bin now.

chenxiaolong commented 3 months ago

OK, so AOSP has two CoW format versions: v2 and v3.

The v2 algorithm is:

estimate = 0

for each block in partition:
    allocate buffer of size `LZ4_compressBound(block size)`
    compress block with default settings (`LZ4_compress_default()`)
    if compressed size < uncompressed size:
        estimate += compressed size
    else:
        estimate += uncompressed size
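The v2 algorithm above can be sketched in std-only Rust. A real implementation would use an LZ4 block compressor (avbroot uses lz4_flex); to keep this example self-contained, the compressor is passed in as a closure, and `toy_rle` is a hypothetical run-length encoder that stands in for it (it is not LZ4). The 4096-byte block size is also an assumption.

```rust
/// Assumed CoW block size (4096 bytes).
const BLOCK_SIZE: usize = 4096;

/// Sketch of the v2 estimate: compress each block independently and add the
/// compressed size, or the raw size if compression doesn't make it smaller.
/// `compress` stands in for an LZ4 block compressor.
fn estimate_cow_size_v2<F>(image: &[u8], compress: F) -> u64
where
    F: Fn(&[u8]) -> Vec<u8>,
{
    image
        .chunks(BLOCK_SIZE) // a trailing partial block, if any, is counted as-is
        .map(|block| compress(block).len().min(block.len()) as u64)
        .sum()
}

/// Hypothetical toy compressor: run-length encoding as (count, byte) pairs.
/// NOT LZ4; it only exists so the example runs without external crates.
fn toy_rle(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let byte = data[i];
        let mut run = 1;
        while i + run < data.len() && data[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}
```

For example, an all-zero 4096-byte block compresses to 34 bytes with `toy_rle`, while a block with no repeated adjacent bytes expands to 8192 bytes and is therefore counted at its raw 4096 bytes, so an image made of one of each is estimated at 4130 bytes.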

The v3 algorithm is more complicated and I didn't bother figuring it out. Every OTA I've seen uses v2 + lz4, including the Android 15 betas. We can revisit this later if that changes.

chenxiaolong commented 3 months ago

This should be easy enough to implement. It can be snuck into avbroot::format::payload::compress_image(). We already split other compression operations there into chunks whose size is a multiple of the block size and run them in parallel.
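Because the v2 estimate is a plain per-block sum, it splits cleanly across chunks whose size is a multiple of the block size: each worker sums its own blocks and the partial sums are added at the end. If a chunk boundary fell in the middle of a block, that block would be compressed as two pieces and the estimate would change. The sketch below shows this with `std::thread::scope` and one thread per chunk for simplicity; it is not avbroot's code, and `estimate_blocks`/`estimate_parallel` are hypothetical names. The compressor is again a closure standing in for LZ4.

```rust
use std::thread;

/// Assumed block size; chunks must be a multiple of it.
const BLOCK_SIZE: usize = 4096;

/// Sequential v2-style estimate over whole blocks (`compress` stands in for LZ4).
fn estimate_blocks<F: Fn(&[u8]) -> Vec<u8>>(data: &[u8], compress: &F) -> u64 {
    data.chunks(BLOCK_SIZE)
        .map(|block| compress(block).len().min(block.len()) as u64)
        .sum()
}

/// Hypothetical parallel version: split the image into chunks of
/// `blocks_per_chunk` blocks, estimate each chunk on its own scoped thread,
/// and add up the partial sums. Every chunk boundary is also a block
/// boundary, so no block is split and the result matches the sequential sum.
fn estimate_parallel<F>(image: &[u8], blocks_per_chunk: usize, compress: F) -> u64
where
    F: Fn(&[u8]) -> Vec<u8> + Sync,
{
    assert!(blocks_per_chunk > 0, "a chunk must contain at least one block");
    let chunk_size = BLOCK_SIZE * blocks_per_chunk;
    let compress = &compress;

    thread::scope(|s| {
        let handles: Vec<_> = image
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || estimate_blocks(chunk, compress)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}
```

Any `blocks_per_chunk` gives the same total as the sequential loop; the chunk size only controls how the work is divided between threads.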

The only caveat is that avbroot uses the Rust lz4_flex library, which is a complete reimplementation of lz4, while AOSP's libsnapshot_cow uses the original lz4. The compression ratios should be very close though, so hopefully that won't matter.

chenxiaolong commented 3 months ago

The performance impact on the patching process seems to be pretty negligible.

Non-scientific test (husky_beta-ota-ap31.240517.022-8f9fd0f3.zip with --replace system system.img on an Intel i9-9900KS):

chenxiaolong commented 3 months ago

> The only caveat is that avbroot uses the Rust lz4_flex library, which is a complete reimplementation of lz4, while AOSP's libsnapshot_cow uses the original lz4. The compression ratios should be very close though, so hopefully that won't matter.

Time to eat my words: lz4_flex actually compresses system.img better than lz4 does, so avbroot's estimate comes out too small. I'll have avbroot inflate the estimate by 1% to account for that. This only wastes a few megabytes of space while the OTA is being flashed, and that space is returned to the user once the device reboots and the CoW snapshots are merged. On Pixels, it shouldn't waste any space at all: the super partition is a huge 8 GiB, so the CoW snapshots should never spill over into the userdata partition.
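To put "a few megabytes" in numbers, here is a minimal sketch that assumes the 1% is applied as a multiplicative margin on the computed estimate and rounded up. The exact rounding, and the `fudge_estimate` name, are assumptions for illustration, not avbroot's code.

```rust
/// Hypothetical helper: inflate a CoW size estimate by 1% (rounded up) so that
/// an estimate computed with lz4_flex, which can compress better than the
/// reference lz4, is less likely to come out smaller than what the device needs.
fn fudge_estimate(estimate: u64) -> u64 {
    // (estimate + 99) / 100 is the ceiling of estimate / 100 in integer math.
    estimate + (estimate + 99) / 100
}
```

For the system partition in this issue, `fudge_estimate(835335997)` returns `843689357`, i.e. 8353360 extra bytes (about 8 MiB) reserved during flashing.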