archiecobbs / s3backer

FUSE/NBD single file backing store via Amazon S3
Other
538 stars 77 forks source link

"error updating dirty block! Bad address" and segfault in v2.1.2 #224

Closed mattchrist closed 5 months ago

mattchrist commented 5 months ago

Hello! I'm experiencing issues with release 2.1.2 of s3backer. I'm using an s3backer-backed file as a zfs pool, and replicating local datasets to the s3backer pool using syncoid.

Since updating from 2.0.2 to 2.1.2, I get errors and segfaults. I've downgraded to 2.0.2 and the issues no longer occur.

Log entries:

Jun 06 07:13:54 t02 systemd[1]: Started s3backer-backup.service.
Jun 06 07:13:55 t02 s3backer-backup-start[1415652]: s3backer: auto-detecting block size and total file size...
Jun 06 07:13:55 t02 s3backer-backup-start[1415652]: s3backer: auto-detected block size=1m and total size=10t
Jun 06 07:13:55 t02 s3backer-backup-start[1415652]: 2024-06-06 07:13:55 NOTICE: creating new cache file `/var/lib/s3backer/cache/backup' with capacity 10240 blocks
Jun 06 07:13:55 t02 s3backer-backup-start[1415652]: 2024-06-06 07:13:55 INFO: reading meta-data from cache file `/var/lib/s3backer/cache/backup'
Jun 06 07:13:55 t02 s3backer-backup-start[1415652]: 2024-06-06 07:13:55 INFO: loaded cache file `/var/lib/s3backer/cache/backup' with 10240 free and 0 used blocks (max index 0)
Jun 06 07:13:56 t02 s3backer-backup-start[1415652]: 2024-06-06 07:13:56 INFO: established new mount token 0x2a5b313b
Jun 06 07:13:56 t02 s3backer-backup-start[1415652]: 2024-06-06 07:13:56 INFO: s3backer process 1415652 for /var/lib/s3backer/backup started
Jun 06 07:13:56 t02 s3backer-backup-start[1415652]: 2024-06-06 07:13:56 INFO: starting non-zero block survey
Jun 06 07:13:56 t02 s3backer-backup-start[1415652]: 2024-06-06 07:13:56 INFO: mounting /var/lib/s3backer/backup
Jun 06 07:14:48 t02 s3backer-backup-start[1415652]: 2024-06-06 07:14:48 INFO: non-zero block survey completed (887547 non-zero blocks reported)
Jun 06 07:15:38 t02 s3backer-backup-start[1415652]: 2024-06-06 07:15:38 ERROR: operation failed: Failed to open/read local data from file/application (will retry)
Jun 06 07:15:38 t02 s3backer-backup-start[1415652]: 2024-06-06 07:15:38 INFO: retrying query (attempt #2): PUT https://s3.us-west-004.backblazeb2.com/redacted-bucket/00154007
Jun 06 07:15:44 t02 s3backer-backup-start[1415652]: 2024-06-06 07:15:44 ERROR: operation failed: Failed to open/read local data from file/application (will retry)
Jun 06 07:15:45 t02 s3backer-backup-start[1415652]: 2024-06-06 07:15:45 INFO: retrying query (attempt #2): PUT https://s3.us-west-004.backblazeb2.com/redacted-bucket/000cc007
Jun 06 07:15:54 t02 s3backer-backup-start[1415652]: 2024-06-06 07:15:54 ERROR: error writing cache file `/var/lib/s3backer/cache/backup' at offset 344981504: Bad address
Jun 06 07:15:54 t02 s3backer-backup-start[1415652]: 2024-06-06 07:15:54 ERROR: error updating dirty block! Bad address
Jun 06 07:15:54 t02 s3backer-backup-start[1415652]: 2024-06-06 07:15:54 ERROR: error writing cache file `/var/lib/s3backer/cache/backup' at offset 345243648: Bad address
Jun 06 07:15:54 t02 s3backer-backup-start[1415652]: 2024-06-06 07:15:54 ERROR: error updating dirty block! Bad address
Jun 06 07:15:54 t02 s3backer-backup-start[1415652]: 2024-06-06 07:15:54 ERROR: error writing cache file `/var/lib/s3backer/cache/backup' at offset 345374720: Bad address
Jun 06 07:15:54 t02 s3backer-backup-start[1415652]: 2024-06-06 07:15:54 ERROR: error updating dirty block! Bad address
Jun 06 07:17:16 t02 s3backer-backup-start[1415652]: 2024-06-06 07:17:16 ERROR: error writing cache file `/var/lib/s3backer/cache/backup' at offset 2490368: Bad address
Jun 06 07:17:16 t02 s3backer-backup-start[1415652]: 2024-06-06 07:17:16 ERROR: error updating dirty block! Bad address
Jun 06 07:18:02 t02 s3backer-backup-start[1415652]: 2024-06-06 07:18:02 ERROR: error writing cache file `/var/lib/s3backer/cache/backup' at offset 335151104: Bad address
Jun 06 07:18:02 t02 s3backer-backup-start[1415652]: 2024-06-06 07:18:02 ERROR: error updating dirty block! Bad address
Jun 06 07:18:27 t02 s3backer-backup-start[1415652]: 2024-06-06 07:18:27 ERROR: error writing cache file `/var/lib/s3backer/cache/backup' at offset 8388608: Bad address
Jun 06 07:18:27 t02 s3backer-backup-start[1415652]: 2024-06-06 07:18:27 ERROR: error updating dirty block! Bad address
Jun 06 07:18:51 t02 s3backer-backup-start[1415652]: 2024-06-06 07:18:51 ERROR: error writing cache file `/var/lib/s3backer/cache/backup' at offset 182976512: Bad address
Jun 06 07:18:51 t02 s3backer-backup-start[1415652]: 2024-06-06 07:18:51 ERROR: error updating dirty block! Bad address
Jun 06 07:18:54 t02 s3backer-backup-start[1415640]: /nix/store/z4lnd3mdcbgi1yc10adz8m3mhsa7hg8m-unit-script-s3backer-backup-start/bin/s3backer-backup-start: line 3: 1415652 Segmentation fault      (core dumped) s3backer -f --accessFile=/run/secrets/s3backer-access-file --encrypt --passwordFile=/run/secrets/s3backer-password-file --size=10T --filename=file --configFile=/nix/store/x1gciqnvr85wbwzxpr59pga1v4jdkdd1-s3backer-backup-extra.conf redacted-bucket /var/lib/s3backer/backup
Jun 06 07:18:54 t02 systemd[1]: s3backer-backup.service: Main process exited, code=exited, status=139/n/a

configFile:

--baseURL=https://s3.us-west-004.backblazeb2.com/
--blockSize=1M
--blockCacheSize=10240
--blockCacheFile=/var/lib/s3backer/cache/backup
--blockCacheRecoverDirtyBlocks
--listBlocks

I'll try and reproduce/bisect in a more detailed way later, but figured it was important to get an issue started.

archiecobbs commented 5 months ago

Haven't seen this one before.

Just to double check: verify that /var/lib/s3backer/cache/backup is not somehow symlinked into /var/lib/s3backer/backup and getting covered up by the mount.

Random thing to try: edit dcache.c line 160 and change USE_FALLOCATE from 1 to 0.

mattchrist commented 5 months ago

Thanks for the response! I can confirm that /var/lib/s3backer/cache/backup is not symlinked into /var/lib/s3backer/backup.

I plan to replicate the issue with unimportant data later today, if i can I'll try with your dcache.c recommendation and report back.

mattchrist commented 5 months ago

Ok, I'm able to reliably recreate the segfault issue.

Steps to reproduce:

  1. Create a s3backer-backed file
  2. Create a zfs pool on that file and mount it somewhere
  3. Create a file on the zfs dataset
  4. Delete the file from the zfs dataset
  5. run zpool trim on the pool
  6. Get a segfault

I wasn't able to reproduce with --test, so the segfault issue may also rely on having dirty blocks when the trim is issued.

Opening the coredump with gdb:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  s3b_dcache_count_zero_fs_blocks (priv=<optimized out>, len=774144, 
    src=0x10000 <error: Cannot access memory at address 0x10000>) at dcache.c:687
687                 if (word_ptr[i] != 0)
[Current thread is 1 (Thread 0x7f145edd06c0 (LWP 1610003))]

I can also confirm that the issue does not occur with USE_FALLOCATE=0.

archiecobbs commented 5 months ago

Thanks for the detective work. I'll take a look.

archiecobbs commented 5 months ago

Should be fixed in 5cfc652. Thanks for the hint!

khumarahn commented 5 months ago

@archiecobbs Hello! Any chance you can make a new minor release? I don't mean to pester, and would fully understand if this is not in your plans. But a couple of potentially important bugs have been fixed, and building depending on ndb got better, and a new release would simplify packaging :-)

archiecobbs commented 5 months ago

Sure, no problem - done.

khumarahn commented 5 months ago

great, thank you!