dm-vdo / vdo-devel

Primary VDO mainline development repository
GNU General Public License v2.0

Unable to mount volume & input/output errors after lvextend of VDO volume #4

Open tigerblue77 opened 1 year ago

tigerblue77 commented 1 year ago

Hello there, I think I've made a big mistake which I'd like to fix as best I can, and I hope I won't lose my data...

On a Debian 11 bare-metal machine, I ran `lvextend` on a VDO pool in parallel with `lvextend` on the LV it contains, and now my volume reports:

```
/dev/Ultron-vg/ZEROED-VDO-LV-1: ** WARNING: Filesystem still has errors **
```

When I run `lvextend` again, I get the following output:
```bash
lvextend --resizefs -L20.4T /dev/Ultron-vg/ZEROED-VDO-POOL-1
  Ignoring --resizefs as volume Ultron-vg/ZEROED-VDO-POOL-1 does not have a filesystem.
  Rounding size to boundary between physical extents: 20,40 TiB.
  Increasing incremention size from 0    to 8,00 GiB to fit new VDO slab.
  Size of logical volume Ultron-vg/ZEROED-VDO-POOL-1_vdata changed from 20,40 TiB (5347738 extents) to <20,41 TiB (5349786 extents).
  device-mapper: reload ioctl on  (254:4) failed: Input/output error
  Failed to suspend logical volume Ultron-vg/ZEROED-VDO-LV-1.
```

I think the problem comes from the VDO data pool, but I don't even know how to interact with it... The underlying storage is OK, and every other LV in this PV and VG is okay, healthy and working well.

`vdostats --human-readable` gives me the following output:

```
Device                                                       Size      Used Available Use% Space saving%
Ultron--vg-ZEROED--VDO--POOL--1-vpool                       20.4T     17.0G     20.4T   0%          100%
Ultron--vg-COMPRESSED--DEDUPLICATED--VDO--POOL--1-vpool      1.0T    917.6G    106.4G  90%           14%
```

But I should have 20.2T used, not 17.0G 😭

Please tell me I won't lose my data 😭 Yes, it was my choice not to back it up, and it's "only" movies and STORJ data, but I would be very happy if I could recover it (or copy it elsewhere before destroying this VDO volume).

tigerblue77 commented 1 year ago

After hours of reading and testing, I found that running the following commands seems to fix the problem, at least temporarily:

```bash
vdorecover /dev/mapper/Ultron--vg-ZEROED--VDO--POOL--1-vpool
vdoregenerategeometry /dev/mapper/Ultron--vg-ZEROED--VDO--POOL--1_vdata
vdoforcerebuild /dev/mapper/Ultron--vg-ZEROED--VDO--POOL--1_vdata
e2fsck -fy /dev/mapper/Ultron--vg-ZEROED--VDO--LV--1
```

After running them, I can mount the volume in read/write mode, but after a dozen or so minutes of writing to it, I get the following errors and the volume switches back to read-only mode... Nothing happens if I don't write to the volume. My log indicates a corrupt or incorrect page:

```
Jun 11 14:29:58 Ultron kernel: [16989.789751] kvdo6:logQ4: Expected page 5450350161 but got page 2205289393 instead: VDO Status: Corrupt or incorrect page (1473)
Jun 11 14:29:58 Ultron kernel: [16989.789811] kvdo6:logQ4: Completing read vio of type 3 for physical block 5450350161 with error: VDO Status: Corrupt or incorrect page (1473)
Jun 11 14:29:58 Ultron kernel: [16989.789866] kvdo6:journalQ: Unrecoverable error, entering read-only mode: VDO Status: Corrupt or incorrect page (1473)
Jun 11 14:29:58 Ultron kernel: [16989.790027] kvdo6:hashQ0: Completing write vio for LBN 5776241801 with error after get_mapped_block/for_dedupe: VDO Status: Corrupt or incorrect page (1473)
Jun 11 14:29:58 Ultron kernel: [16989.790123] kvdo6:hashQ0: Completing write vio for LBN 5776241796 with error after write_data_vio: VDO Status: Corrupt or incorrect page (1473)
Jun 11 14:29:58 Ultron kernel: [16989.790128] kvdo6:hashQ3: Completing write vio for LBN 5776241799 with error after write_data_vio: VDO Status: Corrupt or incorrect page (1473)
Jun 11 14:29:58 Ultron kernel: [16989.790138] kvdo6:hashQ1: Completing write vio for LBN 5776241797 with error after write_data_vio: VDO Status: Corrupt or incorrect page (1473)
Jun 11 14:29:58 Ultron kernel: [16989.790142] kvdo6:hashQ4: Completing write vio for LBN 5776241792 with error after write_data_vio: VDO Status: Corrupt or incorrect page (1473)
Jun 11 14:29:58 Ultron kernel: [16989.790147] kvdo6:hashQ4: Completing write vio for LBN 5776241794 with error after write_data_vio: VDO Status: Corrupt or incorrect page (1473)
Jun 11 14:29:58 Ultron kernel: [16989.790150] kvdo6:hashQ1: Completing write vio for LBN 5776241798 with error after write_data_vio: VDO Status: Corrupt or incorrect page (1473)
Jun 11 14:29:58 Ultron kernel: [16989.790152] kvdo6:hashQ4: Completing write vio for LBN 5776241795 with error after write_data_vio: VDO Status: Corrupt or incorrect page (1473)
Jun 11 14:29:58 Ultron kernel: [16989.790156] kvdo6:hashQ2: Completing write vio for LBN 5776241800 with error after write_data_vio: VDO Status: Corrupt or incorrect page (1473)
Jun 11 14:29:58 Ultron kernel: [16989.790159] kvdo6:cpuQ8: vdo_map_to_system_error: mapping internal status code 1473 (VDO_BAD_PAGE: VDO Status: Corrupt or incorrect page) to EIO
Jun 11 14:29:58 Ultron kernel: [16989.790162] kvdo6:cpuQ8: vdo_map_to_system_error: mapping internal status code 1473 (VDO_BAD_PAGE: VDO Status: Corrupt or incorrect page) to EIO
Jun 11 14:29:58 Ultron kernel: [16989.790164] kvdo6:cpuQ8: vdo_map_to_system_error: mapping internal status code 1473 (VDO_BAD_PAGE: VDO Status: Corrupt or incorrect page) to EIO
Jun 11 14:29:58 Ultron kernel: [16989.790165] kvdo6:cpuQ8: vdo_map_to_system_error: mapping internal status code 1473 (VDO_BAD_PAGE: VDO Status: Corrupt or incorrect page) to EIO
Jun 11 14:29:58 Ultron kernel: [16989.790166] kvdo6:cpuQ8: vdo_map_to_system_error: mapping internal status code 1473 (VDO_BAD_PAGE: VDO Status: Corrupt or incorrect page) to EIO
Jun 11 14:29:58 Ultron kernel: [16989.790167] kvdo6:cpuQ8: vdo_map_to_system_error: mapping internal status code 1473 (VDO_BAD_PAGE: VDO Status: Corrupt or incorrect page) to EIO
Jun 11 14:29:58 Ultron kernel: [16989.790189] kvdo6:cpuQ9: vdo_map_to_system_error: mapping internal status code 1473 (VDO_BAD_PAGE: VDO Status: Corrupt or incorrect page) to EIO
Jun 11 14:29:58 Ultron kernel: [16989.790487] kvdo6:cpuQ1: vdo_map_to_system_error: mapping internal status code 1473 (VDO_BAD_PAGE: VDO Status: Corrupt or incorrect page) to EIO
Jun 11 14:29:58 Ultron kernel: [16989.790513] kvdo6:cpuQ2: vdo_map_to_system_error: mapping internal status code 1473 (VDO_BAD_PAGE: VDO Status: Corrupt or incorrect page) to EIO
Jun 11 14:29:58 Ultron kernel: [16989.790516] kvdo6:cpuQ2: vdo_map_to_system_error: mapping internal status code 1473 (VDO_BAD_PAGE: VDO Status: Corrupt or incorrect page) to EIO
Jun 11 14:29:58 Ultron kernel: [16989.790656] Aborting journal on device dm-5-8.
Jun 11 14:29:58 Ultron kernel: [16989.790729] buffer_io_error: 490 callbacks suppressed
Jun 11 14:29:58 Ultron kernel: [16989.790732] Buffer I/O error on dev dm-5, logical block 3178496, lost sync page write
Jun 11 14:29:58 Ultron kernel: [16989.790799] JBD2: Error -5 detected when updating journal superblock for dm-5-8.
Jun 11 14:29:58 Ultron kernel: [16989.790889] EXT4-fs error (device dm-5) in __ext4_unlink:3313: IO failure
Jun 11 14:29:58 Ultron kernel: [16989.790974] Buffer I/O error on dev dm-5, logical block 0, lost sync page write
Jun 11 14:29:58 Ultron kernel: [16989.791032] EXT4-fs (dm-5): I/O error while writing superblock
Jun 11 14:29:58 Ultron kernel: [16989.791064] EXT4-fs (dm-5): previous I/O error to superblock detected
Jun 11 14:29:58 Ultron kernel: [16989.791092] Buffer I/O error on dev dm-5, logical block 0, lost sync page write
Jun 11 14:29:58 Ultron kernel: [16989.791139] EXT4-fs (dm-5): previous I/O error to superblock detected
Jun 11 14:29:58 Ultron kernel: [16989.791197] Buffer I/O error on dev dm-5, logical block 0, lost sync page write
Jun 11 14:29:58 Ultron kernel: [16989.791242] EXT4-fs (dm-5): I/O error while writing superblock
Jun 11 14:29:58 Ultron kernel: [16989.791250] EXT4-fs (dm-5): I/O error while writing superblock
Jun 11 14:29:58 Ultron kernel: [16989.791285] EXT4-fs error (device dm-5): ext4_journal_check_start:83: Detected aborted journal
Jun 11 14:29:58 Ultron kernel: [16989.791311] EXT4-fs error (device dm-5): ext4_journal_check_start:83: Detected aborted journal
Jun 11 14:29:58 Ultron kernel: [16989.791364] EXT4-fs (dm-5): Remounting filesystem read-only
Jun 11 14:29:58 Ultron kernel: [16989.791397] EXT4-fs (dm-5): Remounting filesystem read-only
Jun 11 14:30:07 Ultron kernel: [16998.661494] Buffer I/O error on dev dm-5, logical block 3, lost async page write
Jun 11 14:30:07 Ultron kernel: [16998.661572] Buffer I/O error on dev dm-5, logical block 7, lost async page write
Jun 11 14:30:07 Ultron kernel: [16998.661612] Buffer I/O error on dev dm-5, logical block 12, lost async page write
Jun 11 14:30:07 Ultron kernel: [16998.661644] Buffer I/O error on dev dm-5, logical block 16, lost async page write
Jun 11 14:30:07 Ultron kernel: [16998.661684] Buffer I/O error on dev dm-5, logical block 17, lost async page write
Jun 11 14:30:07 Ultron kernel: [16998.661725] Buffer I/O error on dev dm-5, logical block 18, lost async page write
Jun 11 14:30:07 Ultron kernel: [16998.661767] Buffer I/O error on dev dm-5, logical block 19, lost async page write
Jun 11 14:30:07 Ultron kernel: [16998.661809] Buffer I/O error on dev dm-5, logical block 20, lost async page write
Jun 11 14:30:07 Ultron kernel: [16998.661860] Buffer I/O error on dev dm-5, logical block 23, lost async page write
Jun 11 14:30:07 Ultron kernel: [16998.661909] Buffer I/O error on dev dm-5, logical block 24, lost async page write
```

I will now back up all data to a new non-VDO logical volume for safety; I hope I won't be backing up corrupted data...

Please don't consider this solved: I would like to understand what happened and possibly fix the underlying issues that got me into this mess...