Feature request: Raidz support, and file version roll back

JamesRBrown commented 5 years ago

Very impressive effort! I hope you continue to develop this; it would be of huge value to the community.

Two things I'd like to see are:

Raidz support, as this is one of the main reasons people use ZFS (for instance, my pool is a Raidz3 pool).
The ability to roll back to an earlier version of a file, which I think should be technically possible since ZFS is a copy-on-write filesystem.

Stefan311 commented 5 years ago

Thanks for your interest. However, I have suspended this project because I got almost no feedback from anyone. I wrote it to recover my own data and, of course, out of curiosity.

Raidz support is not included because I could not find any design documentation. The only description of the ZFS on-disk format I found is from the very early days of ZFS, and Raidz is not covered in it. Working this out by reverse engineering and trial and error would be very time-consuming and error-prone. I asked for help on the FreeBSD filesystem mailing list, but did not get a single reply.
Rolling back to an earlier version of a file is already possible: just choose an older uberblock as the base for your data exploration. But keep in mind that copy-on-write is no guarantee of getting older data back. Copy-on-write means new data will not overwrite older data as long as the older data is still current. Once the new data has been written and has become current, the old data is marked as free space and can be overwritten by something else.
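To make the copy-on-write caveat concrete, here is a toy Python model. It is only an illustration: the class ToyCowStore and its allocator are invented for this sketch and do not reflect how ZFS actually allocates blocks. It shows that an old version stays readable after it is superseded, but only until its block happens to be reused.

```python
class ToyCowStore:
    """Toy copy-on-write store (hypothetical, not ZFS's real allocator)."""

    def __init__(self):
        self.blocks = {}    # block id -> data currently stored in that block
        self.free = set()   # block ids released by newer writes
        self.current = {}   # file name -> block id of the live version
        self.next_id = 0

    def _allocate(self):
        # Reuse a freed block if one exists, otherwise grow the store.
        if self.free:
            return self.free.pop()
        block_id = self.next_id
        self.next_id += 1
        return block_id

    def write(self, name, data):
        """Write a new version; return the block id of the superseded one."""
        new_id = self._allocate()        # never the block that is still current
        self.blocks[new_id] = data
        old_id = self.current.get(name)
        self.current[name] = new_id      # the new version becomes current
        if old_id is not None:
            self.free.add(old_id)        # old data stays on disk but is reusable
        return old_id


store = ToyCowStore()
store.write("report.txt", b"version 1")
old_block = store.write("report.txt", b"version 2")
print("old version still readable:", store.blocks[old_block])

store.write("other.txt", b"unrelated data")  # reuses the freed block
print("same block after reuse:", store.blocks[old_block])
```

The second print shows why an older uberblock may point at data that no longer exists: nothing protects a freed block once the newer version is current.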

JamesRBrown commented 5 years ago

Thank you for your very kind reply.

I realize COW is no guarantee. Would you know how to roll back to a state twelve to twenty-four hours old? I've been doing my own research on the matter, looking into zdb commands, but if you could help direct my efforts it would be much appreciated. I unfortunately fell victim to a ransomware virus. Looking at the FreeNAS logs, along with my free space, I'm pretty sure the data is still there uncorrupted, if I can manage to roll the txg back 11+ hours.

The best information I found was in the Google groups thread:

https://groups.google.com/forum/#!msg/mailing.freebsd.hackers/QzQA32BmaqQ/AoLEZrfkElMJ

On 12.07.2013 at 14:33, Volodymyr Kostyrko wrote:

You can try to experiment with zpool hidden flags. Look at this command:

zpool import -N -o readonly=on -f -R /pool

It will try to import the pool in read-only mode, so no data will be written to it. It also doesn't mount anything on import, so if any fs is damaged you have less chance of triggering a coredump. Also, zpool import has a hidden -T switch that lets you select the transaction you want to try to restore. You'll need a list of available transactions though:

zdb -ul

This one, when given a vdev, lists all uberblocks with their respective transaction ids. You can take the highest one (it's not the last one) and try to mount the pool with:

zpool import -N -o readonly=on -f -R /pool -F -T

I had good luck with ZFS recovery with the following approach:

1) Use zdb to identify a TXG for which the data structures are intact

2) Select recovery mode by loading the ZFS KLD with "vfs.zfs.recover=1" set in /boot/loader.conf

3) Import the pool with the above -T option, referring to a suitable TXG found with the help of zdb.

The zdb commands to use are:

zdb -AAA -L -t -bcdmu

(Both -AAA and -L reduce the amount of consistency checking performed. A pool (at TXG) that needs these options to allow zdb to succeed is damaged, but may still allow recovery of most or all files. Be sure to only import that pool R/O, or your data will probably be lost!)

A list of TXGs to try can be retrieved with "zdb -hh ".

(You may need to add "-e" to the list of zdb options, since the pool is exported / not currently mounted.)

Regards, STefan
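The three steps above could be scripted roughly as follows. This is a minimal Python sketch that only assembles the zdb command lines from the quoted mail; it does not run anything. The pool name "tank", the candidate TXG numbers, and the function names are hypothetical, and placing the TXG right after -t and the pool name last is an assumption, since the quoted command leaves those arguments out.

```python
def zdb_check_command(pool, txg, exported=True):
    """Assemble (not run) the zdb consistency check for one candidate TXG."""
    cmd = ["zdb", "-AAA", "-L"]          # relaxed checking, as in the quoted mail
    if exported:
        cmd.append("-e")                 # pool is exported / not imported
    cmd += ["-t", str(txg), "-bcdmu", pool]
    return cmd


def candidate_commands(pool, txgs, exported=True):
    """Yield one zdb command per distinct candidate, newest TXG first."""
    for txg in sorted(set(txgs), reverse=True):
        yield zdb_check_command(pool, txg, exported)


# Hypothetical pool name and TXG numbers, for illustration only.
for command in candidate_commands("tank", [15344, 15471, 15473]):
    print(" ".join(command))
```

Trying the newest candidate first keeps as much recent data as possible; you would fall back to older TXGs only when the check fails.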

I have new drives on the way, which I plan to make an offline binary clone onto before I attempt any of this, so I'm not risking my data.

Stefan311 commented 5 years ago

The advice from Volodymyr Kostyrko seems pretty good.

First, find the transaction id from shortly before the data corruption. If all transactions are newer, you are out of luck, but I think there is a good chance of finding an older one. Let me explain how this transaction/uberblock thing works. There is a list of 128 uberblocks. Every uberblock contains a pointer to a filesystem state (called a transaction). So if you change 128 files, and the writes are not merged into fewer operations, there are 128 transactions. Sounds like the older transactions are lost? Luckily not! Transactions do not fill the uberblock slots strictly one after another: the slot is derived from the transaction number, and not every transaction number ends up written, so some slots keep much older entries. So, take a look at the uberblock list:

zdb -ul /dev/ada1

(replace /dev/ada1 with one of your disk names). This prints a long list of uberblocks like this:

Uberblock[111]
    magic = 0000000000bab10c
    version = 5000
    txg = 15471
    guid_sum = 2302981454941978704
    timestamp = 1556958717 UTC = Sat May 4 10:31:57 2019
Uberblock[112]
    magic = 0000000000bab10c
    version = 5000
    txg = 15344
    guid_sum = 2302981454941978704
    timestamp = 1556039375 UTC = Tue Apr 23 19:09:35 2019
Uberblock[113]
    magic = 0000000000bab10c
    version = 5000
    txg = 15473
    guid_sum = 2302981454941978704
    timestamp = 1556958717 UTC = Sat May 4 10:31:57 2019

You see a timestamp and the transaction (txg) number. zdb displays all 4 copies of the uberblock list; it's OK to look at just one of them. Have you found a txg? Good! Now follow Volodymyr's advice:

zpool import -N -o readonly=on -f -R /pool -F -T

I am not sure whether this command also mounts the filesystem; it's possible you will have to run a mount command too. I also do not know what happens if ZFS finds invalid data structures in this state, so be prepared for a system crash! If you have no luck with the chosen transaction, you can also try newer ones. Maybe some files there are still not encrypted.
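Picking a txg by eye from a long listing is error-prone, so here is a rough Python sketch of the same workflow: parse the uberblock listing, pick the newest txg older than a cutoff, and assemble the import command. The function names and the cutoff value are hypothetical. The sample text is the three entries shown above, and the pool name "stronghold" is taken from later in this thread purely for illustration. Placing the txg after -T and the pool name last is an assumption, since the command as quoted leaves both out. Nothing is executed.

```python
import re

# Applied to the text following "Uberblock[": slot index, then txg and timestamp.
ENTRY_RE = re.compile(r"(\d+)\].*?txg = (\d+).*?timestamp = (\d+)", re.S)


def parse_uberblocks(zdb_output):
    """Return unique (slot, txg, timestamp) tuples, oldest txg first."""
    by_txg = {}
    for chunk in zdb_output.split("Uberblock[")[1:]:
        match = ENTRY_RE.match(chunk)
        if match:  # entries lacking these fields are skipped
            slot, txg, ts = (int(group) for group in match.groups())
            by_txg[txg] = (slot, txg, ts)  # zdb repeats the list per label copy
    return sorted(by_txg.values(), key=lambda entry: entry[1])


def pick_txg_before(entries, cutoff_epoch):
    """Return the highest txg stamped strictly before cutoff_epoch, or None."""
    older = [txg for _slot, txg, ts in entries if ts < cutoff_epoch]
    return max(older) if older else None


def build_import_command(pool, txg, altroot="/pool"):
    """Assemble (not run) the read-only import command from the thread."""
    return ["zpool", "import", "-N", "-o", "readonly=on", "-f",
            "-R", altroot, "-F", "-T", str(txg), pool]


SAMPLE = """
Uberblock[111]
    txg = 15471
    timestamp = 1556958717 UTC = Sat May 4 10:31:57 2019
Uberblock[112]
    txg = 15344
    timestamp = 1556039375 UTC = Tue Apr 23 19:09:35 2019
Uberblock[113]
    txg = 15473
    timestamp = 1556958717 UTC = Sat May 4 10:31:57 2019
"""

entries = parse_uberblocks(SAMPLE)
for slot, txg, ts in entries:
    # In this sample every slot index equals txg modulo 128.
    print(slot, txg, ts, txg % 128 == slot)

CUTOFF = 1556900000  # hypothetical "time of the damage", in epoch seconds
chosen = pick_txg_before(entries, CUTOFF)
if chosen is not None:
    print(" ".join(build_import_command("stronghold", chosen)))
```

Note that zdb appears to print the human-readable date in local time, so comparing the raw epoch timestamp is the safer choice when matching against log entries.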

JamesRBrown commented 5 years ago

Thank you for the explanation Stefan, it's very much appreciated. I do think the real trick here will be finding an old enough uberblock. When my new drives arrive I'll copy the originals to them, and then I'll proceed with my investigation on the new drives. Fortunately, or perhaps unfortunately, this was a progressive action, so even if I can't find a block from completely before this mess started, I may be able to recover a substantial part of my data. It's truly unfortunate that there are people in this world who are willing to hurt others for their own personal profit. I just want you to know how grateful I am for the help you've provided in my efforts to recover; it is deeply appreciated. :)

JamesRBrown commented 5 years ago

Stefan, I very much appreciate your help thus far, and I was wondering if I might impose upon you for a bit more? I'm using FreeNAS 11.2u4.1. I ran:

egrep 'da[0-9]|cd[0-9]' /var/run/dmesg.boot

To get a list of drives. They are listed as ada0 - ada7. I then tried:

zdb -ul /dev/ada0
zdb -ul /dev/ada1
zdb -ul /dev/ada2
...
zdb -ul /dev/ada7

Each outputted:

LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

failed to unpack label 2

LABEL 3

failed to unpack label 3

I then ran:

zpool import

Where I got the following output:

[root@freenas /]# zpool import
   pool: stronghold
     id: 14714184237420169316
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and the '-f' flag.
    see: http://illumos.org/msg/ZFS-8000-EY
 config:

stronghold                                      ONLINE
  raidz3-0                                      ONLINE
    gptid/d76b91cd-ee21-11e5-93b1-d0509935e862  ONLINE
    gptid/d82058cf-ee21-11e5-93b1-d0509935e862  ONLINE
    gptid/d8d262e9-ee21-11e5-93b1-d0509935e862  ONLINE
    gptid/d987af90-ee21-11e5-93b1-d0509935e862  ONLINE
    gptid/da38d716-ee21-11e5-93b1-d0509935e862  ONLINE
    gptid/dafd218d-ee21-11e5-93b1-d0509935e862  ONLINE
    gptid/dbb2f367-ee21-11e5-93b1-d0509935e862  ONLINE
    gptid/dc78452c-ee21-11e5-93b1-d0509935e862  ONLINE

So it seems the structures are intact. I'm not sure how to go about finding these transaction numbers at this time, and I was wondering if you might have some ideas?

Stefan311 commented 5 years ago

It seems your disks are partitioned. /dev/ada0 gives access to the whole disk. You should use something like /dev/ada0s1 or /dev/ada0p1. Just run ls /dev/ada* to show the partition device names. You can also try the same device names that ZFS uses: /dev/gptid/d76b91cd-ee21-11e5-93b1-d0509935e862

I strongly suggest that you import this pool read-only! Even the import is a transaction...!
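If you go the gptid route, the device paths can be pulled straight out of the zpool import listing. The following Python sketch is one way to do that, assuming the member names always look like the gptid/... entries shown in this thread; the function names are hypothetical, and it only prints the zdb -ul command lines rather than running them.

```python
import re

# gptid/ followed by a UUID in 8-4-4-4-12 hex form, as in the listing above.
GPTID_RE = re.compile(r"\bgptid/[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b")


def member_device_paths(import_output):
    """Return /dev/gptid/... paths for every member found, in listing order."""
    return ["/dev/" + name for name in GPTID_RE.findall(import_output)]


def label_commands(import_output):
    """Assemble (not run) one 'zdb -ul' command per pool member."""
    return [["zdb", "-ul", path] for path in member_device_paths(import_output)]


# Two member lines copied from the zpool import output in this thread.
CONFIG = """
stronghold                                      ONLINE
  raidz3-0                                      ONLINE
    gptid/d76b91cd-ee21-11e5-93b1-d0509935e862  ONLINE
    gptid/d82058cf-ee21-11e5-93b1-d0509935e862  ONLINE
"""

for command in label_commands(CONFIG):
    print(" ".join(command))
```

Since zdb -ul only reads the labels, this step should be safe to run before any import attempt.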

Stefan311 / ZfsSpy