Closed: LukaszBrzyszkiewicz closed this issue 2 months ago
There is not much that can be done: the zpaq format, for backward-compatibility reasons, does not support the possibility of having "holes" (i.e., corrupted archive parts). You can even replace piece 0002.zpaq with piece 0004.zpaq from another archive (!). This is NOT true for chunked zpaqfranz archives (aka: archives with a chunk-size limit)
In zpaqfranz, to mitigate (not solve, mitigate) the problem, I added the backup command, which works in the same way as part-based archiving BUT maintains an index file that allows you to verify (with the testbackup command), quickly or thoroughly, that all the "pieces" are right
Three times the compression process was killed because of too long an execution time.
If you want to kill it, you should try Ctrl-C. This will be intercepted and (hopefully!) some housekeeping will be done
Of course, it is not possible to prevent "brutal" termination from resulting in data corruption. In the case of a single (i.e., non-multipart) archive, resilience is assured: at the next execution the hung transaction will be discarded, and the archive updated. It is not possible to give 100% certainty, but in general it works well. There is also the trim command (specific to zpaqfranz) to discard any portions left "hanging" from an archive
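As a minimal sketch (the archive path is a placeholder, not taken from this thread), such a trim run could look like this; the command is only echoed so you can review it before actually running it:

```shell
# Sketch: discard portions left "hanging" after a brutal kill.
# The archive path is a placeholder; adjust it to your setup.
ARCHIVE='/backup/name_????.zpaq'
TRIM_CMD="zpaqfranz trim \"$ARCHIVE\""
echo "$TRIM_CMD"   # review the command, then run it yourself
```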
TRANSLATION: z:\ugo\apezzi is good, a bare apezzi is NOT good (it is a feature 😄 )
Default hash is MD5; I suggest using -backupxxh3 if you do not need a "manual" MD5 check (aka: Hetzner storage boxes)
zpaqfranz backup z:\ugo\apezzi c:\zpaqfranz -backupxxh3
zpaqfranz backup z:\ugo\apezzi c:\nz -backupxxh3
zpaqfranz backup z:\ugo\apezzi c:\1200 -backupxxh3
In this example you'll get:
Z:\ugo>dir .
Volume in drive Z is RamDisk
Volume Serial Number is 8ABB-DDB8
Directory of Z:\ugo
30/04/2024 16:18 <DIR> .
30/04/2024 16:18 <DIR> ..
30/04/2024 16:18 2.745.144 apezzi_00000000_backup.index
30/04/2024 16:18 398 apezzi_00000000_backup.txt
30/04/2024 16:17 743.804.106 apezzi_00000001.zpaq
30/04/2024 16:18 832.404.177 apezzi_00000002.zpaq
30/04/2024 16:18 1.429.415.084 apezzi_00000003.zpaq
5 File(s) 3.008.368.909 bytes
Now a quick test (not very reliable):
zpaqfranz testbackup z:\ugo\apezzi
Corruption test (-ssd is for solid-state media; do NOT use it on HDDs!):
zpaqfranz testbackup z:\ugo\apezzi -verify -ssd
Double check
zpaqfranz testbackup z:\ugo\apezzi -verify -ssd -paranoid
OK, now we corrupt the archive
Z:\ugo>copy z:\ugo\apezzi_00000000_backup.txt z:\ugo\apezzi_00000002.zpaq
Overwrite z:\ugo\apezzi_00000002.zpaq? (Yes/No/All): s
1 file(s) copied.
Piece 2 is now KO
Z:\ugo>zpaqfranz testbackup z:\ugo\apezzi
zpaqfranz v59.4c-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-04-29)
franz:testbackup _ - command
franz:-hw
====================================================================================================
part0 z:/ugo/apezzi_00000000.zpaq i_filename z:/ugo/apezzi_????????.zpaq
Multipart backup looks good
Loading backupfile... z:/ugo/apezzi_00000000_backup.txt
Rows in backup 00000003 from 00000001 to 00000003
Enabling XXH3 (in reading) hasher
Initial check part <<z:/ugo/apezzi_00000002.zpaq>>
Filesize does not match real 398 vs expected 832.404.177
0.047 seconds (000:00:00) (with errors)
Thank you for your answer.
Can you help me with creating proper zpaqfranz argument sets?
I'm currently using the following approach, but it looks error-prone and not a good idea for regular backups (the real path names are different):
zpaqfranz a "/backup/name_????.zpaq" "/source/" -m5 -copy "/secondbackup/" -xxh3 -verbose -not "*.log" -find "__vacuum__" -replace "" -filelist -test
I'm also using a second approach for metadata backup, which contains many MB of poorly compressible data:
zpaqfranz a "/backup/meta_????.zpaq" "/metasrc/" -m0 -index "/backup/meta_0000.zpaq" -copy "/secondbackup/" -xxh3 -verbose -not "*.log" -find "__vacuum__" -replace "" -filelist -test
---remove the local file, but not meta_0000.zpaq
In general, the most important thing is:
To minimize problems I also plan the following (can you help with building the commands?):
zpaqfranz a "/backup/name_????.zpaq" "/source/" -m5 -copy "/secondbackup/" -xxh3 -verbose -not "*.log" -find "vacuum" -replace "" -filelist -test
-m5 is placebo-level compression and will try to compress even uncompressible data (up to -m4, uncompressible data is simply stored).
-filelist is not useful in your case, because yours is not an NTFS (ADS-capable) filesystem.
-copy is usually for USB drives.
-xxh3 is to quickly make a verify (the default XXHASH is more than enough).
So my suggestion is just:
zpaqfranz a "/backup/name_????.zpaq" "/source/" -not "*.log" -find "__vacuum__" -replace ""
(more on testing in next posts)
I'm also using a second approach for metadata backup, which contains many MB of poorly compressible data: zpaqfranz a "/backup/meta_????.zpaq" "/metasrc/" -m0 -index "/backup/meta_0000.zpaq" -copy "/secondbackup/" -xxh3 -verbose -not "*.log" -find "vacuum" -replace "" -filelist -test ---remove the local file, but not meta_0000.zpaq
You really do not need to use -m0, unless you REALLY archive encrypted or highly compressed files (.MP4 etc.). -m1 will compress (if possible) or not compress at all (if the file does not seem... compressible)
For example, when making the backup of a TrueCrypt volume, -m0 is appropriate
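That rule of thumb could be sketched as follows (all paths are placeholders, not from this thread); the commands are only echoed for review:

```shell
# Placeholder paths. Use -m0 only for known-incompressible data
# (e.g. an encrypted TrueCrypt volume); use -m1 otherwise, since it
# skips compression by itself when a file does not look compressible.
ENCRYPTED_CMD='zpaqfranz a /backup/vault.zpaq /volumes/secret.tc -m0'
NORMAL_CMD='zpaqfranz a /backup/meta_????.zpaq /metasrc/ -m1'
echo "$ENCRYPTED_CMD"
echo "$NORMAL_CMD"
```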
In general the most important thing is: I need to be sure that the previously created file isn't corrupted. Maybe I should ALWAYS execute trim on the last created archive? Or should I switch to the backup command?
It depends on whether you want to use a multivolume or a monolithic archive
I suggest backup. It works just like regular multivolume BUT with a text file of hashes. This makes it much faster to check against corruption (AND MISSING PIECES)
AND the testbackup command
I suggest the t (test) command after an add, plus (if you can) -paranoid, or the w command (if you have enough RAM)
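For instance (archive and source paths are placeholders), an add immediately followed by a test might look like this; the command line is echoed rather than executed:

```shell
# Placeholder paths. Run t right after every add; add -paranoid when
# free disk space allows, or use the w command if you have enough RAM.
ARCHIVE='/backup/name_????.zpaq'
ADD_AND_TEST="zpaqfranz a $ARCHIVE /source && zpaqfranz t $ARCHIVE -paranoid"
echo "$ADD_AND_TEST"
```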
test to extract the last snapshot once a week: zpaqfranz t thearchive.zpaq
once a month merge all daily snapshots (is this achievable??) and maybe start from the beginning, or use this as a starting point for data deduplication? It is doable with the m (merge) command, but it is just pointless
This can be a good example, with an rsync-based remote cloud backup (aka a Hetzner storage box):
Just a snippet; adjust as you like:
if [ -d "/monta/nexes_sei_aserver6/rar" ]
then
/bin/date +"%R----------NAS: Directory rar apezzi exists "
/usr/local/bin/zpaqfranz backup /monta/nexes_sei_aserver6/apezzi/rambo.zpaq /tank -zfs -key pippo -space
/usr/local/bin/zpaqfranz testbackup /monta/nexes_sei_aserver6/apezzi/rambo.zpaq -paranoid -ssd -key pippo -big >/tmp/remoto.txt
/usr/local/bin/rsync -I --exclude "/*.zfs" --append --omit-dir-times --no-owner --no-perms --partial --progress -e "/usr/bin/ssh -p 23 -i /root/script/storagebox_openssh" -rlt "/monta/nexes_sei_aserver6/apezzi/" "storageuser@somewhere.your-storagebox.de:/home/rambo/apezzi/"
ssh -p23 -i /root/script/storagebox_openssh storageuser@somewhere.your-storagebox.de df -h >>/tmp/remoto.txt
ssh -p23 -i /root/script/storagebox_openssh storageuser@somewhere.your-storagebox.de ls -l /home/rambo/apezzi/ >>/tmp/remoto.txt
PARTNAME=`/usr/local/bin/zpaqfranz last "/monta/nexes_sei_aserver6/apezzi/rambo_????????"`
echo $PARTNAME
ssh -p23 -i /root/script/storagebox_openssh storageuser@somewhere.your-storagebox.de "md5sum /home/rambo/apezzi/$PARTNAME" >>/tmp/remoto.txt
/usr/local/bin/zpaqfranz sum /monta/nexes_sei_aserver6/apezzi/$PARTNAME -md5 -pakka -noeta -stdout >>/tmp/remoto.txt
/usr/local/bin/zpaqfranz last2 /tmp/remoto.txt -big >>/tmp/remoto.txt
...somehow SMTP the /tmp/remoto.txt file to yourself...
else
/bin/date +"%R----------NAS: Directory rar apezzi does NOT exist "
fi
The idea is: back up locally, verify with testbackup, rsync the pieces to the storage box, then compare the MD5 of the last part computed remotely with the one computed locally
I also want to maybe add some additional data-corruption repair archiver (maybe parchive?)
There is none in zpaq (more on that later)
The "right" way to do the tests depends on whether they are LOCAL or REMOTE. LOCAL are files that you keep on a NAS, a secondary hard drive, etc. REMOTE are those that you transfer, for example with rsync, to a distant machine
For LOCAL archives:
1) Check that the archive is not corrupted (e.g., because the process was killed in the middle of the work). You get this with the t (test) command.
2) If you have enough free space, add the -paranoid switch (which, however, postulates having write-expendable disks, e.g., a RAMDISK or a cheap SSD).
3) If you have a lot of RAM, use the w command.
4) In the case of multipart, don't use plain multipart... use the backup command (which is multipart with an index) and testbackup.
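The LOCAL checks above could be sketched like this (the archive name is a placeholder; pick the variant that matches your hardware). The commands are echoed for review only:

```shell
ARCHIVE='/backup/name_????.zpaq'        # placeholder multipart name
echo "zpaqfranz t $ARCHIVE"             # 1) basic integrity test
echo "zpaqfranz t $ARCHIVE -paranoid"   # 2) with enough free space
echo "zpaqfranz w $ARCHIVE"             # 3) with plenty of RAM
# 4) for multipart, prefer backup + testbackup over plain multipart:
echo "zpaqfranz backup /backup/name.zpaq /source"
echo "zpaqfranz testbackup /backup/name.zpaq -verify"
```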
For REMOTE I put an example above. For Windows machines (or rather NTFS, or NTFS-like, filesystems) there is the -ads switch to store the CRC-32 of archives (not of the files, just of the archives)
Thank you very much for your comprehensive answer :)
I will adapt and use your suggestions.
Btw, is it possible to tweak the progress display? I'm thinking of two things: first, the progress is almost always stuck at some percentage; second, I'm parsing the stdout output and converting it to cronicle-edge JSON format, but maybe it would be possible to add such an output format (like -pakka)
Btw, is it possible to tweak the progress display? I'm thinking of two things: first, the progress is almost always stuck at some percentage,
In fact, not easily: it is already carefully "tweaked"
It does not change until a different ETA is computed
This makes for much faster updates during low-compression runs (lots of files to be archived)
AKA: works well with -m1, -m2, -m3. Not very well with -m4. Not good with -m5
A tradeoff is needed: minimize the output (writing it takes a long time and slows things down a lot), yet still leave it responsive, whether for small files or giant files
second thing: I'm parsing the stdout output and converting it to cronicle-edge JSON format, but maybe it would be possible to add such an output format (like -pakka)
There is the fzf command; not really sure if it is enough
If you want some kind of json output, please give me an example
I have a daily zpaq file creation. Three times the compression process was killed because of too long an execution time.
For me this was fine, because I had accidentally put a big file into the backed-up folder. The problem is that after this point the newly created archives are somehow invalid, and I currently can't extract the (in theory) created archives without any errors.
The whole archive contains files from 0001.zpaq to 0044.zpaq, one for each day. When I execute zpaqfranz i "brainapp????.zpaq", the results show only versions 1 to 26 (versions 27, 28 and 29 were interrupted by the kill command). When I try to extract a particular file from the 0040.zpaq file I get the error "2 bad frag IDs, skipping...", and after a few minutes zpaq exits and nothing is extracted.
I tried to trim those three files; afterwards the info command shows the list up to 44, but it is still not possible to extract any file.
Any idea what to do next? And maybe zpaq should be improved not to fail in such a situation?