fcorbelli / zpaqfranz

Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix
MIT License

PAKKA GUI creates archives with unchosen flags #97

Closed Werve closed 3 months ago

Werve commented 5 months ago

I tried the last 2 versions of the PAKKA GUI from the site (by clicking "Browse PAKKA builds" in settings), trying to create an archive with, for example, a single file and the following options: force longpath, store windate, store file hash, default method 1, VSS, Use ADS, force zfs, force Windows, and with the "make a backup" checkmark removed.

An error is shown indicating that only one file-hash method should be chosen. After removing some of the options listed above, the file is created, but in backup mode (a choice I had disabled in the interface).

Also, I noticed that it uses the built-in version of zpaqfranz. Wouldn't it be better to be able to select a path to a chosen version (or one in the same folder), to avoid cases where the development of the two projects is out of sync?

fcorbelli commented 5 months ago

Current PAKKA needs a lot of work 😄

Also I noticed that it uses the built-in version of zpaqfranz, wouldn't it be better to be able to select a path to use a chosen version (or in the same folder) so as to avoid cases where the development of the two projects are not close?

In fact, no. No, because a newer zpaqfranz.exe (64-bit) has an autoupdate. Then, sooner or later, even PAKKA will autoupdate zpaqfranz, if needed.

I had virtually zero feedback on PAKKA for 10 years; now it seems some users are out there. I will see if I can improve it a little. Thanks for the reports.

Werve commented 5 months ago

I was trying PAKKA for convenience, to store backups of all files on a disk, but due to the problems described I ultimately preferred to proceed via console. I know that backups are not its primary intended use, but I have not found any other program that can store deduplicated backups with versions, as if they were differential, while letting you quickly remove a previous version to regain space (from what I understand you can use the -repack command, which copies already-compressed blocks, so effectively you can quickly remove a single version, for example).

For now, however, I have noticed a problem for this use, and I have a question. The problem is the lack of symbolic-link archiving, and the question is whether archiving via VSS picks up all files (I know that by default VSS excludes Outlook files: https://learn.microsoft.com/en-us/windows/win32/vss/excluding-files-from-shadow-copies?redirectedfrom=MSDN).

Edit: also, I noticed that the -windate option doesn't store creation dates, so that could be another issue in this use case.

fcorbelli commented 5 months ago

I was trying PAKKA for convenience, to store backups of all files on a disk but due to the problems described I finally preferred to proceed via console.

PAKKA is, in fact, an EXTRACTOR. It was born to make restoring MySQL dumps easier; I did not work very much on creating archives. The idea is to have a GUI for versioned backups, something that is not normally there in other programs. With PAKKA you can add data to a file as you go with a minimal number of clicks (at least, if and when I finish it).

I know that backups are not a primarily suitable use but I have not found any other program that can store deduplicated backups with versions as if they were differential and you can quickly remove a previous version to regain space (from what I understand you can use the -repack command that copies already compressed blocks so effectively you can quickly remove a single version for example).

In the latest versions it is possible to delete added versions; clearly this causes all data added after the "cut" point to be lost. It is not possible to quickly reclaim space in a deduplicated archive (without heavy processing). You can remove a single version, or many, but only the LAST ones.

For now, however, I have noticed a problem for this use and a concern. The problem is the lack of symbolic file archiving and the question is whether using archiving via VSS chooses all files (I knew that by default VSS excludes Outlook files https://learn.microsoft.com/en-us/windows/win32/vss/excluding-files-from-shadow-copies?redirectedfrom=MSDN)

Symlinks are not supported at all. I started to implement a tar-like mechanism, but Windows is so messy and undocumented that I let it go.

Edit: Also, I noticed that the -windate option doesn't store creation dates, so it could be another issue in that use case

Actually, it should. However, it is an option I personally never use, so I do not check its functionality. If you have any repeatable examples, please share them and I will correct any errors.

fcorbelli commented 5 months ago

I tried the last 2 versions of PAKKA GUI on the site (by clicking "Browse PAKKA builds," in settings) and trying to create an archive with, for example, a file and the following options: force longpath store windate store file hash, default method 1 VSS Use ADS force zfs force Windows Removing the checkmark to make a backup

An error is shown indicating that only one hash file method should be chosen. Trying to remove some of the options listed above the file is created but in backup mode (a choice I had disabled from the interface).

Can you please explain better what you want to do? Thanks

Werve commented 5 months ago

In the latest versions it is possible to delete added versions, clearly this causes all data added after the time of "cutting" to be lost. It is not possible to quickly resume space in a deduplicated archive (without heavy processing) You can remove a single, or many, versions However, only the LAST ones

So, assuming an input.zpaq archive with 3 versions from which I want to remove version 2, can I not proceed by doing:

zpaqfranz x input.zpaq -repack output.zpaq -until 1
zpaqfranz x input.zpaq -repack output.zpaq -until 3

thereby copying only the blocks referenced by versions 1 and 3 into "output.zpaq"?
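As an aside, the difficulty can be sketched in a few lines of plain Python (a toy model with made-up block names, nothing to do with zpaqfranz's real code or the zpaq format): because deduplicated blocks are shared across versions, dropping a middle version means recomputing which blocks are still referenced by the surviving versions, then rewriting the archive without the dead ones.

```python
# Toy model of a deduplicated archive (illustrative only, not the real
# zpaq format): each version references a set of content blocks, and
# identical blocks are stored once.
versions = {
    1: {"b1", "b2", "b3"},   # full backup
    2: {"b1", "b2", "b4"},   # differential: reuses b1 and b2, adds b4
    3: {"b1", "b5"},         # reuses b1, adds b5
}

def surviving_blocks(versions, keep):
    """Blocks still referenced after keeping only the given versions."""
    alive = set()
    for v in keep:
        alive |= versions[v]
    return alive

# Dropping version 2 requires scanning every surviving version to see
# which of its blocks are shared (b1 and b2 must stay) and which are
# dead (only b4 can be freed), then rewriting the whole archive.
alive = surviving_blocks(versions, keep=[1, 3])
dead = set().union(*versions.values()) - alive
print(sorted(dead))  # ['b4']
```

The scan plus rewrite is exactly the "heavy processing" a repack implies; truncating the tail, by contrast, needs no reference counting at all.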

Actually it should. However, it is an option that I personally never use, so I do not check its functionality If you have any repeatable examples please propose them, and I will correct any errors

You can try to archive the following folder (after extracting the zip): Creation Date 1-1-2016.zip

with the following command:

zpaqfranz -method 1 -verbose -utf -windate -force -forcezfs -forcewindows -filelist -longpath a test.zpaq ".\Creation Date 1-1-2016"

Both the file and the folder should have: creation date 1/1/2016 01:01:01, modified date 1/1/2020 01:02:03.

But when extracting the archive (e.g. with PAKKA), only the modified dates are preserved.

fcorbelli commented 5 months ago

In the latest versions it is possible to delete added versions, clearly this causes all data added after the time of "cutting" to be lost. It is not possible to quickly resume space in a deduplicated archive (without heavy processing) You can remove a single, or many, versions However, only the LAST ones

So assuming an input.zpaq archive with 3 versions to which I want to remove version 2 I cannot (...)

If you want to QUICKLY (aka: almost in no time) drop versions, you can (from 59.4h) use the new crop command:

C:\zpaqfranz>zpaqfranz crop vers.zpaq
zpaqfranz v59.4h-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-05-08)
franz:-hw

vers.zpaq:
4 versions, 72 files, 2.343.688 bytes (2.23 MB)
---------------------------------------------------------------------------
<  Ver  > <  date  > < time > <    version size    > < Offset (w/out encr)>
V00000001 2024-05-05 19:21:52 [               2.551] @                2.551
V00000002 2024-05-05 19:22:05 [               2.604] @                5.155
V00000003 2024-05-05 19:22:17 [           2.329.993] @            2.335.148
V00000004 2024-05-05 19:22:30 [               8.540] @            2.343.688
no -kill, this is just a dry run

You can crop the archive, for example, at version 2. This will discard versions 3 and 4, and the file will become (in the example) 5.155 bytes long

You CANNOT delete version 3, keeping 1, 2 and 4

The crop command is used to delete versions added by mistake (it sometimes happens) or to delete old copies that are useless. Translation, typical scenario: the first version of a fileserver is 100GB. Later versions are each 1GB, and let's say there are 50 of them. The archive is now 100 + 50×1 = 150GB. For some reason I have no interest in keeping the last 20 versions: I crop/drop them, and the archive will be 100 + 30×1 = 130GB. Then I launch an update, and it will go back to being aligned with the current data.

Maybe this will become "drop" instead of "crop". More accurate
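The near-instant behaviour follows from the layout shown in the dry run: zpaq archives are append-only, so each version ends at a fixed byte offset, and cutting at version N is a single truncation. A toy sketch in plain Python (offsets copied from the listing above; purely illustrative, not zpaqfranz code):

```python
import io

# End offset of each version, taken from the dry-run listing above.
offsets = {1: 2551, 2: 5155, 3: 2335148, 4: 2343688}

# Stand-in for vers.zpaq: an in-memory "file" of the full archive size.
archive = io.BytesIO(b"\x00" * offsets[4])

def crop(f, version):
    """Discard every version after `version` by truncating at its end offset."""
    f.truncate(offsets[version])

crop(archive, 2)                # drops versions 3 and 4 in one operation
print(len(archive.getvalue()))  # 5155
```

This is also why only trailing versions can go: earlier bytes may hold deduplicated blocks that later versions still reference, so cutting anywhere but the tail would corrupt the survivors.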

fcorbelli commented 5 months ago

On -windate:

zpaqfranz l z:\thearchive.zpaq -windate

will show creation dates (if any)

Checking for bugs in progress...

Werve commented 5 months ago

I tried the last 2 versions of PAKKA GUI on the site (by clicking "Browse PAKKA builds," in settings) and trying to create an archive with, for example, a file and the following options: force longpath store windate store file hash, default method 1 VSS Use ADS force zfs force Windows Removing the checkmark to make a backup An error is shown indicating that only one hash file method should be chosen. Trying to remove some of the options listed above the file is created but in backup mode (a choice I had disabled from the interface). Can you please explain better what do you want to do? Thanks

I was pointing out that PAKKA does not seem to properly apply the options chosen in the GUI for compression; in many cases, despite not having chosen it, or having just removed the checkmark for multipart backup mode, I noticed that it still creates the archive that way.

I then proceeded via command line. I would like to archive all the files contained in a drive, say Z:\ of about 1TB, as if it were a full backup, then later redo the same operation with the same file.zpaq, as if they were differential backups (thanks to deduplication). Sooner or later, though, too much space would be consumed, so I would like to remove versions to free up space.

Basically this is what you do with a backup system, but with deduplication; unfortunately I have not found straightforward programs for such work on Windows.

fcorbelli commented 5 months ago
zpaqfranz a test.zpaq ".\Creation Date 1-1-2016" -windate -force -longpath 

Then

Z:\>zpaqfranz l test.zpaq -windate
zpaqfranz v59.4h-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-05-08)
franz:-windate -hw

test.zpaq:
1 versions, 1 files, 1.258 bytes (1.23 KB)

- 2020-01-01 01:02:03 (C) 2016-01-01 01:01:01                  94 A     Z:/Creation Date 1-1-2016/Test datestamps.txt

                   94 (94.00  B) of 94 (94.00  B) in 1 files shown
                1.258 compressed  Ratio 13.383 <<test.zpaq>>
0.016 seconds (000:00:00) (all OK)

And then

Z:\>zpaqfranz x test.zpaq -windate -longpath -to z:\restored
zpaqfranz v59.4h-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-05-08)
franz:-to                   <<z:/restored>>
franz:-windate -hw -longpath
INFO: setting Windows' long filenames

test.zpaq:
1 versions, 1 files, 1.258 bytes (1.23 KB)
Extract 94 bytes (94.00  B) in 1 files (0 folders) / 32 T

Files to be worked 1  => founded 1 => OK 1
0.032 seconds (000:00:00) (all OK)

If I understand correctly, you want FOLDER creation dates too, not only FILE creation dates. Is that right?

fcorbelli commented 5 months ago

(...) Just a bug (one of many; or better, "an option")

I then proceeded via command line. I would like to archive all the files contained in a drive, say Z:\ of size about 1TB, as if it were a full backup. Later re-do the same operation with the same file.zpaq as if they were differential backups (thanks to deduplication). Sooner or later, though, too much space would be consumed, so I would like to remove versions to free up space.

Basically this is what you do with a backup system but with deduplication, which unfortunately I have not found straightforward programs for such work on Windows.

Simply, you can't. The data will stay "forever" inside the archive. One method is to use the -freeze switch, that is, setting aside archives that become too large and starting over again.

Werve commented 5 months ago

On -windate:

zpaqfranz l z:\thearchive.zpaq -windate

will show creation dates (if ever)

Checking for bugs in progress...

In the example described, that command does report the creation dates; but when extracting, even adding -windate, the created files have only the correct modification date.

zpaqfranz a test.zpaq ".\Creation Date 1-1-2016" -windate -force -longpath 

Then

Z:\>zpaqfranz l test.zpaq -windate
zpaqfranz v59.4h-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-05-08)
franz:-windate -hw

test.zpaq:
1 versions, 1 files, 1.258 bytes (1.23 KB)

- 2020-01-01 01:02:03 (C) 2016-01-01 01:01:01                  94 A     Z:/Creation Date 1-1-2016/Test datestamps.txt

                   94 (94.00  B) of 94 (94.00  B) in 1 files shown
                1.258 compressed  Ratio 13.383 <<test.zpaq>>
0.016 seconds (000:00:00) (all OK)

And then

Z:\>zpaqfranz x test.zpaq -windate -longpath -to z:\restored
zpaqfranz v59.4h-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-05-08)
franz:-to                   <<z:/restored>>
franz:-windate -hw -longpath
INFO: setting Windows' long filenames

test.zpaq:
1 versions, 1 files, 1.258 bytes (1.23 KB)
Extract 94 bytes (94.00  B) in 1 files (0 folders) / 32 T

Files to be worked 1  => founded 1 => OK 1
0.032 seconds (000:00:00) (all OK)

If I understand you want FOLDER date creation too, not only FILE date creation Is it right?

Yes, I would like to see that preserved as well. But the creation date of the extracted file also doesn't match the initial date. Edit: no, I checked again and the extracted file seems to have the correct dates. Probably both in PAKKA and in my first attempt, -windate was not added to the extraction as well.

fcorbelli commented 5 months ago

This seems OK to me.

Folder timestamps require a bit of heuristics.

Werve commented 5 months ago

This seems OK to me.

Folder timestamps require a bit of heuristics.

Yes, sorry, I edited it as soon as I noticed. Probably neither PAKKA nor the first extraction attempt added -windate to the extract as well.

Werve commented 5 months ago

(...) Just a bug (one of many; or better, "an option")

I then proceeded via command line. I would like to archive all the files contained in a drive, say Z:\ of size about 1TB, as if it were a full backup. Later re-do the same operation with the same file.zpaq as if they were differential backups (thanks to deduplication). Sooner or later, though, too much space would be consumed, so I would like to remove versions to free up space. Basically this is what you do with a backup system but with deduplication, which unfortunately I have not found straightforward programs for such work on Windows.

Simply, you can't The data will stay "forever" inside the archive One method is to use the -freeze switch, that is, archiving backups that become too large to start over again

Would it be possible to use the new drop/crop command to remove early versions instead of late versions? In the described use case of backup chains, this might be enough.

fcorbelli commented 5 months ago

Would it be possible to use the new drop/crop command to remove early versions instead of late versions? In the described use case of backup chains, this might be enough.

No, it is not. You have to repack. But in reality it is never done; I, at least, don't do it. When an archive gets too big, I move it to external media (like a USB HD), and at the next run it will be recreated automatically.

I checked the issue of the creation date of folders. It is simply not information stored in the standard zpaq format. zpaqfranz uses an extended block of data, in which it stores the hashes of the files and, with -windate, the dates too. But folders are devoid of information about hashes, and therefore also of the creation date. I could actually handle this date as well, moving it out of the hash block. It is a doable job, but it is not trivial. For now, I'd say let's just gloss over it, assuming there are no other requests from other users.

fcorbelli commented 5 months ago

OK, please try the attached pre-release with -windate. This will restore creation dates on folders too.

59_4i.zip

IF the folders themselves exist in the archive.

Translation: this will include the folder:

zpaqfranz a test "Creation Date 1-1-2016"

This does not

zpaqfranz a test "Creation Date 1-1-2016\*"

In the first case, TWO objects will be stored:

2020-01-01 01:02:03 (C) 2016-01-01 01:01:01 0 D Creation Date 1-1-2016/
2020-01-01 01:02:03 (C) 2016-01-01 01:01:01 94 A Creation Date 1-1-2016/Test datestamps.txt

In the second, just the file:

- 2020-01-01 01:02:03 94 A Creation Date 1-1-2016/Test datestamps.txt

I will not write a heuristic to automagically solve this situation, which is too complex and lacks real utility. Short version: add folders, and not just files, to the archive if you want folders to be "touched".

Werve commented 5 months ago

OK, please try the attached pre-release with -windate This will restore on folders too

59_4i.zip

IF folders does exists

Translation This will include the folder

zpaqfranz a test "Creation Date 1-1-2016"

This does not

zpaqfranz a test "Creation Date 1-1-2016\*"

In first case TWO objects will be stored

2020-01-01 01:02:03 (C) 2016-01-01 01:01:01 0 D Creation Date 1-1-2016/
2020-01-01 01:02:03 (C) 2016-01-01 01:01:01 94 A Creation Date 1-1-2016/Test datestamps.txt

In the second just the file

- 2020-01-01 01:02:03 94 A Creation Date 1-1-2016/Test datestamps.txt

I will not write a heuristic to automagically solve this situation, which is too complex and lacks real utility Short version: add folders and not files to the archive, if you want folders to be "touched"

I confirm that it works, including the creation date of the folder, IF I don't add -longpath.

Werve commented 3 months ago

(...) Just a bug (one of many; or better, "an option")

I then proceeded via command line. I would like to archive all the files contained in a drive, say Z:\ of size about 1TB, as if it were a full backup. Later re-do the same operation with the same file.zpaq as if they were differential backups (thanks to deduplication). Sooner or later, though, too much space would be consumed, so I would like to remove versions to free up space. Basically this is what you do with a backup system but with deduplication, which unfortunately I have not found straightforward programs for such work on Windows.

Simply, you can't The data will stay "forever" inside the archive One method is to use the -freeze switch, that is, archiving backups that become too large to start over again

In case anyone is looking for a solution for the use case I described: I ended up having to use the NTFS data deduplication feature (usually present only on Windows Server editions, but there are workarounds for other editions) inside a VHDX file container, to support mounting as well. This preserves symlinks, hardlinks, junction points, and datestamps, with the flexibility to add and remove files like any folder.

Hopefully, someday the new ReFS filesystem will be fully supported by client versions of Windows and you can just use that, thanks to the recent additions of deduplication and post-compression.

But of course the best for cross-OS compatibility is an open source archive and tool like zpaqfranz :)

fcorbelli commented 3 months ago

In this case I use TrueCrypt/VeraCrypt virtual disks, with zpaqfranz (without compression, aka -m0) for backup. In the case of servers I make a sector-level image (-image), but it requires zeroed free sectors.

Werve commented 3 months ago

I considered using VeraCrypt, but unfortunately it does not have a storage system that allows you to reclaim unused space if you later delete files in the container. By creating a dynamically sized VHDX, instead, you can either manually and precisely expand it to the chosen size or reclaim the space no longer actually used in the container via the Optimize-VHD function (better to defragment the free space first so that it is contiguous). To encrypt the VHDX transparently I can use BitLocker (it keeps the ease of mounting).

fcorbelli commented 3 months ago

This will f*up the deduplication, due to moved blocks. I have used veracrypt+zpaqfranz for years. It just works.

Werve commented 3 months ago

Are you referring to the Optimize-VHD command? I ran several days of testing with NTFS deduplication (the appropriate commands) + BitLocker + defrag, all inside a VHDX container, and so far it has worked: files remained readable and hashes matched.

It all started after I read this article: https://www.deploymentresearch.com/beyond-zip-how-to-store-183-gb-of-vms-in-a-19-gb-file-using-powershell/

fcorbelli commented 3 months ago

I have worked with .vmdk and zpaq for 10 years 😄 No need for such complexity: zpaq with -key. That's all.

On zfs systems it is even better

Werve commented 3 months ago

But, based on the previous conversation, you can't reclaim space from a zpaq archive by removing files from previous versions (except the latest ones), right? I ended up using this system because it is the only one I have found that also allows you to delete any file, effectively reducing the space used, and with deduplication features (which are well integrated in Windows, unlike ZFS).

fcorbelli commented 3 months ago

It is possible to purge zpaq (with a bit of effort)... but why? This is zpaq's best feature for disaster recovery. Space occupancy is typically minimal, becoming significant only after months or hundreds of versions. Deleting data from a backup is THE NO-NO.

fcorbelli commented 3 months ago

PS Windows deduplication is crap. ZFS is just about as crap, but better than Windows. The main use is quite counterintuitive: it serves to minimize WRITES during operational checks of backups, i.e., extraction of the entire contents of an archive.

Werve commented 3 months ago

It is possible to purge zpaq (with a bit of effort)... but... why? It is zpaq's best feature for disaster recovery. Space occupancy is typically minimal, becoming significant after months or hundreds of versions Deleting data from a backup is THE NO-NO

In my case, I wanted to reduce the size of some full HDD backups accumulated over years. Since there are surely several files and identical portions repeated over the years, through deduplication I could save space. But I intend to continue with full backups for years to come, and I had already almost run out of space on the HDDs where I store them. So sooner or later I would have to remove the older backups. Of course you can always buy more space, but I did not want to go in that direction, since this is a personal situation, not a work one.

fcorbelli commented 3 months ago

Deleting old data is not a practice I recommend; I have all my individual files since 1993. Deduplication doesn't make much sense as a space-saving methodology; it makes a lot more sense for different versions, i.e. snapshots. Incidentally, you can quickly estimate the amount of duplicate files (whole files, not parts of files) with the sum command and the -quick switch (made ad hoc).
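For readers curious what a quick duplicate estimate involves, here is a generic sketch of the usual size-then-hash approach in plain Python. It only illustrates the general idea, and is not zpaqfranz's actual sum -quick implementation:

```python
# Rough whole-file duplicate finder: group by size first (cheap), then
# hash only files whose size collides. Illustrative only; this is not
# zpaqfranz's algorithm.
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    by_size = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    dupes = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot be a duplicate
        for path in paths:
            with open(path, "rb") as f:
                dupes[hashlib.sha256(f.read()).hexdigest()].append(path)
    return [group for group in dupes.values() if len(group) > 1]

# Example: two identical files and one different one of the same size.
import tempfile
with tempfile.TemporaryDirectory() as d:
    for name, data in [("a", b"x" * 100), ("b", b"x" * 100), ("c", b"y" * 100)]:
        with open(os.path.join(d, name), "wb") as f:
            f.write(data)
    groups = find_duplicates(d)
    print(len(groups))  # 1 duplicate group: files a and b
```

The size pre-filter is what makes such an estimate fast: most files have a unique size and are never read at all.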

Werve commented 3 months ago

I tried compression before, but a lot of time was wasted for almost no space saved, since almost all the files are not compressible (like video). Since those files were very likely repeated several times across the various backup chains, the biggest space savings I achieved was through deduplication. Many times, though, to cite the example of videos again, there are also many clips cut from a longer video. They are still non-compressible (because of the work of the video encoders) but fully contained in the longer video: another case where deduplication (of partial data) yields savings in used space.

Unfortunately, I found no other way. And even with deduplication it is very likely that, at this rate of backups, in a couple of years I will run out of space and will have to remove older backups.

fcorbelli commented 3 months ago

Video cannot be deduplicated very well. You'll have to buy more hardware, or delete something.

Werve commented 3 months ago

Video cannot be deduplicated very well You'll have to buy more hardware, or delete something

Yes, that's what I'm saying. In my use case, unfortunately, I have no alternatives. And I don't plan on buying more space, so I will have to remove older files in the future. For this reason I have already worked to find a storage system that will allow me to do this when I have something like 1 MB left free.