fcorbelli / zpaqfranz

Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix

Memory consumption #104

Closed: Namke closed this issue 2 months ago

Namke commented 3 months ago

I tried to compress one folder of ~7TB of data with over 100M files. zpaq crashes, due to memory consumption I guess; I don't know exactly how much it consumes, but it crashes on both PCs, one with 32GB of RAM and one with 96GB. And this happens while zpaq is collecting files, before it actually enters the compression phase.


Note: both WinRAR and Borg (via WSL2) struggle a bit while backing up this folder, but they eventually work. I really wanted to compare the efficiency of these tools; zpaq wins in plenty of different cases.

Namke commented 3 months ago

More detail: zpaq consumes over 5GB of memory while indexing ~7M files. Is there a way to limit this?

fcorbelli commented 3 months ago

In fact, no. No in the sense that (today) zpaqfranz maintains quite a lot of information in the DT structure that is often useless. More functions = larger data structures = more memory usage. I could make a more "frugal" version of this, but I don't know who would care. I'll try to give it some thought.

fcorbelli commented 3 months ago

OK, please try the pre-release 59.7i (via zpaqfranz update). This will show an (estimation of the) used RAM and significantly reduces the memory used (where not strictly necessary). It is not a full optimization, for speed reasons (rarely useful => would slow down a lot).

Short version: this is slower, but more frugal. Please let me know.

fcorbelli commented 3 months ago

Uploaded the pre-release debug 59.7k; this will (slowly) show the RAM used:

Scanned    807.882 00:00:05    161.544 file/s (    1.700.493.185.955)  STATUZ 347.389.260
Scanned    863.596 00:00:06    143.908 file/s (    1.702.488.274.987)  STATUZ 371.346.280
Scanned    893.691 00:00:07    127.651 file/s (    1.703.270.235.072)  STATUZ 384.287.130

The RAM used (in bytes) is shown after STATUZ.

fcorbelli commented 3 months ago

OOPS, I forgot: -verbose is mandatory.

Scanned  1.519.150 00:00:37     41.058 file/s (    1.698.790.034.401) DTRAM 604.621.700

As a rule of thumb, about 400 bytes are used for each file (rough estimation). For 100M files this will take ~40GB (plus the threads' RAM).
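
As a quick sanity check of that rule of thumb, here is a tiny stand-alone C++ snippet; the ~400 bytes/file figure and the 100M file count are the ones from this thread, and this is only illustrative arithmetic, not zpaqfranz code:

#include <cstdint>
#include <cstdio>

int main() {
    // Figures quoted above: ~400 bytes of per-file metadata, 100M files.
    const std::uint64_t bytes_per_file = 400;
    const std::uint64_t files          = 100000000ULL;
    // Decimal gigabytes, to match the ~40GB figure in the comment.
    const double gb = double(bytes_per_file * files) / 1e9;
    std::printf("Estimated file-list RAM: ~%.0f GB (plus the threads' RAM)\n", gb);
    return 0;
}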

Namke commented 3 months ago

Update from version 59.7k

I'll keep it running and update this case.


fcorbelli commented 3 months ago

The slowdown is expected (due to the O(n^2) complexity of the debug code). Calculating the memory used is useless information for the execution itself, but it gives an approximate idea of what is happening.

fcorbelli commented 3 months ago

As you can see, every file takes (about) 430 bytes, as expected. Then you can estimate (from the RAM size) how many files you can handle (more or less, of course); for 100M files about 43GB are needed. I could make an algorithm that could handle any number of files, with a conspicuous slowdown, but I don't think I will do that any time soon. Perhaps I might need it for the NAS version (Synology/QNAP), where RAM is often low.
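
The same rule of thumb can be turned around to estimate, from the available RAM, roughly how many files can be handled. A minimal sketch, assuming the ~430 bytes/file measured above and the two machine sizes mentioned at the start of the thread (again, not zpaqfranz code):

#include <cstdint>
#include <cstdio>

int main() {
    // ~430 bytes of metadata per file, as seen in the DTRAM output above.
    const std::uint64_t bytes_per_file = 430;
    // RAM sizes of the two machines mentioned at the top of this thread.
    const std::uint64_t ram_gb[] = {32, 96};
    for (std::uint64_t gb : ram_gb) {
        const std::uint64_t max_files = gb * 1000000000ULL / bytes_per_file;
        std::printf("%2llu GB RAM -> roughly %llu million files (before threads and OS overhead)\n",
                    (unsigned long long)gb,
                    (unsigned long long)(max_files / 1000000ULL));
    }
    return 0;
}

For the 32GB and 96GB machines this gives roughly 74 and 223 million files respectively, as an upper bound on the file list alone, before the threads' RAM and the rest of the process are accounted for.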

Namke commented 3 months ago

I'm not digging into the code yet, but my wild guess is that the number of files raises the complexity of finding colliding files, correct?

With my use case the current approach may not be viable, since by the time the scan reached 18M files the rate had gone down to 330 file/s, while I have a folder of 104M files (and growing).

Technically, splitting this folder into multiple add operations won't change the situation, correct?

fcorbelli commented 3 months ago

It is just the debug code that slows things down (very much). I can strip it off, but then you will not know the maximum number of files.

I'll release a "smarter" version in a couple of hours.

The point was to estimate the RAM used for each file.

Stay tuned!

fcorbelli commented 3 months ago

PS: if you update and DO NOT use -verbose, you can already test it yourself.

fcorbelli commented 3 months ago

You can upgrade to the pre-release 59.7m. In this case you can (maybe) use -debug5 to get feedback on memory usage every 1.000.000 files (aka: a slowdown, but not by much).

Namke commented 3 months ago

I tried with the latest release last week. It works fine with any add batch containing around 16M files.

My current strategy is to add individual folders that contain less than 1TB of data and fewer than 16M files. I've added 6 batches, with a total archive size of 650GB so far. I'll update when it's all done (around 11TB of data with over 124M files).

fcorbelli commented 3 months ago

Is it possible to make a version that stores the list of files on disk, instead of in memory? It seems to me to be exaggerated, though, definitely exaggerated: roughly 400/500 bytes are required for each file to be added.
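
If someone did want to try that route, the general shape might look like the sketch below. To be clear, this is not zpaqfranz code and none of these names (FileRecord, DiskFileList, the record layout) exist in the project; it only illustrates the trade-off under discussion: a few hundred bytes per file moved from RAM to a sequential spill file, paid for with a seek per record whenever the list has to be walked again (the "conspicuous slowdown" mentioned earlier).

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

#pragma pack(push, 1)
struct FileRecord {              // hypothetical fixed-size record; the real per-file
    char          path[260];     // data is the ~400/500 bytes discussed above
    std::uint64_t size;          // (truncated path: real code would need overflow handling)
    std::int64_t  mtime;
    std::uint32_t attr;
};
#pragma pack(pop)

class DiskFileList {
    std::FILE*    f_;
    std::uint64_t count_ = 0;
public:
    explicit DiskFileList(const char* spillpath) : f_(std::fopen(spillpath, "w+b")) {}
    ~DiskFileList() { if (f_) std::fclose(f_); }

    void add(const std::string& path, std::uint64_t size, std::int64_t mtime, std::uint32_t attr) {
        if (!f_) return;                         // error handling omitted in this sketch
        FileRecord r{};
        std::strncpy(r.path, path.c_str(), sizeof(r.path) - 1);
        r.size = size; r.mtime = mtime; r.attr = attr;
        std::fseek(f_, 0, SEEK_END);             // append: O(1) RAM per file, sequential I/O
        std::fwrite(&r, sizeof(r), 1, f_);
        ++count_;
    }
    bool get(std::uint64_t i, FileRecord& out) { // random access costs a seek per record
        if (!f_ || i >= count_) return false;    // real code would need a 64-bit seek here
        std::fseek(f_, long(i * sizeof(FileRecord)), SEEK_SET);
        return std::fread(&out, sizeof(out), 1, f_) == 1;
    }
    std::uint64_t size() const { return count_; }
};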