More detail: zpaq consumes over 5 GB of memory while indexing ~7M files. Is there a way to limit this?
In fact, no. No in the sense that (today) zpaqfranz maintains quite a lot of information in the DT structure that is often useless. More functions = larger data structures = more memory usage. I could make a more "frugal" version of this, but I don't know who would care. I'll try to give it some thought
OK, please try the pre-release 59.7i (via zpaqfranz update). This will show an (estimated) amount of used RAM and significantly reduces the memory used (when not strictly necessary). It is not a full optimization, for speed reasons (rarely useful => it would slow down a lot)
Short version: this is slower, but more frugal. Please let me know
Uploaded the pre-release debug 59.7k; this will (slowly) show the RAM used
Scanned 807.882 00:00:05 161.544 file/s ( 1.700.493.185.955) STATUZ 347.389.260
Scanned 863.596 00:00:06 143.908 file/s ( 1.702.488.274.987) STATUZ 371.346.280
Scanned 893.691 00:00:07 127.651 file/s ( 1.703.270.235.072) STATUZ 384.287.130
The number after STATUZ is the estimated RAM used (in bytes)
OOPS, I forgot: -verbose is mandatory
Scanned 1.519.150 00:00:37 41.058 file/s ( 1.698.790.034.401) DTRAM 604.621.700
As a rule of thumb, about 400 bytes are used for each file (rough estimation). For 100M files this will take ~40 GB (plus the threads' RAM)
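A quick back-of-the-envelope sketch of that arithmetic, using only the ~400 bytes/file figure and the DTRAM line quoted above (this is plain arithmetic, not zpaqfranz code):

```python
# Rough RAM estimate for the file list, based on the ~400 bytes/file
# rule of thumb quoted above (an approximation, not a zpaqfranz internal).

BYTES_PER_FILE = 400  # rule-of-thumb figure from this thread

def estimated_ram_bytes(n_files: int, bytes_per_file: int = BYTES_PER_FILE) -> int:
    """Estimated RAM for n_files, excluding the threads' RAM."""
    return n_files * bytes_per_file

# Cross-check against the DTRAM line above: 604.621.700 bytes for 1.519.150 files
print(604_621_700 / 1_519_150)                 # ~398 bytes/file, close to 400
print(estimated_ram_bytes(100_000_000) / 1e9)  # ~40 GB for 100M files
```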
Update from version 59.7k
I'll keep it running and update this case.
The slowdown is expected (due to the O(n^2) complexity of the debug code). Calculating the memory used is useless information for the execution itself, but it gives an approximate idea of what is happening
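To illustrate how per-file memory accounting can become O(n^2) overall, here is a hypothetical sketch (not zpaqfranz's actual debug code; the entry cost is invented): if the estimate is recomputed by re-walking every entry collected so far after each new file, total work grows quadratically, whereas a running counter keeps it linear.

```python
# Hypothetical illustration of the O(n^2) accounting cost; NOT zpaqfranz code.

entries = []       # stands in for the list of scanned files
running_total = 0  # the O(n) alternative: a simple running counter

def entry_cost(name: str) -> int:
    return 400 + len(name)  # pretend each entry costs ~400 bytes plus its name

for i in range(1_000):
    name = f"file_{i}"
    entries.append(name)

    # Recomputing the total by re-walking every entry at every step
    # makes the whole scan O(n^2).
    recomputed = sum(entry_cost(e) for e in entries)

    # Keeping a running total instead is O(n) overall.
    running_total += entry_cost(name)

    assert recomputed == running_total
```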
As you can see, every file takes (about) 430 bytes, as expected. Then you can estimate (from the RAM size) how many files you can handle (more or less, of course). For 100M files about 43 GB are needed. I could make an algorithm that could handle any number of files, with a conspicuous slowdown, but I don't think I will do that any time soon. Perhaps I might need it for the NAS version (Synology/QNAP), where RAM is often low
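The inverse estimate mentioned here, as a sketch (again just arithmetic on the ~430 bytes/file figure; the RAM sizes are simply the two machines mentioned elsewhere in this thread, and the real process needs additional memory on top of this):

```python
# Inverse estimate: how many files fit in a given RAM budget at ~430 bytes/file.

BYTES_PER_FILE = 430  # per-file figure measured above for this workload

def max_files(ram_gb: float, bytes_per_file: int = BYTES_PER_FILE) -> int:
    """Approximate number of files that fit in ram_gb GB of RAM."""
    return int(ram_gb * 1_000_000_000 // bytes_per_file)

for ram_gb in (32, 96):
    print(f"{ram_gb} GB -> about {max_files(ram_gb):,} files")
# 32 GB -> about 74 million files; 96 GB -> about 223 million files
# (before counting the threads' RAM and everything else in the process)
```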
I'm not digging into the code yet, but my wild guess is that the number of files raises the complexity of finding collision files, correct?
With my use case the current approach may not be viable: after scanning up to 18M files the rate went down to 330 files/s, while I have a folder with 104M files (and growing).
Technically, splitting this folder into multiple adds won't change the situation, correct?
It is just the debug code that slows things down (very much). I can strip it off, but then you will not know the maximum number of files you can handle
I'll release a "smarter" build in a couple of hours
The point was to estimate RAM for each file
Stay tuned!
PS: if you update and do NOT use -verbose, you can already test it yourself
You can upgrade to the pre-release 59.7m. In this case you can (maybe) use -debug5 to get feedback on memory usage every 1.000.000 files (aka: a slowdown, but not by much)
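For example, something along these lines (a hypothetical invocation: the archive name and folder path are placeholders, and the exact switch spelling should be double-checked against the built-in help):

```
zpaqfranz update
zpaqfranz a backup.zpaq d:\bigfolder -debug5
```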
I tried with the latest release last week. It works fine with any add batch that contains around 16M files.
My current strategy is to add individual folders that contain less than 1 TB of data and fewer than 16M files. I have added 6 batches with a total (archive) of 650 GB so far. I'll update when it has done them all (around 11 TB of data with over 124M files)
Is it possible to make a version that stores the list of files on disk, instead of in memory? It seems exaggerated to me, though, definitely exaggerated. Roughly 400/500 bytes are required for each file to be added
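Purely as a sketch of that idea (not zpaqfranz code: the record format and the function names are invented), the scan phase could append one small record per file to a temporary list file, and the add phase could stream the records back, so resident memory stays roughly constant regardless of the file count:

```python
# Sketch of keeping the scanned-file list on disk instead of in RAM.
# Everything here (record format, names) is hypothetical.

import json
import os
import tempfile
from typing import Iterator

def scan_to_disk(root: str) -> str:
    """Walk root and append one JSON record per file to a temp list file."""
    fd, list_path = tempfile.mkstemp(suffix=".filelist")
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                full = os.path.join(dirpath, name)
                try:
                    st = os.stat(full)
                except OSError:
                    continue  # unreadable entry: skip, as a scanner typically would
                out.write(json.dumps({"p": full, "s": st.st_size,
                                      "m": st.st_mtime}) + "\n")
    return list_path

def stream_file_list(list_path: str) -> Iterator[dict]:
    """Re-read the records one at a time; RAM use stays O(1) per entry."""
    with open(list_path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)
```

The trade-off is exactly the "conspicuous slowdown" mentioned earlier: every pass over the file list becomes disk I/O instead of a memory walk.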
I tried to compress one folder of ~7 TB of data with over 100M files. zpaq crashed, I guess due to memory consumption; I don't know exactly how much it consumed, but it crashed both on a PC with 32 GB of RAM and on one with 96 GB. And this happened while zpaq was collecting files, before actually entering the compression phase.
Note: both WinRAR and Borg (via WSL2) struggled a bit while backing up this folder, but eventually worked. I really wanted to compare the efficiency of the tools; zpaq wins in plenty of different cases.