Closed: sheckandar closed this issue 2 months ago
Please try with -debug3 (and post the last lines of the output)
10/08/2024 18:18:03 38348: -pre fp 15.676.560
10/08/2024 18:18:03 37896: write @ 14.614.528 on fp 15.676.560 (15676560) size 65.536
10/08/2024 18:18:03 38349: -post
10/08/2024 18:18:03 38343: flush k4bis
10/08/2024 18:18:03 38068: !!!!!!!!!!!!!!!!!!!!!! unecrypted on [ENCRYPT @ 288.625.339.468] n
10/08/2024 18:18:03 38343: flush k5
10/08/2024 18:18:03 38095: MMMMMMMMMMMMMMMMMMMMMMMMMMM off 288.610.724.940 writtensofar 14.614.528 onlast 0|
10/08/2024 18:18:03 38343: flush k6
10/08/2024 18:18:03 38343: flush k7
10/08/2024 18:18:03 38343: flush k8
^C20%) 0.66% 01:42:13 ( 141.23 MB)->( 14.06 MB)=>( 2.07 GB) 3.44 MB/s
54760: CONTROL-C detected, try some housekeeping...
10/08/2024 18:18:03 54470: Closing file handle 15.676.560
10/08/2024 18:18:03 55687: Closing file handle first 15.676.560
free(): double free detected in tcache 2
Aborted (core dumped)
I think I understand what the problem is, but I can't think of an easy way to solve it. I will think about it.
No, it didn't work. The behavior is exactly the same.
I'm running it on DSM 7.2 now. So I'm not using a statically compiled binary.
Not sure if that makes a difference.
I need to check in a VM. The problem is to release every file descriptor in order to delete all the "broken" chunks (otherwise the garbage is left behind). I'll try with a Linux-specific parsing of /proc
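For illustration only, here is a minimal sketch of what such a Linux-only /proc scan could look like (the function and variable names are mine, not the actual zpaqfranz code): enumerate /proc/<pid>/fd, skip . and .., and close every descriptor above stderr so the leftover chunk files can then be deleted.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <dirent.h>
#include <unistd.h>

void close_open_files()
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/fd", (int)getpid());

    DIR* dir = opendir(path);
    if (dir == nullptr)
        return;                         // not Linux, or /proc unavailable

    std::vector<int> fds;               // collect first: readdir itself holds a fd
    struct dirent* entry;
    while ((entry = readdir(dir)) != nullptr)
    {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;                   // skip . and .. as in the -verbose log
        int fd = atoi(entry->d_name);
        if (fd > 2 && fd != dirfd(dir)) // keep stdin/stdout/stderr and the DIR's own fd
            fds.push_back(fd);
    }
    closedir(dir);

    for (int fd : fds)
        close(fd);                      // now the leftover chunk files can be unlinked
}
```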
Please try this pre-release. Use -verbose to get more info on closing
Now it doesn't hang and there is no core dump.
12/08/2024 13:38:24 38343: flush k4bis
12/08/2024 13:38:24 38068: !!!!!!!!!!!!!!!!!!!!!! unecrypted on [ENCRYPT @ 157.548.544] 8f
12/08/2024 13:38:24 38343: flush k5
12/08/2024 13:38:24 38095: MMMMMMMMMMMMMMMMMMMMMMMMMMM off 44.564.480 writtensofar 113.508.352 onlast 0|
12/08/2024 13:38:24 38343: flush k6
12/08/2024 13:38:24 38343: flush k7
12/08/2024 13:38:24 20277: expected bytes != written (media full?) 65.536 vs 4.096
12/08/2024 13:38:24 38343: flush k8
12/08/2024 13:38:24 54766: Erased chunk <</voluem1/backups/Information Technology-003.zpaq>>
12/08/2024 13:38:24 55699: Fixed-size chunk: (try) deleting all of them (up to 1)
12/08/2024 13:38:24 61754: Expected file 0 deleted 0
If that output is what was expected then the issue can be closed.
54760: CONTROL-C detected, try some housekeeping...
do close_open_files() ||| 30.000.000.000
Seems Linux, scanning proc
mypid is 13664 path is |/proc/13664/fd|
File to be closed (. and .. skipped) .
File to be closed (. and .. skipped) ..
File to be closed (. and .. skipped) 0
fd 0 fd_path /proc/13664/fd/0
File to be closed (. and .. skipped) 1
fd 1 fd_path /proc/13664/fd/1
File to be closed (. and .. skipped) 2
fd 2 fd_path /proc/13664/fd/2
File to be closed (. and .. skipped) 3
fd 3 fd_path /proc/13664/fd/3
Closing fd (>2) 3
File to be closed (. and .. skipped) 4
fd 4 fd_path /proc/13664/fd/4
Closing fd (>2) 4
File to be closed (. and .. skipped) 5
fd 5 fd_path /proc/13664/fd/5
Closing fd (>2) 5
done close_open_files()
Erased chunk <</voluem1/backups/Information Technology-003.zpaq>>
Fixed-size chunk: (try) deleting all of them (up to 1)
Expected file 0 deleted 0
The method used is not portable; it applies only to Linux. I am rewriting large parts (!), obviously introducing many bugs (!!), to have something (maybe) portable. The problem is quite complex, given the multithreaded compression (and thus file creation) mechanism of zpaq. I expect to finish it over the weekend
60_6o.zip: this untested pre-release should operate in a portable way
Still exits as expected.
CONTROL-C detected, try some housekeeping...
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
OK we get a chunk size limit of 30.000.000.000 on 2 files
g_archivefp_first handle 27.542.752
g_archivefp handle 27.542.752
write pool before emergency close (should be 2 max 3)
write pool 00000000 27.542.752 /volume1/backups/Information Technology-001.zpaq
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Closing 00000001 handles
Closing (pool size 00000001) FP 27.542.752
Closing done, seems OK
Fixed-size chunk: (try) deleting all of them (should be 1)
Expected file 1 deleted 1
Goodbye after 10.251 seconds (00:00:10)
15/08/2024 14:36:17 54760: CONTROL-C detected, try some housekeeping...
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
15/08/2024 14:36:17 61810: OK we get a chunk size limit of 30.000.000.000 on 2 files
15/08/2024 14:36:17 61892: g_archivefp_first handle 24.601.824
15/08/2024 14:36:17 61892: g_archivefp handle 24.601.824
15/08/2024 14:36:17 61808: write pool before emergency close (should be 2 max 3)
15/08/2024 14:36:17 61810: write pool 00000000 24.601.824 /volume1/backups/Information Technology-001.zpaq
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
15/08/2024 14:36:17 61814: Closing 00000001 handles
15/08/2024 14:36:17 61817: Closing (pool size 00000001) FP 24.601.824
15/08/2024 14:36:17 39410: ***********=============== [3] 0 EUREKONAAA [v
15/08/2024 14:36:17 39414: EXIT da eliminazione fp 24.601.824
15/08/2024 14:36:17 06706: pre closing fp 24.601.824
15/08/2024 14:36:17 06710: post closing fp 24.601.824 result 0
15/08/2024 14:36:17 36253: fp now 0
15/08/2024 14:36:17 61824: Now pool size is 00000000
15/08/2024 14:36:17 61822: Closing done, seems OK
15/08/2024 14:36:17 55699: Fixed-size chunk: (try) deleting all of them (should be 1)
15/08/2024 14:36:17 55708: Erased chunk 00000000 <<15/08/2024 14:36:17 61916: file not found to be killed /volume1/backups/Information Technology-001.zpaq
15/08/2024 14:36:17 61754: Expected file 1 deleted 1
Goodbye after 19.589 seconds (00:00:19)
This is just my 2 cents.
Is there the same problem on other OSes as well? I thought this problem only applied to Linux.
Why rewrite a lot of code just for one OS? Just create an IF block for Linux. If there is potential for a lot of bugs on all OSes, it would be more efficient to just add the code for that one OS.
The method used for Linux cannot be duplicated on other systems. It enumerates open files by reading them directly from the operating system, which is not feasible, for example, on Windows (high privileges are needed). The previous version only got the first and the last FP (file handles)
The current version maintains a list of files open for writing, gradually removing (from the vector) those that are closed (which also makes debugging much easier). It then closes them (the handles), and finally it deletes the files left behind on the filesystem
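A minimal sketch of such a write pool, with illustrative names (not the real zpaqfranz internals): every file opened for writing is recorded together with its path, removed from the vector on normal close, and on the Ctrl+C housekeeping pass the survivors are closed and their half-written chunks deleted.

```cpp
#include <cstdio>
#include <mutex>
#include <string>
#include <vector>

struct open_write_file { FILE* fp; std::string path; };

static std::vector<open_write_file> g_write_pool;
static std::mutex                   g_write_pool_mutex;

FILE* pool_fopen_write(const std::string& path)
{
    FILE* fp = fopen(path.c_str(), "wb");
    if (fp != nullptr)
    {
        std::lock_guard<std::mutex> lock(g_write_pool_mutex);
        g_write_pool.push_back({fp, path});   // remember every file opened for writing
    }
    return fp;
}

void pool_fclose(FILE* fp)
{
    std::lock_guard<std::mutex> lock(g_write_pool_mutex);
    for (size_t i = 0; i < g_write_pool.size(); i++)
        if (g_write_pool[i].fp == fp)
        {
            fclose(fp);
            g_write_pool.erase(g_write_pool.begin() + i); // drop closed files from the vector
            return;
        }
    fclose(fp);                                // not tracked: close anyway
}

void emergency_close_and_delete()              // called from the Ctrl+C housekeeping, not the handler
{
    std::lock_guard<std::mutex> lock(g_write_pool_mutex);
    for (const auto& f : g_write_pool)
    {
        fclose(f.fp);                          // release the handle...
        remove(f.path.c_str());                // ...then delete the broken chunk
    }
    g_write_pool.clear();
}
```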
But there is a but.
Since compression is performed by multiple threads, it may happen that a file is opened for writing AFTER control-C is pressed. This "screws up" the system of maintaining an orderly list of open files. Conventional techniques, i.e., checking a global state variable inside the compression thread loop, come at no small cost (i.e., they slow things down). In fact there are N compression threads (typically one per core) and ONE write thread, so placing the overhead inside the N threads would have had a considerable impact
So I devised a very dirty technique: making the function that normally opens files for writing hand back an unopened file. Basically, if control-C has been pressed, myfopen() returns a bad handle. At that point the (write) thread goes into error (a C++ exception) and finally exits (!)
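A minimal sketch of that trick, with illustrative names (not the actual zpaqfranz code): the SIGINT handler only sets a flag, the fopen wrapper refuses to open new files once the flag is set, and the single write thread bails out through an exception, so the N compression threads need no extra check at all.

```cpp
#include <atomic>
#include <csignal>
#include <cstdio>
#include <stdexcept>

static std::atomic<bool> g_control_c(false);

extern "C" void on_sigint(int)            // installed with std::signal(SIGINT, on_sigint)
{
    g_control_c = true;                   // only the flag is touched inside the handler
}

FILE* myfopen(const char* path, const char* mode)
{
    if (g_control_c)
        return nullptr;                   // refuse to open new chunks after Ctrl+C
    return fopen(path, mode);
}

void write_thread_step(const char* chunkpath)
{
    FILE* fp = myfopen(chunkpath, "wb");
    if (fp == nullptr)                    // bad handle => error path => write thread exits
        throw std::runtime_error("write aborted by CONTROL-C");
    // ... normal chunk writing would go here ...
    fclose(fp);
}
```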
In short, as of today (it should be tested under all circumstances), control-C during compression should be well managed. There is no handling at all during extraction, because the number of files concurrently open for writing can be large (the performance issue comes back). Maybe in the future I will think of some workaround.
The question remains whether or not to truncate individual archive files to make a rollback (aka: control-C on a single multi-versioned zpaq).
This is a very complicated issue and, for now, I have left it behind (i.e., the code is there, but it is commented out).
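For what it's worth, a minimal sketch of what a rollback by truncation could look like, assuming the byte offset of the last complete version is already known (that bookkeeping is the genuinely hard part and is not shown here; names are illustrative):

```cpp
#include <cstdio>
#include <string>
#if defined(_WIN32)
#include <io.h>
#else
#include <unistd.h>
#endif

// Cut the archive back to the size it had before the interrupted version started.
bool rollback_archive(const std::string& path, long long last_good_size)
{
    FILE* fp = fopen(path.c_str(), "r+b");
    if (fp == nullptr)
        return false;
#if defined(_WIN32)
    int result = _chsize_s(_fileno(fp), last_good_size);        // drop the partial version
#else
    int result = ftruncate(fileno(fp), (off_t)last_good_size);  // drop the partial version
#endif
    fclose(fp);
    return result == 0;
}
```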
OK. Makes sense.
The latest version is working fine. No errors so far.
During the file copy stage, if Ctrl+C is pressed, zpaqfranz prints an error and on some occasions remains in memory and has to be killed with:
killall -9 zpaqfranz
Command: ./zpaqfranz a "/volume1/backups/Information Technology-???" "/volume1/snapshots/Information Technology" -not "*~$*" -not "*._DAV*" -not "*._DS_Store" -nosynology -chunk 30G
Error:
free(): double free detected in tcache 2