Open ari opened 6 years ago
@ari i discover this issue this morning. Files got a wrong uid (not the same between each node). This might help you : http://sys.ldvi.fr/csync2.html It's for v1.34 but you can easily edit daemon.c and update.c without patching, to have your csync2 works with user/group strings. So far it's "working" for me. Be careful I didn't test all case ! Hope it will help.
Thanks
Thanks for this. In my case I'm pretty sure all the user/group ids and string names match each other across all machines. So I don't think it is the same problem. At any rate, its completely random and happens maybe only one in every 10,000 file operations.
@ari , I have the same problem, did you find any solution for it?
No, I moved to other solutions with plain rsync on multiple masters which proved simpler for my use-case.
just add an exec in action{} like chown xxx
This should have become better with c856e930a05a50d491325fa119b035dd1a5e8f37
but it should actually be implemented as "send all the context information first, then (try to) apply the changes in one go" by preparing the new file with proper content and ownership and permissions and acls and mtimes and so on first, before materializing it into place.
Any volunteers?
Feedback: compiled/installed 83b3644 yesterday but found some synchronized files owned as root (= should be www-data) this morning :(. Could one explain the race condition for this behaviour?
Csync2 does not yet have a protocol command that communicates content and permissions and ownership and mtime and whatnot in one "transaction".
Content is transferred first. Content is put in a temp file, then renamed into place. Next commands transfer ownership and so on.
The mentioned commit tries to apply permissions and ownership of an already existing target file to the temp file before renaming into place. Should work nicely for diff transfers, content change of existing files.
Obviously does not work for new files. New files start their existence as root owned on the target node, because at that point csync2 does not know better yet. Though typically milliseconds later the next commands will transfer ownership and permissions.
That is arguably one of a number of design bugs in csync2.
Mitigation for new files: wait for the current csync2 run to finish before looking at these files.
Suggested fix: change csync2 protocol to transfer all relevant information first, before "instantiating" the resulting aggregate change on the receiving target node.
Needs to be implemented. Preferably not as a breaking change.
alexanderheckel notifications@github.com schrieb am Sa., 26. Sep. 2020, 08:42:
Feedback: compiled/installed 83b3644 https://github.com/LINBIT/csync2/commit/83b36449abb4c2903ef3b40b46018240633989e0 yesterday but found some synchronized files owned as root (= should be www-data) this morning :(. Could one explain the race condition for this behaviour?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/LINBIT/csync2/issues/5#issuecomment-699440921, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGANXVKIZ4T7ITXZCUQEHDSHWEOVANCNFSM4E2KXS7Q .
an idea, why not to compress/archive the file keeping the permissions and owner intact then transfer decompressed on the fly?
I have started sponsoring the fix with transfers sometimes getting root.root 600 ownership/permissions. The fix attempted on previous commits was not doing the job and I have created a list of TODO based on suggestions I have read from a number of people.
It is currently in testing phase where files are transferred to TMP folder first with a unique name, permissions and ownership applied and then file MOVED into final destination folder.
There are still some issues with newly created files and this bug is being looked into too.
Hostname detection was also a problem when a host has several names and this has been fixed and tested (otherwise it required to use -N hostname switch when cluster hostnames did not match main hostnames).
It may take a week or two more to polish these fixes before we start implementing multithreading (to send file changes to all hosts at the same time and not one host at a time which is slow when there are many hosts with many files to sync).
Preview of work done can be seen at https://github.com/Shotaos/csync2
Please keep all suggestions and comments on LINBIT project (not on Shota) as all code improvements will the published on LINBIT git once I see satisfactory results for permissions, ownership, hostname and multithreading (which will be tested on 4 node setup with 100k files each) for everyone to test and improve.
some comments with my "I know my way around csync2 code, I've been there before and it did not make me happy" hat on: The underlying problem is one of the design bugs in csync2: each aspect in its own sub-command, but no "transactions" or other way to correlate these, as well as the typical sequence of
Because of that, there are races where a new file would be created (after patch) but not yet with the final ownership/mod, and if you have "intotify" triggered sync actions back, you may even see the too-new-timestamp, and auto-resolve policies for conflicts might then chose the "unfinished" root:root 0600 file. What I think is needed is either a "transaction" { begin, commit }, so the receiving csync2 knows explicitly to group the above operations on the same "result" file before "commiting" it into the final place, or a new "Xpatch" command (I'd prefer that) which can communicate all of the mentioned aspects in one exchange, so the receiving csync2 can prepare the temp file with all its aspects before renaming it into the final place. compat with older csync2 should not be neglected.
On the sending side, the relevant code is in csync_update_file_mod in update.c (I think), the receiving side in csync_daemon_session would need an extra entry in the cmdtab for the "Xpatch" (or whatever you want to call it). A "hack" could be to only "anticipate" the following setown/setmod/settime commands in csync_rs_patch, and handle them there, just before the rename(newfname, prefixsubst(filename)), but somehow handle if some of those don't come. Always remember that the connection could be interrupted at any time, so you want to avoid ending up with unusable content in the final place. (Not that the original authors of csync2 cared too much about that, in the original implementation, you could end up with a truncated to 0 file in the final location, and if your tempdir is on a different file system, that is still in the fallback code path today!)
I have some unfinished patches floating around since years that try to implement acl support properly, or "nanosecon timestamps", or above sketched transaction, or rework the internal database structure with recursive common table expressions, ... but it never hurt enough to actually make those work.
Thank you, will take your notes into consideration when making changes. Would you be able to share the uncommitted files you have mentioned?
Have some improvements made:
Add hostname guessing feature in simple mode Add atomicpatch command to the protocol (-a switch) Add -a flag to activate atomicpatch Fix missing commas in struct Fix command argument printing to only include existent arguments Fix bug in command argument printing to check for empty, not NULL, string Add conformance to gnu90 std Fix the timestamp bug Hide debug logging behind -O flag
Was testing for a while and in a 4 node setup it transfers files correctly with correct timestamp and ownership,
There are 7 files changed but have no write permissions. Could push commit for review and wider testing.
Current work is on making Csync2 multithreaded. That is - to send files to all nodes at once, not one by one. Currently it takes 5 seconds on 1 node sending. So 4 nodes take 20-30 seconds. And it grows with number of nodes.
It was not designed to work in thread safe way and that is a challenge to change that.
Completely fixed ownership and timestamp problem. The only minor thing missing - directory mtime preservation. Will make it in the next fixes. Will leave multithreading for now as that requires database change. Will work on making csync2 faster with file scans, less connections and actually do the work very quickly without going with multithreading. Made a PULL REQUEST for latest changes.
There is a condition which causes files in the destination server to be owned by root instead of the correct user. It has been discussed on the mailing list and one hypothesis is here:
http://lists.linbit.com/pipermail/csync2/2017-September/000080.html
This is a pretty serious bug for us since we run a WebDAV server cluster with csync2 and lots of small file operations to servers across the cluster at the same time.