bugaevc / wl-clipboard

Command-line copy/paste utilities for Wayland
GNU General Public License v3.0
1.62k stars 60 forks source link

Shorten temporary files lifetime, various fixes/refactoring #155

Open Siborgium opened 1 year ago

Siborgium commented 1 year ago

This PR is a mess I have to apologize for. Unfortunately, rebasing with proper history rewrite is not a viable option, because a lot of changes are intertwined across multiple commits. I suggest you squash it instead of simply merging/rebasing.

The problem I initally wanted to solve is wl-copy leaking clipboard contents when terminated. This occurs e.g. when doing C-c on hung wl-copy spawned from Kakoune text editor (see Issues). On my way I discovered a lot more ways to break stuff here and there, so it ended up like this.

Now the temporary file life is much shorter: the only reason we need temporary file present in filesystem is xdg-mime mimetype detection, so the file is unlinked immediately once it's of no use. I've put some effort into protecting it against signals, but it's just bare minimum as I'm convinced we could replace xdg-mime with gmime or another mimetype library (there are too few admittedly), eliminating the need to create in-filesystem temporary files.

Other than that it's mostly refactoring to make the code easier to understand & reason about.

bugaevc commented 1 year ago

Hi!

This PR is a mess I have to apologize for. Unfortunately, rebasing with proper history rewrite is not a viable option, because a lot of changes are intertwined across multiple commits.

So this indeed looks rather complicated...

Now the temporary file life is much shorter: the only reason we need temporary file present in filesystem is xdg-mime mimetype detection, so the file is unlinked immediately once it's of no use.

Indeed, that sounds like a good idea. But it would seem that this could be done much simpler, by unlinking the file immediately after figuring out its MIME type and only storing the fd.

Here's a sketch of what it could look like, what do you think?

Patch ```patch diff --git a/src/types/copy-action.h b/src/types/copy-action.h index c0a1290..babef84 100644 --- a/src/types/copy-action.h +++ b/src/types/copy-action.h @@ -40,8 +40,9 @@ struct copy_action { /* Exactly one of these fields must be non-null if the source * is non-null, otherwise all these fields must be null. + * The null value for fd_to_copy_from is -1. */ - const char *file_to_copy; + int fd_to_copy_from; argv_t argv_to_copy; struct { const char *ptr; diff --git a/src/types/copy-action.c b/src/types/copy-action.c index f661fc3..1704636 100644 --- a/src/types/copy-action.c +++ b/src/types/copy-action.c @@ -71,7 +71,7 @@ static void do_send(struct source *source, const char *mime_type, int fd) { /* Unset O_NONBLOCK */ fcntl(fd, F_SETFL, 0); - if (self->file_to_copy != NULL) { + if (self->fd_to_copy_from != -1) { /* Copy the file to the given file descriptor * by spawning an appropriate cat process. */ @@ -82,9 +82,11 @@ static void do_send(struct source *source, const char *mime_type, int fd) { return; } if (pid == 0) { + dup2(self->fd_to_copy_from, STDIN_FILENO); + close(self->fd_to_copy_from); dup2(fd, STDOUT_FILENO); close(fd); - execlp("cat", "cat", self->file_to_copy, NULL); + execlp("cat", "cat", NULL); perror("exec cat"); exit(1); } @@ -97,6 +99,10 @@ static void do_send(struct source *source, const char *mime_type, int fd) { * instead. */ wait(NULL); + off_t rc = lseek(self->fd_to_copy_from, 0, SEEK_SET); + if (rc < 0) { + perror("lseek"); + } } else { /* We'll perform the copy ourselves */ FILE *f = fdopen(fd, "w"); diff --git a/src/wl-copy.c b/src/wl-copy.c index f08cf17..af8cda8 100644 --- a/src/wl-copy.c +++ b/src/wl-copy.c @@ -81,28 +81,13 @@ static void did_set_selection_callback(struct copy_action *copy_action) { } } -static void cleanup_and_exit(struct copy_action *copy_action, int code) { - /* We're done copying! - * All that's left to do now is to - * clean up after ourselves and exit.*/ - char *temp_file = (char *) copy_action->file_to_copy; - if (temp_file != NULL) { - /* Clean up our temporary file */ - execlp("rm", "rm", "-r", dirname(temp_file), NULL); - perror("exec rm"); - exit(1); - } else { - exit(code); - } -} - static void cancelled_callback(struct copy_action *copy_action) { - cleanup_and_exit(copy_action, 0); + exit(0); } static void pasted_callback(struct copy_action *copy_action) { if (options.paste_once) { - cleanup_and_exit(copy_action, 0); + exit(0); } } @@ -235,6 +220,7 @@ int main(int argc, argv_t argv) { /* Create and initialize the copy action */ struct copy_action *copy_action = calloc(1, sizeof(struct copy_action)); + copy_action->fd_to_copy_from = -1; copy_action->device = device; copy_action->primary = options.primary; @@ -257,7 +243,26 @@ int main(int argc, argv_t argv) { if (options.mime_type == NULL) { options.mime_type = infer_mime_type_from_contents(temp_file); } - copy_action->file_to_copy = temp_file; + copy_action->fd_to_copy_from = open(temp_file, O_RDONLY | O_CLOEXEC); + if (copy_action->fd_to_copy_from == -1) { + perror("Failed to open temp file"); + return 1; + } + /* Now, remove the temp file and its + * containing directory. We still keep + * access to the file through our open + * file descriptor. + */ + int rc = unlink(temp_file); + if (rc < 0) { + perror("Failed to unlink the temp file"); + } else { + rc = rmdir(dirname(temp_file)); + if (rc < 0) { + perror("Failed to remove temp file directory"); + } + } + free(temp_file); } /* Create the source */ @@ -291,6 +296,5 @@ int main(int argc, argv_t argv) { while (wl_display_dispatch(wl_display) >= 0); perror("wl_display_dispatch"); - cleanup_and_exit(copy_action, 1); return 1; } ```

I've put some effort into protecting it against signals

Right; being robust in case of signals makes this a little bit more complicated.

bugaevc commented 1 year ago

I'm convinced we could replace xdg-mime with gmime or another mimetype library (there are too few admittedly), eliminating the need to create in-filesystem temporary files.

In https://github.com/bugaevc/wl-clipboard/pull/97, there was an idea to use file(1) for this. It should be possible to invoke it as file -i -, in which case it reads its stdin, so we don't have to have a file on the filesystem for it. Then we could event attempt to use create_anonymous_file() (i.e. memfd) for this temporary file too.

Siborgium commented 1 year ago

So this indeed looks rather complicated...

Changes themselves are not very hard to grasp.

unlinking the file immediately after figuring out its MIME type and only storing the fd

Yes, this is what my PR does, except we don't even need the fd if we mmap the file. mmap greatly simplifies fd bookkeeping. Using a mmap/buffer to copy from instead of fd also allows to merge to code paths for buffer and fd.

what do you think?

This is roughly what I came up with initially. However, the current architecture was too hard to tweak, extend and reason about -- e.g. when clearing resource leaks, hence the incapsulation effort.

there was an idea to use file(1) for this

xdg-mime already does this and a lot more. It's quite complicated admittedly, which is why it can't be replaced with libraries or file alone without losing quality. If we're ok with losing quality, mimetype libraries are superior in my opinion.

If we're running on Linux, and user_namespaces feature is enabled, one could mount tmpfs over /tmp in their own namespace, dumping the file there. The file would be invisible but to the child processes (xdg-mime). This is nice enough, but too many "if"s, and it's out of the scope of this PR.

bugaevc commented 1 year ago

Try to avoid forks where possible without extra effort (e.g. use sendfile over cat if possible)

The idea with using cat was exactly to rely on someone else having figured out how to determine the availability and applicability of various ways of copying (copy_file_range, splice, sendfile, whatever other non-Linux systems may have) and just defer to their implementation.

Have you actually found a case where calling sendfile directly (over forking off a cat) makes wl-copy run noticably faster, or is this a theoretical speedup?

xdg-mime already does this and a lot more. It's quite complicated admittedly

Indeed, I can see that it calls file in the fallback handler (info_generic).

It does seem to be complicated, though not complex: it essentially determines whether it's running under KDE (in which case it wraps kmimetypefinder) or GNOME (or Cinnamon, or LXDE, or MATE, or Xfce), in which case it wraps gio info, or wraps file as a fallback. It wouldn't be so hard to reimplement the same logic in wl-clipboard if we have to, though of course using a ready-made xdg-mime script is very handy.

But, there's a need to know if the file is textual or not (and if it is, make a guess at its encoding), to conditionally add text/plain to the list of offered formats. As discussed in https://github.com/bugaevc/wl-clipboard/pull/97, this is easy with file, and does not seem to be available with xdg-mime.

which is why it can't be replaced with libraries or file alone without losing quality. If we're ok with losing quality, mimetype libraries are superior in my opinion.

Do you actually have an example of where xdg-mime fares better then file?

I do have a reverse example: on SSH public keys, xdg-mime (at least with its gio open backend) gets confused into thinking they're MS Publisher documents due to the .pub extension, while file just says it's plain text:

$ file --mime --brief .ssh/id_rsa.pub
text/plain; charset=us-ascii
$ xdg-mime query filetype .ssh/id_rsa.pub
application/vnd.ms-publisher

That all being said, I don't actually think that it's that important to infer MIME type in complex cases correctly. Really, the data types that typically get placed into a clipboard are:

So while it's surely preferable to handle exotic formats, like say an ISO image or a Blender .blend file, correctly, these are not the types you'd normally find in a clipboard, and none of the apps are really going to know how to paste them. Note that copying/pasting "a file" in a file manager is done very differently, by basically placing its path/URL (and not its actual contents) into the clipboard.

As for using libraries, I'd rather avoid that if possible. wl-clipboard tries to be very light on dependencies — it doesn't really require anything other than libwayland-client (and a libc). While it wouldn't be hard to make a library an optional dependency, just relying on an external tool to do the right thing is nicer still.

If we're running on Linux, and user_namespaces feature is enabled, one could mount tmpfs over /tmp in their own namespace, dumping the file there.

That sounds like a huge overkill for what we're trying to achieve.

A huge lot of simple command-line programs use tmpfile/mk*tmp for storing temporary files, but I don't think I've ever heard of any that uses namespaces to hide that file from everyone else. This may be beneficial to do for system daemons (and quite easy to set up with systemd's PrivateTmp option), but not for small utilities. That being said, nothing prevents you from making a little wrapper for wl-copy that would unshare CLONE_NEWUSER|CLONE_NEWNS, remount /tmp, and then exec wl-copy.

However, the current architecture was too hard to tweak, extend and reason about -- e.g. when clearing resource leaks, hence the incapsulation effort.

I'm not at all asserting that the current architecture is perfect (although it sure took a massive refactoring effort to get there from the wl-clipboard 1.0 codebase), but it wasn't hard at all to make the above tweak (copy_action->file_to_copycopy_action->fd_to_copy_from). That would probably get somewhat messier with signal handling (which we would not need should we implement the fd-only model), but that still looks quite doable from where I'm standing.

Changes themselves are not very hard to grasp.

I understand this is not trivial, but maybe if you do split your work up into proper incremental commits it would be more approachable...

Plus, some things just cannot go in as is. To name a few things:

Anyway, please don't take this the wrong way, I'm grateful for your contribution and for this conversation we're having :slightly_smiling_face: