Open Siborgium opened 1 year ago
Hi!
This PR is a mess I have to apologize for. Unfortunately, rebasing with proper history rewrite is not a viable option, because a lot of changes are intertwined across multiple commits.
So this indeed looks rather complicated...
Now the temporary file life is much shorter: the only reason we need temporary file present in filesystem is xdg-mime mimetype detection, so the file is unlinked immediately once it's of no use.
Indeed, that sounds like a good idea. But it would seem that this could be done much simpler, by unlinking the file immediately after figuring out its MIME type and only storing the fd.
Here's a sketch of what it could look like, what do you think?
I've put some effort into protecting it against signals
Right; being robust in case of signals makes this a little bit more complicated.
I'm convinced we could replace xdg-mime with gmime or another mimetype library (there are too few admittedly), eliminating the need to create in-filesystem temporary files.
In https://github.com/bugaevc/wl-clipboard/pull/97, there was an idea to use file(1)
for this. It should be possible to invoke it as file -i -
, in which case it reads its stdin, so we don't have to have a file on the filesystem for it. Then we could event attempt to use create_anonymous_file()
(i.e. memfd
) for this temporary file too.
So this indeed looks rather complicated...
Changes themselves are not very hard to grasp.
sendfile
over cat
if possible) unlinking the file immediately after figuring out its MIME type and only storing the fd
Yes, this is what my PR does, except we don't even need the fd if we mmap the file. mmap
greatly simplifies fd bookkeeping. Using a mmap/buffer to copy from instead of fd also allows to merge to code paths for buffer and fd.
what do you think?
This is roughly what I came up with initially. However, the current architecture was too hard to tweak, extend and reason about -- e.g. when clearing resource leaks, hence the incapsulation effort.
there was an idea to use file(1) for this
xdg-mime
already does this and a lot more. It's quite complicated admittedly, which is why it can't be replaced with libraries or file
alone without losing quality. If we're ok with losing quality, mimetype libraries are superior in my opinion.
If we're running on Linux, and user_namespaces feature is enabled, one could mount tmpfs over /tmp in their own namespace, dumping the file there. The file would be invisible but to the child processes (xdg-mime). This is nice enough, but too many "if"s, and it's out of the scope of this PR.
Try to avoid forks where possible without extra effort (e.g. use
sendfile
overcat
if possible)
The idea with using cat
was exactly to rely on someone else having figured out how to determine the availability and applicability of various ways of copying (copy_file_range
, splice
, sendfile
, whatever other non-Linux systems may have) and just defer to their implementation.
Have you actually found a case where calling sendfile
directly (over forking off a cat
) makes wl-copy
run noticably faster, or is this a theoretical speedup?
xdg-mime
already does this and a lot more. It's quite complicated admittedly
Indeed, I can see that it calls file
in the fallback handler (info_generic
).
It does seem to be complicated, though not complex: it essentially determines whether it's running under KDE (in which case it wraps kmimetypefinder
) or GNOME (or Cinnamon, or LXDE, or MATE, or Xfce), in which case it wraps gio info
, or wraps file
as a fallback. It wouldn't be so hard to reimplement the same logic in wl-clipboard if we have to, though of course using a ready-made xdg-mime
script is very handy.
But, there's a need to know if the file is textual or not (and if it is, make a guess at its encoding), to conditionally add text/plain
to the list of offered formats. As discussed in https://github.com/bugaevc/wl-clipboard/pull/97, this is easy with file
, and does not seem to be available with xdg-mime
.
which is why it can't be replaced with libraries or
file
alone without losing quality. If we're ok with losing quality, mimetype libraries are superior in my opinion.
Do you actually have an example of where xdg-mime
fares better then file
?
I do have a reverse example: on SSH public keys, xdg-mime
(at least with its gio open
backend) gets confused into thinking they're MS Publisher documents due to the .pub
extension, while file
just says it's plain text:
$ file --mime --brief .ssh/id_rsa.pub
text/plain; charset=us-ascii
$ xdg-mime query filetype .ssh/id_rsa.pub
application/vnd.ms-publisher
That all being said, I don't actually think that it's that important to infer MIME type in complex cases correctly. Really, the data types that typically get placed into a clipboard are:
application/x-yaml
or application/pgp-keys
or text/x-python3
or text/x-kotlin
or text/x-csrc
) does not really matter, what matters is that it's just text;application/x-libreoffice-internal-id-147670
) that we would not find in a file anyway.So while it's surely preferable to handle exotic formats, like say an ISO image or a Blender .blend
file, correctly, these are not the types you'd normally find in a clipboard, and none of the apps are really going to know how to paste them. Note that copying/pasting "a file" in a file manager is done very differently, by basically placing its path/URL (and not its actual contents) into the clipboard.
As for using libraries, I'd rather avoid that if possible. wl-clipboard tries to be very light on dependencies — it doesn't really require anything other than libwayland-client (and a libc). While it wouldn't be hard to make a library an optional dependency, just relying on an external tool to do the right thing is nicer still.
If we're running on Linux, and user_namespaces feature is enabled, one could mount tmpfs over /tmp in their own namespace, dumping the file there.
That sounds like a huge overkill for what we're trying to achieve.
A huge lot of simple command-line programs use tmpfile
/mk*tmp
for storing temporary files, but I don't think I've ever heard of any that uses namespaces to hide that file from everyone else. This may be beneficial to do for system daemons (and quite easy to set up with systemd's PrivateTmp
option), but not for small utilities. That being said, nothing prevents you from making a little wrapper for wl-copy
that would unshare CLONE_NEWUSER|CLONE_NEWNS
, remount /tmp
, and then exec wl-copy
.
However, the current architecture was too hard to tweak, extend and reason about -- e.g. when clearing resource leaks, hence the incapsulation effort.
I'm not at all asserting that the current architecture is perfect (although it sure took a massive refactoring effort to get there from the wl-clipboard 1.0 codebase), but it wasn't hard at all to make the above tweak (copy_action->file_to_copy
→ copy_action->fd_to_copy_from
). That would probably get somewhat messier with signal handling (which we would not need should we implement the fd-only model), but that still looks quite doable from where I'm standing.
Changes themselves are not very hard to grasp.
I understand this is not trivial, but maybe if you do split your work up into proper incremental commits it would be more approachable...
Plus, some things just cannot go in as is. To name a few things:
<>
s, even if you add an internal directory to the include path — and for a simple enough project like wl-clipboard, you should not have to do that either;foo++
) in for
loops;void *ptr
(with the asterisk attached to the pointer name);/// @brief Begin "sensitive" section
/// @param[in] handler Emergency handler, invoked on signal
/// @return 0 on success, negative value otherwise
//
) at all, other than sometimes to specify what an include is needed for, e.g.
https://github.com/bugaevc/wl-clipboard/blob/89cd9f9e430a6cd8b82fa65a9c05e530e2430ffd/src/util/files.c#L30-L34
In fact there are rather specific rules of how comments are to be formatted; please take a look at the existing codebase for examples.Anyway, please don't take this the wrong way, I'm grateful for your contribution and for this conversation we're having :slightly_smiling_face:
This PR is a mess I have to apologize for. Unfortunately, rebasing with proper history rewrite is not a viable option, because a lot of changes are intertwined across multiple commits. I suggest you squash it instead of simply merging/rebasing.
The problem I initally wanted to solve is
wl-copy
leaking clipboard contents when terminated. This occurs e.g. when doing C-c on hungwl-copy
spawned from Kakoune text editor (see Issues). On my way I discovered a lot more ways to break stuff here and there, so it ended up like this.Now the temporary file life is much shorter: the only reason we need temporary file present in filesystem is xdg-mime mimetype detection, so the file is unlinked immediately once it's of no use. I've put some effort into protecting it against signals, but it's just bare minimum as I'm convinced we could replace xdg-mime with gmime or another mimetype library (there are too few admittedly), eliminating the need to create in-filesystem temporary files.
Other than that it's mostly refactoring to make the code easier to understand & reason about.