Closed Vladimir-csp closed 3 months ago
The cache seems to halve execution time on all my machines, but there's a bug on main with the entry ID validator where many valid(?) entries are being skipped loudly. That seems to be the reason for most of the time saved since my last measurement
I've been trying to write comprehensive tests for the functions, and adding another complex code path gives me pause
The idea is great, don't get me wrong, but maybe it could be implemented in a different way?
Maybe somehow reusing the logic that already exists around the `xdg-terminals` folder...
I'll keep thinking about it and open an issue if I come up with something clever
> there's a bug on main with the entry ID validator where many valid(?) entries are being skipped loudly
Did some tweaks on validators, did it help? What kind of entries are getting rejected?
Off topic: There's movement: https://gitlab.freedesktop.org/terminal-wg/specifications/-/merge_requests/3 @fluvf @mkhl @FreeSlave If you have some input, please share.
> Did some tweaks on validators, did it help? What kind of entries are getting rejected?
All entries that have a `-` in the ID; can't remember off the top of my head if that's an illegal character
> Off topic: There's movement: https://gitlab.freedesktop.org/terminal-wg/specifications/-/merge_requests/3 @fluvf @mkhl @FreeSlave If you have some input, please share.
Some quick thoughts, unfortunately I don't have a gitlab account, so I'll leave them here:
I agree that `xdg-terminals` dir support could be dropped, or if necessary replaced with an env variable that controls the subdir searched
I don't think backwards compatibility should be a concern when moving to 1.0
I also don't think keeping it for slow machines should be a priority. Even if it was, my old laptop now execs the script in ~0.6 secs (faster with the cache), which is plenty fast
I agree that "first run slow, subsequent runs fast" is what the script should aim for, which will be solved by caching
About caching: based on some simple tests on my laptop, the bottleneck is opening files rather than reading them once opened. With that in mind, I don't think it's worth creating a custom file format to store all the information needed to exec a terminal outright. Maybe a simple
```sh
if sha1sum -cs "$CACHE_FILEPATH"; then
    extracted_path=$(get_path_from "$CACHE_FILEPATH")
    contains "$ENTRY_IDS" "$extracted_path" # ID list before found fallback entries
    read_entry_path "$extracted_path" || fallback
else
    find_entry
    sha1sum "$entry_path" > "$CACHE_FILEPATH"
fi
```
where `$CACHE_FILEPATH` contains

```
somehash /path/to/entry/file.desktop
```
Maybe there's an even simpler way, but this shouldn't require big code changes
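To make the idea above concrete, here is a minimal runnable sketch of that cache format. The paths are temp files standing in for the real entry and cache locations, and the thread's `sha1sum -cs` is spelled `-c --status` here, since GNU coreutils has no short `-s` flag:

```shell
#!/bin/sh
# Sketch of the proposed single-line cache (hypothetical paths via mktemp).
CACHE_FILEPATH=$(mktemp)
ENTRY=$(mktemp)
printf '[Desktop Entry]\nExec=foot\n' > "$ENTRY"

# Writing the cache: a single "hash  path" line, exactly what sha1sum -c expects
sha1sum "$ENTRY" > "$CACHE_FILEPATH"

if sha1sum -c --status "$CACHE_FILEPATH"; then first=valid; else first=stale; fi

# Any modification to the cached entry file fails the check
printf 'Terminal=false\n' >> "$ENTRY"
if sha1sum -c --status "$CACHE_FILEPATH"; then second=valid; else second=stale; fi

echo "$first $second"
rm -f "$CACHE_FILEPATH" "$ENTRY"
```

The path for the cached entry is recovered from the second field of the same line, so no extra bookkeeping file is needed.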
> `if sha1sum -cs $CACHE_FILEPATH; then`
Getting a checksum of a single entry's content leaves out other entries and configs. Configs can change, terminals can be installed or removed. That is why I sum a listing of everything potentially relevant.
Which is faster, reading a single file or listing the dir where the file resides? How slow is this line of the script on your machine: `LANG=C ls -LRl ${hash_paths} 2> /dev/null | md5sum 2> /dev/null`? If the cached run is faster, then it seems fast enough.
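The listing-based invalidation being described can be sketched like this; `hash_paths` here is a made-up temp dir standing in for the real search paths:

```shell
#!/bin/sh
# Hash a recursive long listing of the relevant dirs instead of reading
# any file contents; a changed hash means the cache must be rebuilt.
hash_paths=$(mktemp -d)
touch "$hash_paths/foo.desktop"

list_hash() {
    # LANG=C pins sorting and date formatting so the output is repeatable
    LANG=C ls -LRl "$hash_paths" 2> /dev/null | md5sum 2> /dev/null
}

before=$(list_hash)
touch "$hash_paths/bar.desktop" # (un)installing anything changes the listing
after=$(list_hash)
[ "$before" = "$after" ] && result=valid || result=invalidated
echo "$result"
rm -rf "$hash_paths"
```

Because `ls -l` output includes size, mtime, and ownership, any of those changing also invalidates the hash, which is exactly the sensitivity discussed below.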
> All entries that have a `-` in the ID, can't remember off the top of my head if that's an illegal character
`-` is legal, should work, and it does work on my system, tested with `dash` and `bash` as `sh`.
> unfortunately I don't have a gitlab account
GitLab supports delegating sign-in to a GitHub account.
> `if sha1sum -cs $CACHE_FILEPATH; then`
> Getting a checksum of a single entry's content leaves out other entries and configs. Configs can change, terminals can be installed or removed. That is why I sum a listing of everything potentially relevant.
Good point!
The used config(s) could be checked and hashed as well
Adding and checking them using `sha1sum -c` is simple enough
Realistic worst case would mean hashing one, maybe two additional files
I don't think the script should care about the status of other entries
> Which is faster, reading a single file or listing the dir where the file resides? How slow is this line of the script on your machine: `LANG=C ls -LRl ${hash_paths} 2> /dev/null | md5sum 2> /dev/null`? If the cached run is faster, then it seems fast enough.
I'm less concerned with speed and more with the complexity
Writing tests that would catch all corner cases with the implementation seems hard, as `ls` of directories can produce all sorts of output
File ownership changes, or timestamps, or simply (un)installing additional irrelevant applications would all invalidate the cache
The only corner case I can foresee with my proposal is (un)installing terminals with an empty config, where the chosen terminal would've changed based on their alphabetical sorting order. This would have to be documented, but a fix seems unnecessary
> Off topic: There's movement: https://gitlab.freedesktop.org/terminal-wg/specifications/-/merge_requests/3 @fluvf @mkhl @FreeSlave If you have some input, please share.
I shared there https://gitlab.freedesktop.org/terminal-wg/specifications/-/merge_requests/3#note_2193993
> Adding and checking them using `sha1sum -c` is simple enough
This would actually read files. Both when creating and checking the cache.
> Writing tests that would catch all corner cases with the implementation seems hard, as `ls` of directories can produce all sorts of output
> File ownership changes, or timestamps, or simply (un)installing additional irrelevant applications would all invalidate the cache
Yes, this method of invalidation is quite sensitive and may be prone to false positives, but this way it's content-agnostic and does not read any files. In order to differentiate entries of terminals, one has to read them. Suppose a new terminal is installed: how would you invalidate the cache relying on `*sum -c` if the previous cache knows nothing about the new file?
`ls -LRl` is a POSIX-compliant command, expected to be alphabetical, and is done in a single call for everything. Specific output quirks do not matter, as long as the output is repeatable. The only two plausible false-positive cases I can think of are:
How often that happens compared to launching terminals?
> ... and does not read any files.
It still accesses the disk, but again, I don't think that should be the deciding factor. I'm concerned about the implementation's complexity and maintainability
> Suppose a new terminal is installed: how would you invalidate the cache relying on `*sum -c` if the previous cache knows nothing about the new file?
I still don't understand why the script should care about this, if the configs haven't changed?
I think following the suggestion made by meven would be an improvement regardless. You can even check the modification times of app folders to see if any entries have been added / removed
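The mtime check suggested here can be sketched with a single POSIX `find -newer` call; `appdir` and `cache` are temp stand-ins for the real locations:

```shell
#!/bin/sh
# If an applications dir is newer than the cache file, entries may have
# been added or removed since the cache was written.
appdir=$(mktemp -d)
cache=$(mktemp)

sleep 1                              # guarantee a strictly later timestamp
touch "$appdir/new-terminal.desktop" # adding an entry bumps the dir's mtime

# -prune stops descent; the dir is printed only if its mtime beats the cache's
if [ -n "$(find "$appdir" -prune -newer "$cache")" ]; then
    result=rebuild
else
    result=reuse
fi
echo "$result"
rm -rf "$appdir" "$cache"
```

One caveat (raised in the reply below): a directory's mtime only reflects additions and removals directly inside it, not changes deeper down or edits to a file's contents.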
> I still don't understand why the script should care about this, if the configs haven't changed?
> You can even check the modification times of app folders to see if any entries have been added / removed
Entries can reside in subdirectories of `applications`, so their change would not affect the mtime of `applications`.
I've thought about this and have a, hopefully more well-informed, idea
It doesn't matter if the config files have changed, only the resulting ID list is important
Removal of random entries doesn't matter, only the removal of the cached entry and newly added / modified entries are important
The cache file, at the very least, has to store information about which entry and list of IDs were used during the previous run
EDIT: And the value of `$XDG_CURRENT_DESKTOP` also has to be stored and checked
So the logic could be:

```sh
find_cached_entry || find_entry || exit
```
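One hypothetical shape of such a `find_cached_entry` could be the following sketch; all the names in it (`CACHED_ENTRY`, `CACHED_DESKTOP`, `CACHE_FILEPATH`) are made up for illustration, not the script's actual API:

```shell
#!/bin/sh
# Strict, fail-early cache read: any mismatch falls through to find_entry.
find_cached_entry() {
    [ -r "$CACHE_FILEPATH" ] || return 1
    # cache stores the previously chosen entry plus the inputs it depended on
    . "$CACHE_FILEPATH" # sets CACHED_ENTRY and CACHED_DESKTOP
    [ "$CACHED_DESKTOP" = "$XDG_CURRENT_DESKTOP" ] || return 1
    [ -r "$CACHED_ENTRY" ] || return 1 # cached entry must still exist
    entry_path=$CACHED_ENTRY
}

CACHE_FILEPATH=$(mktemp)
printf 'CACHED_ENTRY=%s\nCACHED_DESKTOP=%s\n' /nonexistent sway > "$CACHE_FILEPATH"
XDG_CURRENT_DESKTOP=sway
find_cached_entry && result=hit || result=miss # /nonexistent fails the -r check
echo "$result"
rm -f "$CACHE_FILEPATH"
```

On a miss nothing is assumed about the cache's contents, so the normal `find_entry` path runs exactly as before.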
Maybe one that still calls `find_entry` internally
`find -newer ... -o -path ...`
Maybe move the `find` call to a function, or prepend the values to it using `alias find='...'`
With this most of the existing code / functions could be reused
I doubt this would result in egregious amounts of disk access
Best case it's 3 file reads, whatever `find` does, and writing of the cache
Thoughts? I could write a working mock-up later, if you don't have time to make one yourself
About reading the cache file, the script could use `source` to make it simpler
There should also be something to check permissions, so random users can't inject shell code through it
`umask 0022` should be enough
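Beyond umask, a sourced cache can be guarded with an explicit ownership/permission check; this sketch assumes the cache may be sourced, and uses POSIX `find`'s `-user` and numeric `-perm` tests rather than parsing `ls` output:

```shell
#!/bin/sh
# Refuse to source a cache file that is not ours or that group/other
# could write to (shell-code injection guard). Path is a temp stand-in.
cache=$(mktemp)

safe_to_source() {
    # prints the path (non-empty) only if owned by us and neither
    # group-writable (020) nor world-writable (002)
    [ -n "$(find "$cache" -user "$(id -un)" ! -perm -020 ! -perm -002)" ]
}

chmod 600 "$cache"
safe_to_source && strict=ok || strict=refused
chmod 666 "$cache"
safe_to_source && loose=ok || loose=refused
echo "$strict $loose"
rm -f "$cache"
```

The check costs one extra `find` invocation per cached run, which should be negligible next to the entry search it replaces.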
> And the value of `$XDG_CURRENT_DESKTOP` also has to be stored and checked
Yeah, I've added it
> I could write a working mock-up later, if you don't have time to make one yourself
Yes, thank you, I do not fully get it, so a mock-up would be very helpful.
> About reading the cache file, the script could use `source` to make it simpler
> There should also be something to check permissions, so random users can't inject shell code through it
> `umask 0022` should be enough
If the user has a wider umask than 0022 during application launching, something is probably wrong. OK, I will even pinch it down to 0077 during cache write for good measure.
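Pinching the umask only for the cache write can be done in a subshell, so the restrictive value doesn't leak into the rest of the run; a minimal sketch with a temp dir standing in for the cache location:

```shell
#!/bin/sh
# umask 0077 makes the cache file rw for the owner only (0600).
cache_dir=$(mktemp -d)
(
    umask 0077
    echo data > "$cache_dir/cache"
)
perms=$(ls -l "$cache_dir/cache" | cut -c1-10)
echo "$perms" # -rw-------
rm -rf "$cache_dir"
```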
`source` is to be considered, but currently the cache read is a rather strict, fail-early type of algorithm.
Speed up things by caching selected `Exec`, `ExecArg`. Invalidate the cache when something changes.
@fluvf, please take a look at this. What results do you get on subsequent runs with clean configs and `XTE_CACHE_ENABLED=true`?