fastfetch-cli / fastfetch

An actively maintained, feature-rich and performance oriented, neofetch like system information tool.
MIT License
8.08k stars 349 forks source link

[FEAT] Make guix package manager detection faster #988

Closed Dariqq closed 4 weeks ago

Dariqq commented 1 month ago

While putting fastfetch 2.14 on a slower laptop of mine i noticed that my implementation for package counts for guix is quite slow taking 60-70ms for each of the 3 profiles on my main machine and up to 2-3x that on a slow laptop.

The main cause for this is waiting for guile to return the list of installed packages.

I have taken a look at what is happening behind the scenes:

This manifest files currently look like this:

(manifest
 (version 4)
 (packages
  (("emacs"
    "29.3"
    "out"
    "/gnu/store/1bi7n031g6b5b07hvps3gyi0rhi44qkb-emacs-29.3"
    ;; ...
    )
   ("fastfetch"
    "2.14.0"
    "out"
    "/gnu/store/kamkxrxi8mdcfagfg0z4nfranhfn8csb-fastfetch-2.14.0"
    ;; ...
    )
   ("btop"
    "1.3.2"
    "out"
    "/gnu/store/0j1ya0g65nk613lnq56dyhnv4fk8blvb-btop-1.3.2")
   ;; .....
   )))

(Somtimes there is additional information for a package like search paths, propagated packages or provenance information, etc which I have omitted here for clarity)

All that gets displayed using guix package -I are the entries in the packages list.

Would it be possible to parse this file with C instead? Are there any other examples I could look at which do something similiar?

EDIT: The thing that I tried was using GetNumStrings for "/gnu/store/" but that also catches additional entries from the additional fields that i omitted above. But at least it is a lot faster reducing the total time for packages to 4ms

CarterLi commented 1 month ago

Would it be possible to parse this file with C instead? Are there any other examples I could look at which do something similiar?

If you can make sure that the structure of this file will NEVER change, parsing this file can be easy: just count the number of ( chars

Dariqq commented 1 month ago

If you can make sure that the structure of this file will NEVER change, parsing this file can be easy: just count the number of ( chars

Unfortunately things are a bit more complicated : The full entry for emacs is

("emacs"
 "29.3"
 "out"
 "/gnu/store/1bi7n031g6b5b07hvps3gyi0rhi44qkb-emacs-29.3"
 (search-paths
  (("EMACSLOADPATH"
    ("share/emacs/site-lisp")
    ":"
    directory
    #f)
   ("EMACSNATIVELOADPATH"
    ("lib/emacs/native-site-lisp")
    ":"
    directory
    #f)
   ("INFOPATH" ("share/info") ":" directory #f)
   ("TREE_SITTER_GRAMMAR_PATH"
    ("lib/tree-sitter")
    ":"
    directory
    #f))))

Some packages also specify propagated-inputs, packages that are also added to the profile when the first package is installed such that it is available in various search paths:

("emacs-pdf-tools"
 "1.1.0"
 "out"
 "/gnu/store/bsyir35yn1i0zxdc0qff4zg83mfax9lm-emacs-pdf-tools-1.1.0"
 (propagated-inputs
  (("emacs-tablist"
    "1.1"
    "out"
    "/gnu/store/9ry9pkxw0wcjs5af96z6cb41qxhzhkic-emacs-tablist-1.1"))))

This would be counted when searching for "/gnu/store" which is ok i guess.

The case when package C gets propagated multiple times from different packages inside a top level package seems get handled by marking multiples as "repeated"

The problem is then when package A and B both propagate the same package C.

So if I count "/gnu/store" and subtract the count of "repeated" I still might double count some packages.

Dariqq commented 1 month ago

I think the easiest would be counting all unique "/gnu/store/*" strings (which would allow ignoring the structure of the data).

Dariqq commented 1 month ago

The case when package C gets propagated multiple times from different packages inside a top level package seems get handled by marking multiples as "repeated"

The problem is then when package A and B both propagate the same package C.

On my system and home profile things get correctly marked as "repeated" but on the normal profile not. Investigating further.

So if I count "/gnu/store" and subtract the count of "repeated" I still might double count some packages.

Dariqq commented 1 month ago

This took me a bit longer than expected. I've reported this upstream but don't think the repeated value is something to rely on (as it was just introduced to deduplicate the output). I'll see if if I can just count duplicate entries myself from C using the hash as identifier.

Dariqq commented 1 month ago

@CarterLi

I currently have something like

static uint32_t getGuixPackagesImpl(char* filename)
{
  FF_STRBUF_AUTO_DESTROY content = ffStrbufCreate();
  if (!ffReadFileBuffer(filename, &content))
    return 0;

  // Count number of unique /gnu/store/ paths in PROFILE/manifest based on their hash value.
  // Contains packages explicitly installed and their propagated inputs.
  size_t hash_size = 32;
  char* needle = "/gnu/store/";
  size_t needleLength = strlen(needle);
  uint32_t count = 0;
  char entry [needleLength + hash_size];
  char* iter;
  while ((iter = memmem(content.chars, content.length, needle, needleLength)) != NULL)
    {
      ++count;
      strncpy(entry, iter, needleLength + hash_size);
      ffStrbufRemoveS(&content, entry);
    }

  return count;
}

which adapts the code from getNumStringsImpl and removes duplicate entries using the 32 length hash as identifier. It is still quite fast taking 5-10ms on my pc and about 2x that on my old laptop.

I kind of don't like that I currently strncpy the matching entry out to use it in ffStrbufRemoveS. Do you have a better idea for how to get the substring?

CarterLi commented 1 month ago

I don't get what you are doing. What's the content of filename?

Dariqq commented 1 month ago

filename is the profile manifest with content like Idescribed earlier: https://github.com/fastfetch-cli/fastfetch/issues/988#issue-2331222619.

The problem is that if multiple packages declare the same package as a propagated-input (a dependency that gets also installed when installing a package) this file would contain duplicate entries. Currently this problem is ignored as guix package -I does not count these propagated inputs.

All these packages are referenced by their paths in the store at /gnu/store/[hash]-package-version-output where the hash is unique for this package at a state in time.

what i want to do is get the number of unique /gnu/store/[hash] entries in this file.

The loop finds the first "/gnu/store" entry in content, adds 1 to the count, copies the /gnu/store/hash into entry and then removes all /gnu/store/hash occurrences from content destructively. So in the next iteration the next match point to a new entry with a new hash.

CarterLi commented 1 month ago

A simple implimentation can be:

  1. add all hash string into a list. Since all hash strings are fixed-length, FFstrbuf can be used as a list.
  2. sort the list. qsort
  3. count unique entries. memcmp adjacent strings.
CarterLi commented 1 month ago

You may upload the file and I will have a try.

Dariqq commented 1 month ago

manifest.txt Thanks for the hints, I'll see if I can come with something. Here is short example file of some emacs packages if you want to try as well. The expected answer would be 9.

To get hings workings properly , you'd also need to adjust either GetGuixPackages or GetGuixPackagesImpl to point at the manifest file directly which is at e.g. at "/run/current-system/profile/manifest" (Not sure where it would be more more appropriate).

CarterLi commented 1 month ago
static int compare32(const void* a, const void* b)
{
    return memcmp(a, b, 32);
}

static uint32_t getGuixPackagesImpl(char* path)
{
    FF_STRBUF_AUTO_DESTROY content = ffStrbufCreate();
    if (!ffAppendFileBuffer(path, &content))
        return 0;

    char* pend = content.chars;

    for (const char* pattern = content.chars; (pattern = strstr(pattern, "/gnu/store/")); pattern += 32)
    {
        pattern += strlen("/gnu/store/");
        memmove(pend, pattern, 32);
        pend += 32;
    }

    if (pend == content.chars)
        return 0;

    qsort(content.chars, (size_t) (pend - content.chars) / 32, 32, compare32);

    uint32_t count = 1;
    for (const char* p = content.chars + 32; p < pend; p += 32)
        count += memcmp(p - 32, p, 32) != 0;

    return count;
}

Try this please.

CarterLi commented 1 month ago
static int compare32(const void* a, const void* b)
{
    return memcmp(*(const char**) a, *(const char**) b, 32);
}

static uint32_t getGuixPackagesImpl(char* path)
{
    FF_STRBUF_AUTO_DESTROY content = ffStrbufCreate();
    if (!ffAppendFileBuffer(path, &content))
        return 0;

    FF_LIST_AUTO_DESTROY hashes = ffListCreate(sizeof(const char*));

    for (const char* pattern = content.chars; (pattern = strstr(pattern, "/gnu/store/")); pattern += 32)
    {
        pattern += strlen("/gnu/store/");
        *(const char**)ffListAdd(&hashes) = pattern;
    }

    if (hashes.length == 0)
        return 0;

    ffListSort(&hashes, compare32);

    uint32_t count = 1;
    for (uint32_t i = 1; i < hashes.length; ++i)
    {
        count += memcmp(
            *FF_LIST_GET(const char*, hashes, i - 1),
            *FF_LIST_GET(const char*, hashes, i),
            32) != 0;
    }

    return count;
}

Also this. May be faster than previous impl.

Dariqq commented 1 month ago

Wow you are fast.

Both seem to do the right thing in about the same time.

Should the argument of getGuixPackgesImpl be renamed to filename instead of path ?

I needed this patch to pass the manifest file to getGuixPackgesImpl:

diff --git a/src/detection/packages/packages_linux.c b/src/detection/packages/packages_linux.c
index e3178d0e..a583dff8 100644
--- a/src/detection/packages/packages_linux.c
+++ b/src/detection/packages/packages_linux.c
@@ -436,6 +436,7 @@ static uint32_t getGuixPackages(FFstrbuf* baseDir, const char* dirname)
 {
     uint32_t baseDirLength = baseDir->length;
     ffStrbufAppendS(baseDir, dirname);
+    ffStrbufAppendS(baseDir, "/manifest");
     uint32_t num_elements = getGuixPackagesImpl(baseDir->chars);
     ffStrbufSubstrBefore(baseDir, baseDirLength);
     return num_elements;
CarterLi commented 1 month ago

Up to you

CarterLi commented 1 month ago

Off topic: does the initsystem module crash in your PC?

Dariqq commented 1 month ago

Off topic: does the initsystem module crash in your PC?

No, neither on 2.14 nor current dev HEAD (but it is not systemd so things might be different there)

Dariqq commented 1 month ago

Up to you

Should i make the PR or do you commit it directly? Also which version of your patch (i am guessing the second one)?

CarterLi commented 4 weeks ago

I actually prefer the first one. Please make a PR after you tested it carefully.