Azathothas / Toolpacks

Official (pkgforge-edge) Repo 📦📀 & The Largest Collection of Pre-Compiled (+ UPXed) Linux Static Binaries (incl. Build Scripts) & Package Manager (rust) :: https://github.com/pkgforge/soar
https://bin.pkgforge.dev/
MIT License

[Request] Rename sha256 to sha in Metadata files #22

Closed xplshn closed 4 months ago

xplshn commented 5 months ago

Hi. Recently I've been trying to implement duplicate filtering based on the SHA of the binaries, but the cyclomatic complexity gets high because I have to handle both the JSONs from your repo and the JSONs from GitHub. The struct ends up needing two sha fields, so the only way to get it working is with janky and ugly code. It would all be much simpler for me if the fields matched.

This is how it would look if the field were named "sha" and not "sha256"

package main

import "fmt"

// listBinaries fetches metadata from every URL in MetadataURLs and returns the
// names of binaries that share a SHA with at least one other binary.
func listBinaries() ([]string, error) {
    type binaries struct {
        Architecture string `json:"architecture"`
        SHA          string `json:"sha"`
        Name         string `json:"name"`
    }

    var allBinaries []binaries
    var metadata []binaries

    for _, url := range MetadataURLs {
        if err := fetchJSON(url, &metadata); err != nil {
            return nil, fmt.Errorf("failed to fetch metadata from %s: %v", url, err)
        }
        allBinaries = append(allBinaries, metadata...)
    }

    // Keep only x86_64 entries that actually carry a SHA.
    filteredBinaries := make([]binaries, 0)
    for _, bin := range allBinaries {
        if bin.Architecture == "x86_64" && bin.SHA != "" {
            filteredBinaries = append(filteredBinaries, bin)
        }
    }

    // Group by SHA; any group with more than one entry is a duplicate.
    groupedBySHA := make(map[string][]binaries)
    for _, bin := range filteredBinaries {
        groupedBySHA[bin.SHA] = append(groupedBySHA[bin.SHA], bin)
    }

    var duplicateBinariesNames []string
    for _, group := range groupedBySHA {
        if len(group) > 1 {
            for _, bin := range group {
                duplicateBinariesNames = append(duplicateBinariesNames, bin.Name)
            }
        }
    }

    uniqueNames := removeDuplicates(duplicateBinariesNames)
    return uniqueNames, nil
}

This is how it looks if I remove duplicates based on both the sha256 and sha fields. The complexity here is much higher, because I am iterating through repos and I not only have to remove the duplicates once, I also have to make sure to deduplicate on the "sha" field as well as the "sha256" one.

package main

import (
    "fmt"
    "path/filepath"
    "strings"
)

// listBinaries fetches and lists binary names from the given URLs,
// deduplicating on whichever checksum field each entry happens to carry.
func listBinaries() ([]string, error) {
    var metadata, allBinaries []struct {
        Name   string `json:"name"`
        SHA256 string `json:"sha256,omitempty"`
        SHA    string `json:"sha,omitempty"`
    }

    for _, url := range MetadataURLs {
        // Use fetchJSON to fetch and unmarshal the JSON data.
        if err := fetchJSON(url, &metadata); err != nil {
            return nil, fmt.Errorf("failed to fetch metadata from %s: %v", url, err)
        }

        allBinaries = append(allBinaries, metadata...)
    }

    // Map every identifier we have (name, sha256, sha) to the binary's name,
    // so duplicates under either checksum field collapse to a single entry.
    filteredBinaries := make(map[string]string)
    excludedFileTypes := map[string]bool{}

    for _, item := range allBinaries {
        binary := item.Name

        ext := strings.ToLower(filepath.Ext(binary))
        if _, excluded := excludedFileTypes[ext]; !excluded {
            filteredBinaries[binary] = binary
            if item.SHA256 != "" {
                filteredBinaries[item.SHA256] = binary
            }
            if item.SHA != "" {
                filteredBinaries[item.SHA] = binary
            }
        }
    }

    // Collect the names (map values) and deduplicate them, since the map keys
    // also contain checksums.
    seen := make(map[string]bool)
    uniqueBinaries := make([]string, 0, len(filteredBinaries))
    for _, name := range filteredBinaries {
        if !seen[name] {
            seen[name] = true
            uniqueBinaries = append(uniqueBinaries, name)
        }
    }

    return uniqueBinaries, nil
}
Azathothas commented 5 months ago

No. sha256 is there to clarify that it's sha256, not sha512 or any other variant. Likewise it's the same with bsum: it's written as b3sum to clarify that it's b3sum and not any other variant.

Your Go code assumes sha means sha256 by default, which is to say this will create issues and conflicts in the future if I were to start using sha512 or b4sum. Use a library, or clearly define which particular checksum variant your 'sha' indicates, rather than simply using 'sha' as a placeholder value.
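
As a rough illustration (not actual project code), a consumer-side struct that names the algorithm explicitly instead of using a bare "sha" could look like this; the struct, field, and helper names are only examples, not the exact metadata schema:

package metadata

// metaEntry mirrors only the fields this example cares about. The JSON tags
// name the checksum algorithm explicitly, the way the metadata files already
// do (field names here are illustrative).
type metaEntry struct {
    Name   string `json:"name"`
    SHA256 string `json:"sha256,omitempty"` // hex-encoded SHA-256, when present
    B3Sum  string `json:"b3sum,omitempty"`  // hex-encoded BLAKE3, when present
}

// checksum returns the strongest checksum available together with its
// algorithm name, so callers never have to guess what a bare "sha" means.
func (e metaEntry) checksum() (algo, value string) {
    switch {
    case e.B3Sum != "":
        return "blake3", e.B3Sum
    case e.SHA256 != "":
        return "sha256", e.SHA256
    default:
        return "", ""
    }
}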

Also, the sha you are getting from GitHub's API is not the actual shasum of the files, but of commits. Unless you mean using the tree path's sha, in which case you are getting a sha1sum and not a sha256: https://stackoverflow.com/questions/26203603/how-do-i-get-the-sha-parameter-from-github-api-without-downloading-the-whole-f I played around with this over a year ago; I had to manually calculate sha256sums with traditional tools, downloading the files locally, because GitHub's API didn't provide it.
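
A minimal Go sketch of what that amounts to (illustrative only, not code from either project): download the file and hash its contents, since GitHub's API only exposes git object SHA-1s:

package checksums

import (
    "crypto/sha256"
    "encoding/hex"
    "io"
    "net/http"
)

// sha256OfURL downloads the file at url and returns the hex-encoded SHA-256 of
// its contents.
func sha256OfURL(url string) (string, error) {
    resp, err := http.Get(url)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    h := sha256.New()
    if _, err := io.Copy(h, resp.Body); err != nil {
        return "", err
    }
    return hex.EncodeToString(h.Sum(nil)), nil
}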

I won't be regressing from sha256 back to sha1; if anything, I will likely start generating sha512, or maybe something even better/newer. The same goes for bsum.

xplshn commented 5 months ago

I didn't mention it because I thought it was clear: this is the code for listing binaries. There are no conflicts; I just need something other than the name to tell binaries apart.

The only issue would be that you won't use "sha" instead of "sha256" as the field name. In any case, that's fine. It would just have been much better for me, and you could simply document how and what the JSON fields contain, so that anyone who needs the JSONs to create a program like BigDL can use them. It's not something that would concern users or bother anyone working with the JSONs; it's just a field rename.
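
For what it's worth, all my side really needs is a stable key to tell binaries apart. A rough sketch of that, with illustrative field names (GitSHA stands in for whatever a GitHub-derived listing exposes and is not part of any real schema), looks like:

package listing

// entry holds only the fields needed to build a dedup key. "sha256" comes from
// the Toolpacks metadata; GitSHA is a stand-in for a GitHub-derived field.
type entry struct {
    Name   string `json:"name"`
    SHA256 string `json:"sha256,omitempty"`
    GitSHA string `json:"sha,omitempty"`
}

// dedupKey prefers a content checksum and falls back to the name, so two
// listings of the same binary collapse to a single key.
func dedupKey(e entry) string {
    switch {
    case e.SHA256 != "":
        return e.SHA256
    case e.GitSHA != "":
        return e.GitSHA
    default:
        return e.Name
    }
}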

xplshn commented 4 months ago

@