anydistro / bxt

Next generation repository maintenance tool (WIP)
GNU Affero General Public License v3.0
0 stars 4 forks source link

Compare request #56

Open fhdk opened 3 months ago

fhdk commented 3 months ago

Related to #22

When a compare request is posted - the request contains a json body with a list of boxes to compare.

[
  {
    "branch": "unstable",
    "repository": "core",
    "architecture": "x86_64"
  },
  {
    "branch": "testing",
    "repository": "core",
    "architecture": "x86_64"
  }
]

The answer only contains boxes known to the database. A short excerpt

{
    "sections":
    [
        {
            "branch": "unstable",
            "repository": "core",
            "architecture": "x86_64"
        }
    ],
    "compareTable":
    {
        "flex":
        {
            "unstable/core/x86_64":
            {
                "sync": "2.6.4-5"
            }
        },
        "logrotate":
        {
            "unstable/core/x86_64":
            {
                "sync": "3.21.0-2"
            }
        }
    }
}

Is this result to be expected if a requested box does not exist?

Would it be an empty list if none of the boxes exist?

[
  {
    "branch": "stable",
    "repository": "core",
    "architecture": "x86_64"
  },
  {
    "branch": "testing",
    "repository": "core",
    "architecture": "x86_64"
  }
]

That one answered itself :grin:

{
    "message": "No compare data found (all sections are empty)",
    "status": "error"
}
LordTermor commented 3 months ago

That one answered itself 😁

So the issue is solved?

fhdk commented 3 months ago

Not quite sure it is an issue - the initial question still stands.

I kind of expected the compare request to return an an empty entry when a entry does not exist in one of the boxes

Comparing unstable and testing - a response along the lines of this was expected

{
    "sections":
    [
        {
            "branch": "unstable",
            "repository": "core",
            "architecture": "x86_64"
        },
        {
            "branch": "testing",
            "repository": "core",
            "architecture": "x86_64"
        }
    ],
    "compareTable":
    {
        "flex":
        {
            "unstable/core/x86_64":
            {
                "sync": "2.6.4-5"
            },
            "testing/core/x86_64":
            {
                "sync": ""
            }
        },
        "logrotate":
        {
            "unstable/core/x86_64":
            {
                "sync": "3.21.0-2"
            },
            "testing/core/x86_64":
            {
                "sync": ""
            }
        }
    }
}
LordTermor commented 3 months ago

Is there a reason to return empty strings for every package out there?

If package is not in a section, the section will not be in table. The separate "sections" array is needed to check that sections do exist in repository, it's just packages are not there in case there were none packages in a section.

fhdk commented 3 months ago

Then it is per design - which was the reason I asked.

Technically - real world - you may have a package in on one box which is not in another.

This goes for a long time repo package which got removed in unstable but still exist in stable or testing or the other way around - a package may exist in unstable but has not yet been moved to testing.

In this case the testing repo didn't have any content - but for the sake of this topic - let's generate an example

How would you list the comparison if you did not include an empty string for each off the packages in unstable which is not present in testing?

Testing contains one package so compared to unstable you would need to create empty strings for all those that do not exist in testing but is in unstable.

With this thougt experiment a comparison table would need to be created - even if it would result in 200 empty strings.

But it would create a result - and for machine to machine communication - the result must compare to the request otherwise it is not correct.

This implies that if one machine ask another machine to compare two boxes - then two boxes must be returned - whether one is empty or not - that is irrelevant to the machine.

fhdk commented 1 month ago

@LordTermor @romangg Can you provide some extra information - details on how the compare response can be parsed.

I just realized that the property holding the version may be different.

While enum logic is a great method of describing an entity - when parsing the returned data it is impossible to guess what a field describes.

So returning the data in a named structure "name":"value" instead of a enum value structure "value":{"anothervalue":"thirdvalue"} would be fantastic.

The enums used is undocumented and the connections is therefore next to impossible to understand.

In this example the version property changes name mid flow - I understand why - it has arrived in the repo by use of another endpoint than the commit endpoint.

{
    "sections":
    [
        {
            "branch": "unstable",
            "repository": "core",
            "architecture": "x86_64"
        },
        {
            "branch": "unstable",
            "repository": "extra",
            "architecture": "x86_64"
        },
        {
            "branch": "unstable",
            "repository": "multilib",
            "architecture": "x86_64"
        }
    ],
    "compareTable":
    {
        "libb2":
        {
            "unstable/core/x86_64":
            {
                "overlay": "0.98.1-2"
            }
        },
        "abseil-cpp":
        {
            "unstable/multilib/x86_64":
            {
                "overlay": "20240116.2-2"
            }
        },
        "flex":
        {
            "unstable/core/x86_64":
            {
                "automated": "2.6.4-5"
            }
        },
        "ruby-getoptlong":
        {
            "unstable/extra/x86_64":
            {
                "overlay": "0.1.1-3"
            }
        },
        "arch-install-scripts":
        {
            "unstable/multilib/x86_64":
            {
                "overlay": "28-1"
            },
            "unstable/extra/x86_64":
            {
                "overlay": "28-1"
            }
        }
    }
}

It does not make sense that the version value can be located in different property named automated and overlay respectively.

While it immediately makes sense to daemon - the developer consuming the API is left completely confused.

Resuming the original question about the content of the returned data.

You argue that there is no need to return data where none is.

I beg to differ slightly - the data returned is supposed to be listed in a comparable manner.

I had to create some error prone enumeration - https://github.com/fhdk/bxtctl/blob/main/tests/60_compare_repo.py

for k, package in enumerate(compare_table.items()):
    pkg = {"name": package[0], "versions": []}
    pkg_versions = package[1]
    for key in pkg_versions.keys():
        if package[1][key] not in pkg["versions"]:
            try:
                pkg["versions"].append({"location": key, "version": package[1][key]["overlay"]})
            except KeyError:
                pkg["versions"].append({"location": key, "version": package[1][key]["automated"]})
    missing = [x for x in table_headers if x not in pkg_versions.keys()]
    for m in missing:
        pkg["versions"].append({"location": m, "version": "-"})
    # sort the version list for presentation
    pkg["versions"] = list(sorted(pkg["versions"], key=lambda x: x["location"]))
    pkg_list.append(pkg)

to be able to create a dataset that can be listed in tabular manner.

{'name': 'abseil-cpp', 'versions': [{'location': 'unstable/core/x86_64', 'version': '-'}, {'location': 'unstable/extra/x86_64', 'version': '-'}, {'location': 'unstable/multilib/x86_64', 'version': '20240116.2-2'}]}
{'name': 'arch-install-scripts', 'versions': [{'location': 'unstable/core/x86_64', 'version': '-'}, {'location': 'unstable/extra/x86_64', 'version': '28-1'}, {'location': 'unstable/multilib/x86_64', 'version': '28-1'}]}
{'name': 'flex', 'versions': [{'location': 'unstable/core/x86_64', 'version': '2.6.4-5'}, {'location': 'unstable/extra/x86_64', 'version': '-'}, {'location': 'unstable/multilib/x86_64', 'version': '-'}]}
{'name': 'libb2', 'versions': [{'location': 'unstable/core/x86_64', 'version': '0.98.1-2'}, {'location': 'unstable/extra/x86_64', 'version': '-'}, {'location': 'unstable/multilib/x86_64', 'version': '-'}]}
{'name': 'ruby-getoptlong', 'versions': [{'location': 'unstable/core/x86_64', 'version': '-'}, {'location': 'unstable/extra/x86_64', 'version': '0.1.1-3'}, {'location': 'unstable/multilib/x86_64', 'version': '-'}]}

This - kind of - defeat the purpose of JSON data which are intended to describe the field by name and content by value(s).

When the property names change any previous learned structures breaks because propery names are introduced from backend enumeration which has no documentation.

I propose a more predictable structure so the consumer of the API don't have to jump hoops to be able to create a listing like this.

Packages                  unstable/core/x86_64    unstable/extra/x86_64 unstable/multilib/x86_64
------------------------------------------------------------------------------------------------
abseil-cpp                                   -                        -             20240116.2-2
arch-install-scripts                         -                     28-1                     28-1
flex                                   2.6.4-5                        -                        -
libb2                                 0.98.1-2                        -                        -
ruby-getoptlong                              -                  0.1.1-3                        -
LordTermor commented 1 month ago

It does not make sense that the version value can be located in different property named automated and overlay respectively.

There are different "candidates" (aka locations) of packages. The only problem here is that

enumeration which has no documentation

Which is a pretty minor problem.

I'll update the docs. The location is one of overlay, sync or automated, any of them can be provided. I should add preferredCandidate like with packages though.

How would you list the comparison if you did not include an empty string for each off the packages in unstable which is not present in testing?

Foo then will be listed in the resulting structure with testing/core/x86_64 while everything else will have unstable/core/x86_64, but the sections field will contain both testing/core/x86_64 and unstable/core/x86_64:

(took your message as an example)

{
    "sections":
    [
        {
            "branch": "unstable",
            "repository": "core",
            "architecture": "x86_64"
        },
        {
            "branch": "testing",
            "repository": "core",
            "architecture": "x86_64"
        }
    ],
    "compareTable":
    {
        "libb2":
        {
            "unstable/core/x86_64":
            {
                "overlay": "0.98.1-2"
            }
        },
        "abseil-cpp":
        {
            "unstable/core/x86_64":
            {
                "overlay": "20240116.2-2"
            }
        },
        "flex":
        {
            "unstable/core/x86_64":
            {
                "automated": "2.6.4-5"
            }
        },
        "ruby-getoptlong":
        {
            "unstable/core/x86_64":
            {
                "overlay": "0.1.1-3"
            }
        },
        "arch-install-scripts":
        {
            "unstable/core/x86_64":
            {
                "overlay": "28-1"
            }
        },
        "foo":
        {
            "testing/core/x86_64":
            {
                "overlay": "version"
            }
        },
    }
}
fhdk commented 1 month ago

:white_check_mark: I will add the possible values to the code.

I have been updating the openapi as well - perhaps I should make a PR with my additions

Just for the record - the property

            "testing/core/x86_64":
            {
                "overlay": "version"
            }

Is it ever possible that more than one value is provided? Or is it always - for any one given combination of branch/repo/architeture - one of overlay, sync, automated?

LordTermor commented 1 month ago

Is it ever possible that more than one value is provided?

Yep, if there are multiple candidates all of them will be presented

I have been updating the openapi as well - perhaps I should make a PR with my additions

Go ahead

fhdk commented 1 month ago

The CompareResponse documetation is vague

    CompareResponse:
      type: object
      properties:
        sections:
          type: array
          items:
            $ref: "#/components/schemas/PackageSection"
        compareTable:
          type: object
          additionalProperties:
            type: object
            additionalProperties:
              type: object
              additionalProperties:
                type: string

It would be great to docoment what is what, and what values can one expect, and how they connect in the larger picture.

LordTermor commented 1 month ago

The CompareResponse documetation is vague

    CompareResponse:
      type: object
      properties:
        sections:
          type: array
          items:
            $ref: "#/components/schemas/PackageSection"
        compareTable:
          type: object
          additionalProperties:
            type: object
            additionalProperties:
              type: object
              additionalProperties:
                type: string

It would be great to docoment what is what, and what values can one expect, and how they connect in the larger picture.

Sure, I already mentioned that

I'll update the docs

previously