giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273
Apache License 2.0
3 stars 0 forks source link

We should keep AppCatalogEntries for all apps that are in use #1441

Open marians opened 2 years ago

marians commented 2 years ago

User story

Details

In the web UI we offer details for every app installed in a workload cluster. Example:

image image

Here, in the dropdown labelled as INSPECT VERSIONS, we display details from the AppCatalogEntry resource, for the currently installed version as well as for other available versions.

UX problem

In the case of an app version being installed that has no according AppCatalogEntry, this information is missing. Example:

image

Also by navigating into the Apps section, a user can no longer find any details about the installed version 1.16.1. The user also cannot find out which releases existed since the installed one and the latest one, and according an upgrade to the lastest v1 minor is not possible.

Solution

marians commented 2 years ago

@gianfranco-l This appears to be quite severe to me. I'd be happy if honeybadger could take a look.

gianfranco-l commented 2 years ago

it sounds like some tech debt is showing up in the web UI: @piontec will describe the root cause of the problem, the long term solution and the workaround that he sees and we will discuss them async with @giantswarm/team-honeybadger . @uvegla will drive this and ensure that this is taken care of and implemented.

uvegla commented 2 years ago

A couple of things we talked about / I have in mind off the bat:

uvegla commented 2 years ago

I feel that by the nature of App Platform we can only come up with an eventually consistent solution.

Some facts:

Some thoughts:

Some ideas

piontec commented 2 years ago

Some long-term perspective, mainly for @marians. When we first designed ACEs, we made an assumption that we represent each release as a separate ACE, so it's easy to manage (add/remove) them without changing too many objects at once (and with super easy logic). This has come back and hit our back, as in general we would produce too many ACEs and kill the API server performance. Because of that, we decided to limit the number of ACEs to the X most recent releases (currently, AFAIR, X = 5). This has some serious side effects as well, as you see now with happa.

As a result, we know we have to rework how ACEs are managed, but we really don't have time to work on it. Hence, we're looking for some solution here.

piontec commented 2 years ago

From my point of view, mostly based on the comments above, I see these 2 solutions:

  1. Multi-finalizers

When a per-cluster/org app-op creates an instance of App CR (a Chart CR on a remote cluster), it adds its own finalizer to the ACE. It also adds info about how many times this specific app-op has deployed this app. If the counter ever reaches 0, this app-op removes its own finalizer. Now, the unique app-op, when it finds it fit to remove the specific ACE, just issues delete and that's it. When usage counters of all the app-ops will drop to 0, all app-ops will remove their finalizers and the object will be removed from k8s API server.

Pros:

Cons:

Ideas:

  1. Introduce a new CR "done right", tentative name CatalogEntry

This would create a CatalogEntry per application name, but not version. For each application, versions and their properties will be held within a single CE as a list of entries based on the current status of ACE. Entries on the list always reflect what is in the Catalog.

One more thing we can do right if we go with this, is we can extract this to a separate operator and use kuebuilder for it already. This will limit the difference in how unique and non-unique app-ops work, it will start a migration from operatorkit that we need to drop anyway and will be a good intro for us for kubebuilder anyway. It will also make it easier to potentially replace or modify heavily App CRs and app-op in the future.

We risk here that we miss some even-bigger-scope of app-platform, that might show up after the app-platform review, but I'm OK to proceed with it because:

All in all, I think I'm for option 2, as it's more "right" and "long-term", but I'm also assuming that implementation of "option 2" is not much bigger than "option 1" anyway. Do you agree? I see them both as pretty big now :(

Also, @uvegla let me know if you agree that these 2 options make the most sense... maybe scanning all the App CRs when you want to "just" remove an ACE is not that bad. I'm not sure about operatorkit, but in kubebuilder client's request are definitely automatically cached.

uvegla commented 2 years ago

@piontec I like option 2 the most, but it also takes the most effort.

FTR this is what I came up with after digging around:

Additionally the current clean up logic should be modified to keep ACEs with finalizers.

Some corner cases:

This is not perfect tho, for example:

This is all a little complicated tho. @marians what is the urgency on this? If not super urgent, I would rather go with the decouple into a new operator and we do it when we get there. WDYT?

uvegla commented 2 years ago

My other thought is that letting multiple app-operators updating labels, list, counters whatever could end up in a serious, non-deterministic mess cos there is no guarantee whatsoever when / what order your writes get processed and if the resource changed between the read and the write in a single operator instance's perspective.

marians commented 2 years ago

There is no particular urgency on this. It's just a general problem with our app user experience.

ljakimczuk commented 2 years ago

My 2 cents. I may be controversial, but maybe we should wait with implementing something bigger till we have more time to do the architecture review and re-design. I mean, I like the 2nd idea because it gives the most holistic view on what's inside the Helm repositories, but also I don't like it because it feels like a lot of fuss to get the app metadata from repository. In such case I like the 1st option better, as it feels even more Kubernetes-like, I mean the finalizers, than the operator. But it also comes with its potential problems and I get it.

My other controversial thought is, do we need the ACE for installed apps at all? I mean, we get 5 latest ACEs created by unique app-operator and that's fine, we need to know beforehand what is offered from the catalog. Now, because the view on the screenshot presents the WC-scoped view, I presume the drop-down list contains the version of the app installed for this workload cluster, plus the versions available coming from ACEs, is that right @marians ? Meaning this list does not have the versions installed for other workload clusters, right? If this assumption is right, do we need to create the ACE for such, already installed app? App Operators should already have access to the index.yaml, see for example this part, and hence to all the information needed on the list. So, let's cut to the chase - can't we put this info into the App status? Better yet, the status already has the appVersion corresponding to upstream version on Happa's list, and the version I presume is already got from there, so we would need to add the release date only. If other information is needed we could also drop it there, after all ACE contains mostly the same fields as App CR.

uvegla commented 2 years ago

I kinda feel we should wait with this too. I think adding the info the App status is interesting, tho this way you kinda replicate the info all over the place.

On the other hand the Catalogs and the ACEs are also used in Happa for browsing the available Apps - /apps endpoint - and checking their READMEs, for example /apps/giantswarm/flux-app/0.15.1 which are nice features. Maybe those could also be solved in another way but having Catalogs and ACEs in some form in the cluster is nice to some extent, cos I dont have to "go away" from the cluster / know where these are coming from to see what is available (at least for now the last 5, well...).

marians commented 2 years ago

I just added info on another broken use case "As a user I want to view the entire app release history in the web UI." to emphasize that the catalog detail page's function to allow viewing the entire release history is also broken.

piontec commented 2 years ago

I think that we agree that we need a major change here and we won't be able to execute it anytime soon. As such, I really like the simple solution @ljakimczuk proposed: let's add the missing info to the App CR itself, on its creation. That way happa has all it needs and we won't have to do any big changes. Yes, we're replicating some info and potentially risking some inconsistencies, but I think we can live with that.

As for the full story of 'release history in happa', this will have to wait for the bigger rework, I'm afraid.

marians commented 2 years ago

Sounds good to me

weatherhog commented 1 month ago

@marians as this seems to be happa related, is this something that is still valid for the future and backstage? Or can this be closed?

marians commented 1 month ago

After glancing over it, I think this is going to remain valid once we port the function to upgrade an installed app to Backstage.