[Fleet] Provide a "Known Issues" flyout when a user is issuing an Upgrade action in Fleet

nimarezainia commented 11 months ago

Describe the feature: We recently had some near-catastrophic issues in certain versions of agent. Ideally, we could have used the platform to warn against them and allow the user to dig deeper into the options available to them. In general, it's good practice to let our users know the caveats they may need to consider when upgrading their agents.

Describe a specific use case for the feature: The use case, in simplest form, is when the user has upgraded their stack to version X and they are embarking on upgrading the agents to version X also. We would like to have a fly-out (or some other suitable design) that would show this user all the known issues we are aware of.

[ ] The agents being upgraded will most probably be at different versions so it would be great to show all the known issues that would apply.
[ ] The known issues are all documented in the Fleet release notes also. So ideally the copy for the fly-out is generated from the same source as the release notes so that it is consistent.
[ ] Please also provide a link to the release notes for further detail

@kpollich @zombieFox

Design Solution

Description: When a customer upgrades one or multiple agents to the latest or newer version, display a Flyout component that includes the version dropdown and the related Known Issues content. The Known Issues content is hidden in an accordion component, closed by default, and it updates the content any time the customer changes the version in the dropdown.
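
As a rough illustration of the behaviour described above, here is a minimal state sketch. It is not taken from the Kibana code base: `FlyoutState`, `FlyoutAction`, `IssuesByVersion`, `initialState` and `reduce` are hypothetical names, and the real flyout would be built from UI components rather than a bare reducer.

```typescript
// Hypothetical state model for the upgrade flyout (illustration only).
interface FlyoutState {
  selectedVersion: string;
  accordionOpen: boolean;
  knownIssues: string[];
}

type FlyoutAction =
  | { type: "selectVersion"; version: string }
  | { type: "toggleAccordion" };

// Known issues keyed by agent version (assumed shape).
type IssuesByVersion = Record<string, string[]>;

function initialState(version: string, issues: IssuesByVersion): FlyoutState {
  // The accordion starts closed, as the description requires.
  return { selectedVersion: version, accordionOpen: false, knownIssues: issues[version] ?? [] };
}

function reduce(state: FlyoutState, action: FlyoutAction, issues: IssuesByVersion): FlyoutState {
  switch (action.type) {
    case "selectVersion":
      // Changing the version swaps the content; the accordion stays as the user left it.
      return { ...state, selectedVersion: action.version, knownIssues: issues[action.version] ?? [] };
    case "toggleAccordion":
      return { ...state, accordionOpen: !state.accordionOpen };
  }
}
```

Whether the accordion should stay open or collapse again after a version change is a design choice this sketch does not settle; it simply keeps the current state.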

Figma file with the new UI criteria. Reach out to @simosilvestri for implementation support.

New UI criteria

Show a medium size Flyout component to Upgrade the agent version
- Set the Select component to a max-width of 800px
Display the “Known issues” version in the accordion below the version selector.
- Title: Text/X-small/Heading4
- Body: Text/Small/paragraph->Regular

Upgrade multiple agents

Upgrade a single agent

elasticmachine commented 11 months ago

Pinging @elastic/fleet (Team:Fleet)

jlind23 commented 11 months ago

@nimarezainia I do see three options here:

- The content is downloaded from our release notes webpage. This lets us change it dynamically, but it won't work in air-gapped environments. Dynamic content also helps in a case like this: we ship 8.13.0 with a bug in a beat, and the bug is not known until the 8.13.0 stack is out. The page content needs to be set dynamically, otherwise users will not be aware of the problem until they upgrade to 8.13.1+.
- The content is embedded at build time. This would solve the air-gapped environment problem but does not solve the problem described above.
- A mix of the two above: we try to download the content but fall back to what is embedded if that fails.
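
To make the third option concrete, here is a minimal sketch of "try remote, fall back to embedded". Everything in it is hypothetical: `KnownIssues`, `Fetcher`, `EMBEDDED` and `loadKnownIssues` are illustrative names, not existing Fleet code, and the remote fetcher is injected so the sketch does not assume any particular endpoint.

```typescript
// Hypothetical types and names; nothing here is an existing Fleet or Kibana API.
type KnownIssues = { version: string; issues: string[] };
type Fetcher = (version: string) => Promise<KnownIssues>;

// Content embedded at build time (the second option). Illustrative data only.
const EMBEDDED: Record<string, KnownIssues> = {
  "8.13.0": { version: "8.13.0", issues: [] },
};

// Third option: try the remote source first, fall back to the embedded copy.
async function loadKnownIssues(
  version: string,
  fetchRemote: Fetcher
): Promise<{ source: "remote" | "embedded" | "none"; data: KnownIssues }> {
  try {
    return { source: "remote", data: await fetchRemote(version) };
  } catch {
    // Air gapped or temporarily offline: use whatever shipped with the build.
    const embedded = EMBEDDED[version];
    return embedded
      ? { source: "embedded", data: embedded }
      : { source: "none", data: { version, issues: [] } };
  }
}
```

Returning the `source` alongside the data would let the UI say when the list may be stale, which matters for the 8.13.0 example above: the embedded copy cannot know about issues found after the build.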

@kpollich can you think of any other options?

nimarezainia commented 11 months ago

+1 to the dynamic nature, that is what we need. Just thinking out loud, can this information be placed somewhere in the artifactory directory structure?

zombieFox commented 11 months ago

Sounds like a good idea. Some initial thoughts and questions:

> ... The known issues are all documented in the Fleet release notes also.

Is this an example of the documented release notes? https://www.elastic.co/guide/en/fleet/current/release-notes-8.11.1.html How far back would we want to present known issues? (At which version would we not present known issues?)

nimarezainia commented 11 months ago

> Sounds like a good idea. Some initial thoughts and questions:
>
> > ... The known issues are all documented in the Fleet release notes also.
>
> Is this an example of the documented release notes? https://www.elastic.co/guide/en/fleet/current/release-notes-8.11.1.html How far back would we want to present known issues? (At which version would we not present known issues?)

@zombieFox yes that is a sample Release note.

How far back we go is a good question; this is the first checkbox in the description. For example, if we are upgrading a bunch of agents, the versions they are on will vary, and so will the advisories that apply. How far back we go should be determined by the lowest version in the selected group of agents. I am not sure if this is technically possible. If not, we should just pick a number, say the last 2 or 3 major releases.
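
For what it's worth, finding the lowest version in a selection is cheap. The sketch below uses hand-rolled helpers (`parseVersion`, `compareVersions`, `lowestVersion` are hypothetical names) and ignores pre-release suffixes; a production version would more likely rely on a proper semver library.

```typescript
// Parse "8.11.1" into [8, 11, 1]. Suffixes such as "-SNAPSHOT" are ignored in this sketch.
function parseVersion(v: string): number[] {
  const [core = ""] = v.split("-");
  return core.split(".").map((part) => Number(part));
}

// Numeric comparison, so that 8.9.0 sorts before 8.10.3 (a plain string sort would not).
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Lowest version among the selected agents, or undefined for an empty selection.
function lowestVersion(versions: string[]): string | undefined {
  return [...versions].sort(compareVersions)[0];
}
```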

jlind23 commented 8 months ago

@formgeist Can I assign this to you in order to get the ball rolling?

formgeist commented 7 months ago

@jlind23 Yeah, all good - I will have a look at getting someone to do a pass on this in the coming weeks.

kpollich commented 6 months ago

Hey @simosilvestri - I spent some time getting some screenshots together + annotating a bit to try and capture the ask here visually. Let me know what you think, and please let me know if I can clarify anything else!

@nimarezainia when you say this:

> The agents being upgraded will most probably be at different versions so it would be great to show all the known issues that would apply.

Do you mean we'd need to potentially have known issues for "source versions" as well as "destination versions"? I wonder if that kind of logic is needlessly complex here. How regularly do we see issues when upgrading specifically from one version to another? I think the "known issues" are most directly associated with the newer "target version" and that's what we'd be reporting here. I suppose this runs the risk of reporting errors for specific version combinations that might not be present in the selected set of agents (thus adding potential for user confusion), but my assumption is that some extra verbosity here probably isn't going to hurt. Curious for your thoughts on this.

nimarezainia commented 6 months ago

> @nimarezainia when you say this:
>
> > The agents being upgraded will most probably be at different versions so it would be great to show all the known issues that would apply.
>
> Do you mean we'd need to potentially have known issues for "source versions" as well as "destination versions"? I wonder if that kind of logic is needlessly complex here. How regularly do we see issues when upgrading specifically from one version to another? I think the "known issues" are most directly associated with the newer "target version" and that's what we'd be reporting here. I suppose this runs the risk of reporting errors for specific version combinations that might not be present in the selected set of agents (thus adding potential for user confusion), but my assumption is that some extra verbosity here probably isn't going to hurt. Curious for your thoughts on this.

Just to clarify: what I meant was that we allow the user to choose the set of agents to upgrade to a target, so technically the agents being upgraded could be at different versions. You are correct that it may be needless, as the user has most probably upgraded all the agents previously and they would generally be at the same version level. Also, a known issue is really only relevant to the target version being upgraded to.

I'm happy if we just stick to highlighting known issues in the target and avoid extra complexity.

simosilvestri commented 6 months ago

That's great! Thank you so much @kpollich!

simosilvestri commented 6 months ago

Hi @nimarezainia and @kpollich, I put together a quick video to walk you through the Known Issues UX enhancement. Please let me know if everything is clear and if you have any feedback. Thanks!

https://github.com/elastic/kibana/assets/162109197/bb7503cc-2049-41f6-bbaf-1e551326a12c

nimarezainia commented 6 months ago

Thank you @simosilvestri - IMO looks great.

simosilvestri commented 6 months ago

Thanks @nimarezainia! @kpollich this is the Figma file with the new UI criteria. Please let me know if you have any feedback.

New UI criteria

Show a medium size Flyout component to Upgrade the agent version
- Set the Select component to a max-width of 800px
Display the “Known issues” version in the accordion below the version selector.
- Title: Text/X-small/Heading4
- Body: Text/Small/paragraph->Regular

kpollich commented 6 months ago

Thanks @simosilvestri looks great to me! I think we can consider the design phase for this work done :)

I'll work on getting it scheduled in an upcoming sprint.

formgeist commented 6 months ago

> Thanks @nimarezainia! @kpollich this is the Figma file with the new UI criteria. Please let me know if you have any feedback.
>
> New UI criteria
>
> Show a medium size Flyout component to Upgrade the agent version
> - Set the Select component to a max-width of 800px
>
> Display the “Known issues” version in the accordion below the version selector.
> - Title: Text/X-small/Heading4
> - Body: Text/Small/paragraph->Regular

@simosilvestri Can you please add this link and some description of the design solution that will be implemented? That makes it easier to understand the final implementation. Thanks! 👍

kpollich commented 5 months ago

We'll need to engage with @elastic/webteam to get some new content around known issues added to https://www.elastic.co/api/product_versions, or we'll need to block this on the early agent releases work to add a new API endpoint related to agent, making sure to include known issues data along with each agent version.

I think it probably makes the most sense to block this on https://github.com/elastic/ingest-dev/issues/3284 since we should ideally have more control over the API structure to enhance it with known issue content required by this flyout.

cc @pierrehilbert @cmacknz

cmacknz commented 5 months ago

The product versions API is something we already consume in Fleet, but it is kept up to date via the stack release process. I don't think you want to require a stack release, or an independent agent release, to update the known issues given we can discover them at any time.

I would rather us come up with a convention that sources this content from https://github.com/elastic/ingest-docs and the release notes https://www.elastic.co/guide/en/fleet/current/release-notes-8.14.0.html where we can update at any time with a PR. Perhaps we need a separate https://www.elastic.co/guide/en/fleet/current/known-issues-8.14.0.html to be created that is embedded into the release notes page as part of the docs build so it isn't duplicated. CC @kilfoyle

The complication with docs is you'll have to figure out how to embed it into Kibana at build time for air gapped users. Or perhaps we just don't do this there, since they have to separately fetch the binaries anyway. Maybe the solution for those use cases is to get this same known issues content displayed directly on the download page (e.g. https://www.elastic.co/downloads/elastic-agent) for each version, without the need to click through to the detailed release notes.

kilfoyle commented 5 months ago

> Perhaps we need a separate https://www.elastic.co/guide/en/fleet/current/known-issues-8.14.0.html to be created that is embedded into the release notes page as part of the docs build so it isn't duplicated.

Certainly. I can look into embedding a separate HTML page into the Release Notes output. However, from a docs perspective, embedding a separate asciidoc source file would be a lot easier and likely more reliable. Would it be possible for the Kibana UI to grep the asciidoc source instead?

Here's a test PR that shows how all known issues for, say, 8.x could be stored in a single known-issues-8.x.asciidoc file. The text for each known issue could be grepped by tag (e.g. tag::known-issue-8.14.0-*) or by ID (e.g. known-issue-8.14.0-*).
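
To show what "grepping by tag" could mean in practice, here is a minimal sketch. AsciiDoc tagged regions are delimited by `tag::name[]` and `end::name[]` directives, usually placed on comment lines. The function name `extractTaggedRegions` is hypothetical, the tag naming scheme follows the example above, and nested tags are not handled.

```typescript
// Extract AsciiDoc tagged regions whose tag name starts with the given prefix.
// Returns a map from tag name to the raw AsciiDoc text inside that region.
function extractTaggedRegions(source: string, prefix: string): Map<string, string> {
  const regions = new Map<string, string>();
  let current: string | null = null;
  let buffer: string[] = [];
  for (const line of source.split("\n")) {
    const startName = /^\/\/\s*tag::([^\[\]]+)\[\]\s*$/.exec(line)?.[1];
    const endName = /^\/\/\s*end::([^\[\]]+)\[\]\s*$/.exec(line)?.[1];
    if (current === null) {
      // Outside a region: only a matching start directive changes state.
      if (startName !== undefined && startName.startsWith(prefix)) {
        current = startName;
        buffer = [];
      }
    } else if (endName === current) {
      // Closing directive for the open region: store the collected text.
      regions.set(current, buffer.join("\n").trim());
      current = null;
    } else {
      buffer.push(line);
    }
  }
  return regions;
}
```

Calling it with the prefix `known-issue-8.14.0-` would return only the regions for that version. The result is still AsciiDoc source, so rendering it to HTML remains a separate step.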

kpollich commented 5 months ago

> Would it be possible for the Kibana UI to grep the asciidoc source instead?

Grepping the source is definitely possible, but rendering the asciidoc content to HTML will be more challenging. A JS port of asciidoctor (which I believe we use in other docs repos) exists: https://docs.asciidoctor.org/asciidoctor.js/latest/, and is MIT licensed so it's probably okay to pull this into Kibana with some guidance from the core team.

We'd definitely want to create an API endpoint to wrap all this behavior and avoid parsing asciidoc on the client side, as well.

kilfoyle commented 5 months ago

One other consideration for pulling docs content is that the source is planned (currently for December 2024) to migrate to MDX format, so indeed, pulling the HTML might be the most practical.

nchaulet commented 1 week ago

So it looks like we will need a new API to provide the list of known issues, populated from the ingest-docs repo. @kpollich do you know which team is responsible for the https://www.elastic.co/api/product_versions API? It looks like that API is backed by content-stack, and I will probably have a few questions for them on how we can add a new API.

Also, a product question for you @nimarezainia @simosilvestri: with the upgrade being in a flyout instead of a modal, we have a lot more room. Did we consider showing the whole release note instead of just the known issues? It seems to me that could be interesting for the user too.

kpollich commented 1 week ago

@elastic/webteam owns the product versions API + content-stack side of things if I remember correctly.

cmacknz commented 1 week ago

I am worried requiring a new API through elastic.co is over complicating this. The known issues are all in the release notes. They are publicly accessible. You can HTTP GET them now. https://raw.githubusercontent.com/elastic/ingest-docs/refs/heads/main/docs/en/ingest-management/release-notes/release-notes-8.15.asciidoc. You can also get the rendered HTML from a public URL https://www.elastic.co/guide/en/fleet/8.15/release-notes-8.15.4.html

The challenge is just parsing them. We have total control over the format we use for this. We can put whatever format we want in github or publish something new in https://www.elastic.co/guide/en/fleet/8.15 specifically for Kibana to consume.

The workflow for this should be:

- PR to add known issue is merged into ingest-docs.
- The elastic.co/guide/en/fleet pages update.
- Kibana picks up the latest content and displays it in the UI. Just don't do this at all for air gapped users because it isn't possible and they are an edge case.

This does require us to design in detail how we want this to work, which hasn't happened yet in this issue.

nchaulet commented 1 week ago

@cmacknz I do not think scraping the release notes website is really a viable solution. The site and the way the docs are rendered can change, so scraping it does not seem future-proof.

Using raw.githubusercontent directly could eventually be a solution instead of a new API. Is there no security issue in rendering something from a host we do not control, rather than from a controlled API? Are there any known limitations to using raw.githubusercontent as an API? There seems to be a rate limit of 5000 calls per hour per IP, which is probably not so problematic for us.

I think a better workflow could be something like:

With a new API

- PR to add known issue is merged into ingest-docs.
- As part of the build, we push the change with rendered HTML to our new API, probably in content-stack.
- Kibana picks up the latest content (if not air gapped) from a controlled API.

With GitHub as a backend

- PR to add known issue is merged into ingest-docs (we figure out which format is easy for us to consume in Kibana).
- Kibana picks up the latest content (if not air gapped) from GitHub, if it exists.

cmacknz commented 1 week ago

I am not fundamentally opposed to an API, in practice it is better, I am just wondering if we can minimize the number of dependencies we have and things we need to build. If we can't, so be it.

Definitely the rate limit on raw.githubusercontent would be a problem, so that leaves us with finding a way to avoid building something new via some special page in the docs. If that is roughly equivalent to just pushing new content to a new API there's no point though.

jlind23 commented 1 week ago

> Kibana picks up the latest content (if not air gapped) from a controlled API

What should be the frequency here? Shall this be pulled once a day and the content be stored in memory or in a saved object of some sort? Shall this be done every time the known issue page is loaded?

nchaulet commented 1 week ago

> What should be the frequency here? Shall this be pulled once a day and the content be stored in memory or in a saved object of some sort? Shall this be done every time the known issue page is loaded?

It could be on demand, when the known issues page is loaded (probably with graceful error handling so it does not block the whole upgrade flyout), similar to what we do when loading product versions.

kpollich commented 1 week ago

It's hard to know if the rate limit on GitHub's raw endpoint is going to be problematic. Even with a 24h cache TTL on requests, if all requests from a given ESS region come from the same IP we could easily hit a rate limit issue with that endpoint if we have several thousand clusters making requests to fetch this content. It seems safest to me to rely on an endpoint we control here.
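
For reference, the 24h cache TTL mentioned above could be as small as the sketch below. `TtlCache` and `DAY_MS` are hypothetical names, and the clock is passed in explicitly so the expiry behaviour is easy to check. Note that a per-cluster cache like this only reduces each cluster's own request rate; it does nothing about many clusters sharing one rate limit.

```typescript
// Minimal in-memory cache with a time-to-live (illustration only).
class TtlCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns the cached value, or undefined when it is missing or expired.
  get(key: string, now: number): T | undefined {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= now) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T, now: number): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

const DAY_MS = 24 * 60 * 60 * 1000;
```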

One thing we could do to make scraping from docs.elastic.co more viable would be to place the known issues in some kind of predictable/structured piece of data on the docs page that can be scraped separately from the page content itself. That way we aren't depending on an HTML structure that could potentially change. e.g. putting the content into an HTML comment or a data attribute that we can parse out.
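
A minimal sketch of the HTML comment variant, assuming the docs build emitted a marker such as `<!-- known-issues: [ ...JSON... ] -->`. Both the marker name and the JSON payload are hypothetical, as is the function name `extractKnownIssuesComment`.

```typescript
// Pull a JSON array out of a (hypothetical) "known-issues:" HTML comment.
// Returns null when the marker is missing or the payload is not a JSON array.
function extractKnownIssuesComment(html: string): unknown[] | null {
  const payload = /<!--\s*known-issues:\s*([\s\S]*?)\s*-->/.exec(html)?.[1];
  if (payload === undefined) return null;
  try {
    const parsed: unknown = JSON.parse(payload);
    return Array.isArray(parsed) ? parsed : null;
  } catch {
    // Malformed payload: treat it the same as "no known issues data available".
    return null;
  }
}
```

One limitation of this sketch: a payload that itself contains `-->` would be cut short, so the docs build would need to escape that sequence.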

cmacknz commented 1 week ago

> One thing we could do to make scraping from docs.elastic.co more viable would be to place the known issues in some kind of predictable/structured piece of data on the docs page that can be scraped separately from the page content itself. That way we aren't depending on an HTML structure that could potentially change. e.g. putting the content into an HTML comment or a data attribute that we can parse out.

+1, this is what I was originally trying to describe, though I didn't word it as well: a lighter way to get what we want. If this can't work that is fine, but let's prove the simpler path doesn't work before building something with more dependencies across the systems involved.

kpollich commented 1 week ago

Maybe it would even be possible to include a .json file that's "published" to the docs site (not linked to, ofc) at a predictable URL location, e.g. https://www.elastic.co/guide/en/fleet/current/release-notes-8.16.0/_known_issues.json. That way we don't even really have to do scraping. Could we hack an "API endpoint" (just a static JSON file) into the docs system this way? Maybe @kilfoyle would know. In theory this should work exactly like hosting a .png or any other static asset.
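
On the consuming side, a predictable location mostly comes down to building a URL from the target version. A minimal sketch, using the URL pattern from the example above (whether the docs system can publish a file there is still the open question) and a hypothetical helper name `knownIssuesUrl`:

```typescript
// Only plain x.y.z versions are accepted, so nothing unexpected ends up in the path.
const VERSION_RE = /^\d+\.\d+\.\d+$/;

// Build the (hypothetical) static JSON location for a given agent version.
function knownIssuesUrl(version: string): string {
  if (!VERSION_RE.test(version)) {
    throw new Error(`Unexpected version: ${version}`);
  }
  return `https://www.elastic.co/guide/en/fleet/current/release-notes-${version}/_known_issues.json`;
}
```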

nchaulet commented 1 week ago

+1 if we can publish a static json, it seems it will be a more viable solution than scraping a website

jlind23 commented 1 week ago

> +1 if we can publish a static json, it seems it will be a more viable solution than scraping a website

Will the tech writer be in charge of maintaining this json? If yes they probably need to be informed about this.

cmacknz commented 1 week ago

The known issues have a structured and ideally parseable format, so ideally we would generate it as part of the docs build.

See https://github.com/elastic/ingest-docs/blob/41b514d4d415877df70fa6a65b15917c07568f5a/docs/en/ingest-management/release-notes/release-notes-8.15.asciidoc?plain=1#L53-L72 for example.

I write a good amount of the known issues myself, I would not want to have to write them twice, or even know that this implementation existed. Regardless of if the implementation is a static JSON file on the docs site or some new API, the person writing a known issue can't be required to do anything special, or it will just get forgotten.

kilfoyle commented 5 days ago

@KOTungseth, @benironside Just FYI, since I know you're working on a new docs-wide process for Release Notes that I believe includes how known issues are handled.

nchaulet commented 4 days ago

@kilfoyle @KOTungseth, @benironside do you have more information on the new docs-wide process for release notes? Will it be possible to have the known issues source living in a predictable place for some automation to consume?

kilfoyle commented 4 days ago

> do you have more information on the new docs-wide process for release notes? Will it be possible to have the known issues source living in a predictable place for some automation to consume?

@nchaulet, we'll look into it and will reply back here soon.

KOTungseth commented 1 day ago

We have not landed on a final decision for known issues, but here are my initial thoughts for 9.0.0 and later:

- All known issues will live on a dedicated page with the following structure: Title, Description, Impact, Workaround, Status, Links to related resources
- When a known issue is fixed, it appears under the Bug fixes or simply Fixes section of the release notes
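
If that page structure were exposed as data, a consumer could validate each entry before rendering it. In the sketch below the field names mirror the proposed structure, but the JSON encoding, the `KnownIssue` type and the `isKnownIssue` guard are assumptions for illustration.

```typescript
// One known issue, following the proposed page structure (encoding is assumed).
interface KnownIssue {
  title: string;
  description: string;
  impact: string;
  workaround: string;
  status: string;
  links: string[];
}

// Runtime check for untrusted input, e.g. the result of JSON.parse.
function isKnownIssue(value: unknown): value is KnownIssue {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const textFields = ["title", "description", "impact", "workaround", "status"];
  const links = v["links"];
  return (
    textFields.every((field) => typeof v[field] === "string") &&
    Array.isArray(links) &&
    links.every((link) => typeof link === "string")
  );
}
```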

@nchaulet can you validate these assumptions for Fleet known issue automation:

- Fleet known issues should live on their own dedicated docs page, separate from other Stack known issues (i.e. Logstash, Beats, etc.)
- Fleet known issues should be separated by Stack version
- When a known issue is resolved, it will no longer appear on the Upgrade agent flyout
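
Taken together, the second and third assumptions amount to a simple filter on the flyout side. A minimal sketch, with a hypothetical record shape (`FleetKnownIssue`), illustrative status values and a hypothetical function name `issuesForFlyout`:

```typescript
// Hypothetical record shape; the "status" values are illustrative.
interface FleetKnownIssue {
  title: string;
  stackVersion: string;
  status: "open" | "resolved";
}

// Keep only unresolved issues for the version the agents are being upgraded to.
function issuesForFlyout(all: FleetKnownIssue[], targetVersion: string): FleetKnownIssue[] {
  return all.filter((issue) => issue.stackVersion === targetVersion && issue.status === "open");
}
```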

I should also note that the new docs system will use Markdown format, while our current system uses Asciidoc format. If we plan to automate Fleet known issues for the last 2-3 major versions, we'll need automation to account for both formats.
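
As a small illustration of what accounting for both formats involves, the sketch below picks a format from the file extension and lists section titles using each format's heading marker (leading `=` characters in AsciiDoc, leading `#` characters in Markdown). `DocFormat`, `detectFormat` and `sectionTitles` are hypothetical names, and real known issue parsing would need more than headings.

```typescript
type DocFormat = "asciidoc" | "markdown";

// Guess the source format from the file extension.
function detectFormat(path: string): DocFormat | null {
  if (/\.(asciidoc|adoc)$/i.test(path)) return "asciidoc";
  if (/\.(md|mdx)$/i.test(path)) return "markdown";
  return null;
}

// List section titles, using the heading marker of the given format.
function sectionTitles(source: string, format: DocFormat): string[] {
  const marker = format === "asciidoc" ? /^=+\s+(.*)$/ : /^#+\s+(.*)$/;
  const titles: string[] = [];
  for (const line of source.split("\n")) {
    const title = marker.exec(line)?.[1];
    if (title !== undefined) titles.push(title.trim());
  }
  return titles;
}
```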
