elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[APM] Idea: Alternative transaction navigation for RUM #26544

Open roncohen opened 5 years ago

roncohen commented 5 years ago

The RUM agent does not know about the abstract page patterns that the website it is installed on uses (e.g. /blog/:blogID). It only knows the concrete page path: /blog/10-tips-when-youre-building-your-own-airplane. Sorting in Elasticsearch (https://github.com/elastic/kibana/pull/26443) will greatly reduce the problems we have with high cardinality caused by concrete page names.

However, there are still cases where a whole section of a website should have a high impact but does not show up at the top of the list, because each concrete page name is loaded only once or a few times when the path contains variables or similar. For example, you might have a group of pages that is very slow: /feed/:userID. Because the path contains a user ID, the transaction names will be /feed/42, /feed/43, etc. Because each user only looks at their feed a few times, each of those names is counted separately and never sums up to anything significant compared to, for example, /blog/10-tips-when-youre-building-your-own-airplane, which is a page many people load. Another example is something like /my/search?q=every-search-is-a-snowflake.

As a result, when users log in, they will not see individual pages that have been loaded only once or very rarely but have a high average response time, because those pages are drowned out in the sea of page names without IDs or parameters.
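To illustrate the effect, here is a small Python sketch over made-up traffic (the /about and /docs pages and all durations are hypothetical). It ranks groups by total duration, once per concrete URL and once per top-level path segment. The slow per-user /feed/:userID pages never make the top three as individual rows, but dominate once they are grouped by prefix.

```python
from collections import defaultdict


def top_groups(transactions, key, n=3):
    """Sum durations per group key; return the n groups with the largest total."""
    totals = defaultdict(float)
    for url, duration_ms in transactions:
        totals[key(url)] += duration_ms
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def first_segment(path):
    """Collapse a concrete path to its top-level prefix, e.g. "/feed/42" -> "/feed"."""
    parts = path.split("/")
    return "/" + parts[1] if len(parts) > 1 else path


# Hypothetical traffic: three popular pages plus one slow feed load per user.
transactions = (
    [("/blog/10-tips-when-youre-building-your-own-airplane", 200.0)] * 50
    + [("/about", 150.0)] * 40
    + [("/docs", 100.0)] * 30
    + [(f"/feed/{user_id}", 2000.0) for user_id in range(100)]
)

by_url = top_groups(transactions, key=lambda url: url)
by_prefix = top_groups(transactions, key=first_segment)
print(by_url)     # one row per concrete URL: no /feed/* page makes the top 3
print(by_prefix)  # grouped by top-level segment: /feed comes first
```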

Details on why we can't get better transaction names:

We rely on an API call that web developers installing the agent must make on each page load to set the transaction name. We hoped that setting the default transaction name to "unknown" would make it obvious that developers need to make a conscious decision and an effort to figure out the abstract path names and use them in the API call. Setting the transaction name to the concrete path name would look correct in the UI at first glance, so developers would just move on, thinking that it was installed correctly (we saw this in Opbeat). However, developers don't necessarily have a single place where the URL structure is defined that they can just pull in and pass on to the RUM agent. Additionally, many users only have sporadic access to the "master" template of their website; they might be using an external consultancy to develop it, etc. So in the end, because it's the only convenient option, web developers resort to setting the concrete page name as the transaction name.

Instead of trying to come up with better transaction names automatically, or asking users to write complicated custom code to fix it, I suggest we change the RUM navigation to be based on the path hierarchy. This is similar to the "Content" navigation in Google Analytics. You see the top-level paths first, with aggregated stats for all pages that share each URL prefix:

[Screenshot 1 (numbers and order here are totally made up)]

The user then clicks "https://www.elastic.co/guide/en" and sees the subpages under it, with stats for each:

[Screenshot 2 (again, numbers and order here are totally made up)]

This should fix the problem of single or rarely visited page URLs not being counted or seen anywhere. It would also mean we could probably use the page URL as the default transaction name in the RUM agent instead of asking users to call apm.setInitialPageLoadName(name). If this works as intended, it will make setting up RUM much, much easier.
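A rough sketch of what one level of this navigation computes, written in Python over in-memory data rather than against Elasticsearch: every URL under a parent prefix is assigned to the prefix one path segment deeper, and each group gets a count, sum, and average. `child_prefix` mirrors the logic of the Painless script in the queries below; the `startswith` check is only an approximation of the path-hierarchy match, and the helper names, URLs, and durations are made up.

```python
from collections import defaultdict


def child_prefix(url, parent):
    """Return the prefix one path segment below `parent` (same logic as the Painless script)."""
    start = len(parent) + 1  # first character after "<parent>/"
    if len(url) > start:
        cut = url.find("/", start)
        if cut > 0:
            return url[:cut]
    return url  # no deeper segment: the URL is its own group


def navigation_rows(transactions, parent):
    """Count, sum and average durations per child prefix of `parent`, largest sum first."""
    groups = defaultdict(list)
    for url, duration_us in transactions:
        # Approximates the path-hierarchy match: the parent itself or anything below it
        if url == parent or url.startswith(parent + "/"):
            groups[child_prefix(url, parent)].append(duration_us)
    rows = [
        {"prefix": prefix, "count": len(d), "sum": sum(d), "avg": sum(d) / len(d)}
        for prefix, d in groups.items()
    ]
    return sorted(rows, key=lambda row: row["sum"], reverse=True)


# Made-up page loads: (URL, duration in microseconds)
transactions = [
    ("https://www.elastic.co/guide/en/page-a", 900),
    ("https://www.elastic.co/guide/en/page-b", 1200),
    ("https://www.elastic.co/guide/de/page-c", 400),
    ("https://www.elastic.co/blog/some-post", 300),
]
rows = navigation_rows(transactions, "https://www.elastic.co/guide")
for row in rows:
    print(row)
```

Clicking a row would simply call `navigation_rows` again with that row's prefix as the new parent.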

When using the hierarchy-based navigation, we could also consider letting the user go from the list to the transaction group details for a specific path prefix instead of the full URL. In other words, give the user the choice between going a level deeper and going to a page showing all the transactions that match the prefix:

[Screenshot (again, a totally faked screenshot)]

Path hierarchical querying

query for (1)

GET apm-6.4.0-transaction-2018.11.23-reindex/_search?size=0
{
  "query": {
    "match": {"context.page.url.hierarchical": "https://www.elastic.co/guide"}
  },
  "aggs": {
    "txs": {
      "terms": {
        "script": {
          "source": "def d = doc['context.page.url'].value; if (d.length() > 29) { def c = d.indexOf('/', 29); if (c>0) { return d.substring(0,c);}} return d",
          "lang": "painless"
        },
        "order": {
          "duration_sum": "desc"
        }
      },
      "aggs" : {
        "duration_avg" : { "avg": { "field" : "transaction.duration.us" } },
        "duration_sum" : { "sum": { "field" : "transaction.duration.us" } },
        "duration_p99" : { "percentiles": { "field" : "transaction.duration.us", "percents" : [99] } }
      }
    }
  }
}

query for (2)

GET apm-6.4.0-transaction-2018.11.23-reindex/_search?size=0
{
  "query": {
    "match": {"context.page.url.hierarchical": "https://www.elastic.co/guide/en"}
  },
  "aggs": {
    "txs": {
      "terms": {
        "script": {
          "source": "def d = doc['context.page.url'].value; if (d.length() > 32) { def c = d.indexOf('/', 32); if (c>0) { return d.substring(0,c);}} return d",
          "lang": "painless"
        },
        "order": {
          "duration_sum": "desc"
        }
      },
      "aggs" : {
        "duration_avg" : { "avg": { "field" : "transaction.duration.us" } },
        "duration_sum" : { "sum": { "field" : "transaction.duration.us" } },
        "duration_p99" : { "percentiles": { "field" : "transaction.duration.us", "percents" : [99] } }
      }
    }
  }
}
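Queries (1) and (2) differ only in the prefix and the hard-coded offset: 29 and 32 are len(prefix) + 1 for "https://www.elastic.co/guide" (28 characters) and "https://www.elastic.co/guide/en" (31 characters), i.e. the first character after the prefix's trailing slash. As a hedged sketch, a client could build the body for any prefix and pass the offset as a script parameter instead of baking it into the source, which should also let Elasticsearch reuse one compiled script. `hierarchy_query` and `PREFIX_SCRIPT` are hypothetical names.

```python
import json

# Same Painless logic as queries (1) and (2), with the offset passed in as a parameter.
PREFIX_SCRIPT = (
    "def d = doc['context.page.url'].value; "
    "if (d.length() > params.offset) { "
    "def c = d.indexOf('/', params.offset); "
    "if (c > 0) { return d.substring(0, c); } } "
    "return d"
)


def hierarchy_query(prefix):
    """Build a request body for the children of `prefix` (hypothetical helper)."""
    offset = len(prefix) + 1  # first character after "<prefix>/"
    duration = {"field": "transaction.duration.us"}
    return {
        "query": {"match": {"context.page.url.hierarchical": prefix}},
        "aggs": {
            "txs": {
                "terms": {
                    "script": {
                        "source": PREFIX_SCRIPT,
                        "lang": "painless",
                        "params": {"offset": offset},
                    },
                    "order": {"duration_sum": "desc"},
                },
                "aggs": {
                    "duration_avg": {"avg": dict(duration)},
                    "duration_sum": {"sum": dict(duration)},
                    "duration_p99": {"percentiles": {**duration, "percents": [99]}},
                },
            }
        },
    }


body = hierarchy_query("https://www.elastic.co/guide/en")
print(json.dumps(body, indent=2))
```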

These queries work by relying on the path hierarchical analyzer for context.page.url:

PUT apm-6.4.0-transaction-2018.11.23-reindex
{
  "settings": {
    "analysis": {
      "filter": {
        "url_stop": { 
          "type": "stop"
        }
      },
      "analyzer": {
        "page_hierarchy_analyzer": {
          "tokenizer": "path_hierarchy"
        }
      }
    }
  }
}

PUT apm-6.4.0-transaction-2018.11.23-reindex/doc/_mapping
{
  "properties": {
    "context.page.url": {
      "type": "keyword", 
      "fields": {
        "hierarchical": {
          "type": "text",
          "analyzer": "page_hierarchy_analyzer",
          "search_analyzer": "keyword"
        }
      }
    }
  }
}

Optimizations

We can avoid the performance hit of the script-based terms aggregation in exchange for a larger index. Instead of computing the prefixes in a script at query time, we would create fields for the first 3-4 levels and store them in the index. That would let us skip the scripted aggregation on the first 3-4 levels, where the amount of data is largest, and only use it for deeper levels, where the amount of data we need to aggregate over is significantly smaller.

Example:

{
  "context.page.url": "https://www.elastic.co/guide/en/",
  "context.page.url-levels.level1": "https://www.elastic.co",
  "context.page.url-levels.level2": "https://www.elastic.co/guide",
  "context.page.url-levels.level3": "https://www.elastic.co/guide/en"
}
Ingest pipeline to achieve this

This rudimentary ingest pipeline will parse the first levels. We could also imagine doing it in APM Server instead.

```
PUT _ingest/pipeline/levels
{
  "description": "parse levels",
  "processors": [
    {
      "script": {
        "source": """
          // Assumes the URL is stored as a nested object (context.page.url) in the source
          def s = ctx.context?.page?.url;
          if (s != null) {
            // Start searching after the scheme separator, e.g. "https://"
            int sep = s.indexOf('://');
            int pos = sep < 0 ? 0 : sep + 3;
            for (int level = 1; level <= 3; level++) {
              int slash = s.indexOf('/', pos);
              // No further '/' means the rest of the URL is the last level
              ctx['context.page.url-levels.level' + level] = slash < 0 ? s : s.substring(0, slash);
              if (slash < 0) { break; }
              pos = slash + 1;
              // A trailing '/' does not start a new level
              if (pos >= s.length()) { break; }
            }
          }
        """
      }
    }
  ]
}
```

Note: this also needs a separate mapping update.

For level 4 and up, we'd fall back to the scripted aggregation. This trades index size for faster queries.
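The level split and the "stored field or script" decision can be sketched in Python, for example as a stand-in for doing it in APM Server. This assumes three stored levels named context.page.url-levels.level1 to level3, as in the pipeline above; `url_levels`, `child_level_source`, and `STORED_LEVELS` are hypothetical names. The sketch tolerates URLs with fewer levels and ignores a trailing slash.

```python
STORED_LEVELS = 3  # assumption: level1..level3 are materialized at ingest time


def url_levels(url, max_levels=None):
    """Cumulative prefixes: level1 is scheme + host, each further level adds one path segment."""
    sep = url.find("://")
    pos = sep + 3 if sep >= 0 else 0
    levels = []
    while max_levels is None or len(levels) < max_levels:
        slash = url.find("/", pos)
        if slash < 0:
            levels.append(url)  # no further '/': the whole URL is the last level
            break
        levels.append(url[:slash])
        pos = slash + 1
        if pos >= len(url):
            break  # a trailing '/' does not start a new level
    return levels


def child_level_source(prefix):
    """Decide how to aggregate the children of `prefix`: a stored field or the script."""
    child_level = len(url_levels(prefix)) + 1
    if child_level <= STORED_LEVELS:
        return ("field", f"context.page.url-levels.level{child_level}")
    # Deeper levels fall back to the scripted terms aggregation, with this offset
    return ("script", len(prefix) + 1)


print(url_levels("https://www.elastic.co/guide/en/page", max_levels=STORED_LEVELS))
print(child_level_source("https://www.elastic.co/guide"))     # stored level3 field
print(child_level_source("https://www.elastic.co/guide/en"))  # level 4: scripted aggregation
```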

This query shows all subpaths of https://www.elastic.co/guide, grouped by the third level (https://www.elastic.co/guide/*):

GET apm-6.4.0-transaction-2018.11.23-reindex/_search?size=0
{
  "query": {
    "match": {"context.page.url.hierarchical": "https://www.elastic.co/guide"}
  },
  "aggs": {
    "txs": {
      "terms": {
        "field": "context.page.url-levels.level3",
        "order": {
          "duration_sum": "desc"
        }
      },
      "aggs" : {
        "duration_avg" : { "avg": { "field" : "transaction.duration.us" } },
        "duration_sum" : { "sum": { "field" : "transaction.duration.us" } },
        "duration_p99" : { "percentiles": { "field" : "transaction.duration.us", "percents" : [99] } }
      }
    }
  }
}

It's possible that there's an even better way to do the querying. We should investigate that if we choose to do this.

elasticmachine commented 5 years ago

Pinging @elastic/apm-ui

roncohen commented 5 years ago

in addition to APM UI team, it would be great to get your input @makwarth @jahtalab

makwarth commented 5 years ago

Nice @roncohen. I’m ++ on this. It’s a bummer not to be able to keep the generic design across languages, but it was bound to end some day. Some of the geo / user agent stuff that I’ve mentioned earlier would also require unique UI element(s) for RUM.

It’d be nice if we can avoid groupings like so:

http://google.com
http://www.google.com
http://www.google.com/
https://www.google.com

Not sure how we'll solve the IA. Looks like most of the filtering can happen in the list view?

roncohen commented 5 years ago

thanks!

http://www.google.com vs. http://www.google.com/

That one shouldn't be a problem. As for the rest, we could massage the page URL to collapse those, but I could also see some users asking us to show them as individual items. For example, you might want to compare load times on https:// vs. http://. It could get tricky. If that's what we end up going with, we could potentially add a config option in the pipeline or APM Server that massages the URL to remove prefixes like http://www., https://www., http://, https://, etc., or only operates on the /path segment of the URL.

That means it would happen at ingest time instead of query time and you would not be able to change the setting for old data.

In theory, we could do both, e.g. keep the original url and add another set of fields that would contain the massaged url. You could then switch between using the two sets of fields (original full length and the massaged) in the UI. But that's definitely something i'd defer to a later iteration.
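A minimal sketch of such an opt-in normalization, assuming it runs at ingest time and that the normalized key is stored next to the original URL rather than replacing it. `normalize_url` and its `keep_scheme` switch are hypothetical, and stripping the scheme, a leading "www.", and a trailing slash is only one possible policy.

```python
from urllib.parse import urlsplit


def normalize_url(url, keep_scheme=False):
    """Collapse trivial URL variants into one group key (one possible, hypothetical policy)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    key = host + parts.path.rstrip("/")  # "/" and "" become the same path
    if parts.query:
        key += "?" + parts.query
    return f"{parts.scheme}://{key}" if keep_scheme else key


variants = [
    "http://google.com",
    "http://www.google.com",
    "http://www.google.com/",
    "https://www.google.com",
]
for url in variants:
    # Keep the original URL and store the normalized key next to it
    print({"original": url, "normalized": normalize_url(url)})
```

With `keep_scheme=True`, http and https stay separate groups, which covers the case of users who want to compare them.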

hmdhk commented 5 years ago

Thanks @roncohen ! I like the idea!

There could be an issue with having too many hierarchy levels! For example, some websites have a common prefix for all pages, which would result in top-level hierarchies having only one child.

One solution to that is for the UI to group all parents that have only one child together and show the deepest child that has more than one child!
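One possible way to implement this is to build the prefix tree and, starting from the root, keep descending while a node has exactly one child. A Python sketch over plain paths; `build_tree`, `collapse_single_child`, and the example paths are hypothetical.

```python
from collections import defaultdict


def build_tree(paths):
    """Map each path prefix to the set of its direct children ("/" is the root)."""
    children = defaultdict(set)
    for path in paths:
        parent = "/"
        for segment in (s for s in path.split("/") if s):
            child = parent.rstrip("/") + "/" + segment
            children[parent].add(child)
            parent = child
    return children


def collapse_single_child(children, node="/"):
    """Descend from `node` while it has exactly one child; return where the branching starts."""
    while len(children.get(node, ())) == 1:
        (node,) = children[node]
    return node


tree = build_tree(["/app/v1/feed/42", "/app/v1/feed/43", "/app/v1/search"])
print(collapse_single_child(tree))  # "/" -> "/app" -> "/app/v1", which has two children
```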

Another point: is there a way to make this a configuration on the Kibana side? I would prefer that, since users can just change this configuration if they need to instead of having to change their ingest configuration!

sorenlouv commented 5 years ago

Sounds super useful for RUM.

Opt in/out: I imagine this should be enabled by default for RUM data. Should it also be possible to opt out if the user is not interested in this? Should it be possible for users to enable this behaviour for other agents?

Complexity: This is definitely more complex than the solution we use today. Considering the value it'll likely add for RUM users, I think it's still worth it. We should just plan accordingly.

One solution to that is if the UI would just group all parents with one child together and show the deepest child with more than one child!

Good point @jahtalab. That would be a good enhancement.

sorenlouv commented 5 years ago

Another point is if there's a way to make this a configuration on the Kibana side?

If we decide that it should be configurable (perhaps even per agent) we can do this via kibana.yml.

alvarolobato commented 5 years ago

@elastic/apm-ui @roncohen @makwarth what's the situation on this discussion? can we try to wrap it up and create an implementation issue?

sorenlouv commented 5 years ago

This would be great to have, but also sounds like a huge effort. How requested is something like this, now that we have made improvements to the issue that originally spurred this?

formgeist commented 5 years ago

I forgot to link it here, but I've created a design document for the UI enhancements for RUM (public doc)

formgeist commented 5 years ago

Thought I'd post the GIF from the design document in here too:

[GIF: Kapture 2019-06-03 at 13 43 50]

@makwarth Interested in your thoughts on this breadcrumb navigation concept.

makwarth commented 5 years ago

Hm, how would this work for e.g. a SaaS website where the URLs contain account names, like www.domain.com/orgname/content - or a time indexed site, like NYT https://www.nytimes.com/2019/06/02/us/politics/elaine-chao-china.html? The latter might be an edge case in terms of target audience, but still relevant to discuss.

I wonder if search is a better form of navigation for RUM transactions?

And, if users don't want to use search, they can use custom transaction grouping to do something like domain.com/:org:/content, can't they? (Might be too complicated, I'm not sure)

roncohen commented 5 years ago

I think it wasn't clear from my proposal that you need to be able to see the details page for a prefix, not just for a single page. E.g. you shouldn't need to go all the way "in" to get the details page.

The point here is that when you look at a row in the navigation table, that row covers every url that has the prefix on the row, e.g. when you click https://www.nytimes.com/2019/ the details page will cover all articles published in 2019. If you click "https://www.nytimes.com/section/" it will cover all the "section" pages.

Having the generic urls would be ideal. Our experience is that no one actually sets that up because it's difficult. Using the page title is another option and that would be simple for the agent to use as default, but the page title will also often include something specific.

You can still use the search box and set generic URLs if you wish, btw.

makwarth commented 5 years ago

Got it. I definitely see it's valuable, but I think search is worth considering in addition (and probably first). We currently show 25 transaction groups, with no pagination, so you could argue that we need this feature across all languages already.

The search filter could be an input field that's within the transaction groups element. When the user submits a filter, e.g. en/2019, that filter is applied to the global search, e.g. transaction.name : *en/2019*

roncohen commented 5 years ago

The search filter could be an input field that's within the transaction groups element. When the user submits a filter, e.g. en/2019, that filter is applied to the global search, e.g. transaction.name : *en/2019*

Search is great, but if we start with search as you proposed, then when the user submits a search, the individual rows will still correspond to individual pages. That's the main problem with RUM data today: there's no useful summary of which areas of your website are slow, because there's no grouping. What we're trying to solve with this proposal is a way to group things when there's no way to get a grouping from the framework/routes.

makwarth commented 5 years ago

Agreed, I guess there are really two issues: grouping and navigation. Are you worried about the aggregated value per group in use cases like the SaaS one with /:org:/, or with /2019/? (One org might be super slow while the rest are fast.)

roncohen commented 5 years ago

A bit worried, but i think the proposed navigation will still be a significant improvement over what people experience today. I think it's interesting to think about how we can incorporate search here, but i think we could do it separately.

formgeist commented 5 years ago

As for the navigation concept, I've updated it according to Ron's comment about being able to enter detail page for any prefix in the URL

Original comment

I think it wasn't clear from my proposal that you need to be able to see the details page for a prefix, not just for a single page. E.g. you shouldn't need to go all the way "in" to get the details page.

Marvel prototype

[GIF: Kapture 2019-06-07 at 15 18 01]

I can add the search option in there too, but I imagine that would be considered a search option in the table itself.

Thoughts?

roncohen commented 5 years ago

the nested one is interesting, but i worry there might be millions of subpages. For the navigation to work, maybe we need to step back and rethink how the graphs and the navigation connect? Does it make sense to update the graphs every time you click "into" something? e.g. you click /guide/ and the graphs update to show you all of "guide" and the list updates to show you sub pages?

That would perhaps require the graphs to move around so they are below the list or on the side? Just brainstorming here.

vigneshshanmugam commented 5 years ago

Regarding the navigation grouping in the newly proposed design: will it even be possible for the user to view the aggregated graphs for /guide/elasticsearch across all locales? Having the ability to go deeper per locale seems good, but it would be better if the UI allowed some form of grouping (grouping /guide/elasticsearch across all locales) instead of having it per sublevel by default.

formgeist commented 5 years ago

@vigneshshanmugam I think that was what @roncohen was getting at in his feedback above about being able to render a full transaction detail page per transaction "group" with a list of children. I'll have some time soon to explore other designs so stay tuned.

alvarolobato commented 4 years ago

@roncohen are we still doing this or will we go to a completely different UI for RUM first?

cc @nehaduggal @Tanya-Bragin

roncohen commented 4 years ago

I don't remember seeing much discussion on a completely different UI. Sounds like we should set up a call to discuss

formgeist commented 4 years ago

I removed the 7.6 version label from this issue. Still need to figure out the prioritization of this UI.

hmdhk commented 4 years ago

Another example use-case for improving the grouping of transactions: https://discuss.elastic.co/t/elastic-apm-rum-js-agent-grouping-http-request-transactions/215380.

We should have a zoom call to discuss the potential solution we can take with this one. cc @lreuven @drewpost

botelastic[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.