pg-yang closed this issue 1 year ago
Maybe we should expand the meaning of Database: a Virtual Database should contain caches, instead of us creating a separate Virtual Cache.
I think they are fundamentally different: a cache is much faster than a database. Both are used for saving data, and both carry statements too. We provide a slow-threshold config for various DBs:
agent-analyzer:
  selector: ${SW_AGENT_ANALYZER:default}
  default:
    slowDBAccessThreshold: ${SW_SLOW_DB_THRESHOLD:default:200,mongodb:100,redis:20}
We would just add redis:20. And we could append the dbType to the service name, like Mysql:192.168.1.1:3306 or Redis:192.168.1.1:6379, or add a column named dbType to the UI.
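The naming convention proposed above can be sketched as follows; this is purely illustrative (the class and method names are hypothetical, not the actual OAP code):

```java
// Illustrative sketch: prefix the peer address with the dbType
// to form the virtual service name, e.g. "Redis:192.168.1.1:6379".
public class VirtualServiceNameSketch {
    static String serviceName(String dbType, String peer) {
        return dbType + ":" + peer;
    }

    public static void main(String[] args) {
        System.out.println(serviceName("Mysql", "192.168.1.1:3306")); // Mysql:192.168.1.1:3306
        System.out.println(serviceName("Redis", "192.168.1.1:6379")); // Redis:192.168.1.1:6379
    }
}
```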
Yes, I don't think this is a tech issue. I'm just talking from a design perspective.
design perspective
It depends on how you define Database. Design comes from definition.
Let's say it this way: a cache should be related to more metrics. For example, cache hint is an important metric, while a database doesn't care about hint rate.
Hint? Do you mean cache hit?
A data store system can respond with a result, whether the result is empty or not. From the client-server communication protocol perspective, both are successful.
If we want to get the hit rate, we have to decode the response. It's better if the response carries a hit-state; however, we cannot guarantee that every cache system protocol has a hit-state.
Finally, hit is also applicable to databases (such as MySQL).
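To make the point concrete, here is a hypothetical sketch (not any real plugin API) of inferring hit/miss by decoding the response, which only works when the plugin can see the deserialized value:

```java
// Hypothetical: infer cache hit/miss from a decoded response value.
// Protocol-level success says nothing about hits; an empty/nil reply
// is still a successful response, so decoding is required.
public class HitStateSketch {
    static boolean isHit(Object decodedResponse) {
        return decodedResponse != null; // nil/empty reply => miss
    }

    public static void main(String[] args) {
        System.out.println(isHit("cached-value")); // true
        System.out.println(isHit(null));           // false
    }
}
```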
You just described it from a general side. I mean for the layer, and for APM, we should understand the service. In every official architecture, a cache service and a database service are two different things. This is why we have had cache and db span types for many years. You should not say they are the same just because they have similar metrics.
OK, I'll follow the current rule. At least, the concepts within a system should be consistent.
And another thing: some caches are not remote services. The cache could be local memory.
Should we analyze local spans for in-memory caches such as guava-cache?
Yes, we should, and we should provide a format rule to differentiate local and remote caches.
The new config looks like this:
agent-analyzer:
  #......
  virtualService:
    database:
      slowAccessThreshold: ${SW_VIRTUAL_DB_SLOW_THRESHOLD:default:200,mongodb:100}
      maxSlowStatementLength: ${SW_VIRTUAL_DB_SLOW_MAX_LENGTH:default:200,mongodb:100}
    cache:
      slowAccessThreshold: ${SW_VIRTUAL_CACHE_SLOW_THRESHOLD:default:80,redis:30}
      maxSlowStatementLength: ${SW_VIRTUAL_CACHE_SLOW_MAX_LENGTH:default:200,redis:100}
      enableLocal: false # whether to analyze local caches such as Guava Cache, Ehcache
Do you have any suggestions?
Does the kernel support this kind of structure? I remember only 3 levels are supported.
Yes, only 3 levels are supported. I'll reorganize this config:
agent-analyzer:
  selector: ${SW_AGENT_ANALYZER:default}
  default:
    # .....
    databaseSlowAccessThreshold: ${SW_VIRTUAL_DB_SLOW_THRESHOLD:default:200,mongodb:100}
    databaseMaxSlowStatementLength: ${SW_VIRTUAL_DB_SLOW_MAX_LENGTH:default:200,mongodb:100}
    cacheSlowAccessThreshold: ${SW_VIRTUAL_CACHE_SLOW_THRESHOLD:default:80,redis:30}
    cacheMaxSlowStatementLength: ${SW_VIRTUAL_CACHE_SLOW_MAX_LENGTH:default:200,redis:100}
    analysisLocalCache: false # whether to analyze local caches such as Guava Cache, Ehcache
BTW, I want to modify the kernel to support more levels in the future.
The reason we don't recommend more levels is that a deeply nested YAML file is very hard to control through system env variables, which are the key part of how helm/operator (cloud-native) components control OAP behaviors in the setup stage.
cacheMaxSlowStatementLength: ${SW_VIRTUAL_CACHE_SLOW_MAX_LENGTH:default:200,redis:100}
Is this necessary? What kinds of statements could we get? Could you provide an example? For instance, in the Redis case, if the statement doesn't include the key, then mostly it is pointless.
And for a cache, put is usually much slower than read. Thresholds should be separated for read/write or general operations.
if this statement doesn't include key
Yes, sometimes we can't get the key from the cache plugin, but we could at least show the slow command in the UI. It's better than nothing. From the jedis-4.x plugin:
- {key: db.statement, value: HSET a}
- {key: db.statement, value: SET key}
- {key: db.statement, value: XADD abc}
- {key: db.statement, value: XREAD} # command only, no key!
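For illustration, a minimal sketch of splitting such a db.statement into a command and an optional key; the helper names are hypothetical, not the actual plugin code:

```java
// Hypothetical helper: split "HSET a" into command "HSET" and key "a";
// "XREAD" yields the command only, with no key available.
public class StatementSketch {
    static String command(String statement) {
        int sp = statement.indexOf(' ');
        return sp < 0 ? statement : statement.substring(0, sp);
    }

    static String keyOrNull(String statement) {
        int sp = statement.indexOf(' ');
        return sp < 0 ? null : statement.substring(sp + 1);
    }

    public static void main(String[] args) {
        System.out.println(command("HSET a"));   // HSET
        System.out.println(keyOrNull("HSET a")); // a
        System.out.println(keyOrNull("XREAD"));  // null
    }
}
```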
Thresholds should be separated for read/write or general operations.
Agree
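Separated read/write thresholds could then look like this; the environment variable names below are hypothetical, only showing the shape such a config might take:

```yaml
cacheReadSlowThreshold: ${SW_VIRTUAL_CACHE_READ_SLOW_THRESHOLD:default:20,redis:10}
cacheWriteSlowThreshold: ${SW_VIRTUAL_CACHE_WRITE_SLOW_THRESHOLD:default:80,redis:30}
```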
Yes, sometimes we can't get the key from the cache plugin, but we could at least show the slow command in the UI. It's better than nothing.
Not really. If we show meaningless information, we spend resources (CPU, storage, network, a.k.a. the user's money) for nothing. In your case, key: db.statement, value: XREAD
should not be sampled as a slow statement, no matter how slow it is, because in the APM use case, people (the ops team) can do nothing to get rid of it. We already provide latency (avg and percentile) to show the access speed.
Also, besides read and write slow access threshold, metrics should be separated too.
We can't distinguish between reading and writing unless we modify all cache plugins to add a uniform read/write flag. Also, we can't know whether the statement includes the key.
OK, then the value of these metrics would be lower than it could be.
Now we can distinguish write/read operations, but we can't guarantee that every span contains cache.key and cache.op.
The next question is which spans should be analyzed. I think:
- When the span contains op, we generate response time, cpm, sla, etc. for write/read operations respectively.
- When the span contains key, we generate TopN. We can't generate Write-TopN/Read-TopN, as op may not exist; if we wanted those, we would have to analyze only complete spans (which contain both op and key). Anyway, it's meaningful to show the slow keys. Right?
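The rule above can be sketched as follows; this is purely illustrative decision logic, not the actual OAP analysis listener:

```java
// Illustrative sketch: decide which metrics a span can feed,
// based on which of the cache.op / cache.key tags are present.
public class AnalysisRuleSketch {
    static String decide(String op, String key) {
        if (op != null && key != null) return "read-write metrics + TopN slow statement";
        if (op != null) return "read-write metrics only";
        if (key != null) return "TopN slow statement only";
        return "skip";
    }

    public static void main(String[] args) {
        System.out.println(decide("write", "user:1001")); // read-write metrics + TopN slow statement
        System.out.println(decide("read", null));         // read-write metrics only
        System.out.println(decide(null, null));           // skip
    }
}
```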
Slow reads and writes are meaningful. We should use op and key as the sampled statement.
We can only analyze this when the relevant tags exist.
Makes sense to me.
I don't know how to test with the UI. Is there any doc?
Do you have the virtual cache layer in the backend? Once you add it, you could use the custom dashboard feature to set up dashboards for the new layer.
Docs are here. https://skywalking.apache.org/docs/main/next/en/ui/readme/
Also this PR is a good place to learn for new dashboard setup steps, https://github.com/apache/skywalking/pull/9432
I have already created the following JSON config in the backend.
In the UI, I added this to general.ts:
{
path: "/cache",
name: "VirtualCache",
meta: {
title: "VirtualCache",
exact: true,
},
component: () =>
import(/* webpackChunkName: "layer" */ "@/views/Layer.vue"),
},
{
path: "/cache/tab/:activeTabIndex",
name: "VirtualCacheActiveTabIndex",
meta: {
notShow: true,
},
component: () =>
import(/* webpackChunkName: "layer" */ "@/views/Layer.vue"),
}
And added this to data.ts:
export const RoutesMap: { [key: string]: string } = {
VirtualCache: "VIRTUAL_CACHE",
VirtualCacheActiveTabIndex: "VIRTUAL_CACHE"
}
But I get "Please set a root dashboard for VIRTUAL_CACHE" when I visit the page.
The dashboard virtual-cache-root should be set as root on the dashboard management page. I am not sure how you wrote these JSONs; generally, they are exported from the dashboard management page too.
Because of browser cache. It works now.
Do you set the key as a or key on purpose? After we have this in the codebase, we need to update the showcase, so we need to set keys in the demo services clearly, as well as in e2e.
Is read/write success related to error status only?
Is read/write success related to error status only?
Yes
cache_write_sla = from(VirtualCacheWrite.*).percent(status == true);
cache_read_sla = from(VirtualCacheRead.*).percent(status == true);
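The remaining read/write metrics discussed earlier could follow the same pattern. This is only a sketch in the same OAL style; the metric names are illustrative, not the final ones:

```
cache_write_resp_time = from(VirtualCacheWrite.latency).longAvg();
cache_read_resp_time = from(VirtualCacheRead.latency).longAvg();
cache_write_cpm = from(VirtualCacheWrite.*).cpm();
cache_read_cpm = from(VirtualCacheRead.*).cpm();
```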
Do you set the key as a or key on purpose? After we have this in the codebase, we need to update the showcase, so we need to set keys in the demo services clearly, as well as in e2e.
The showcase crashes now. I will provide clear test data after writing unit tests.
Showcase can run locally on KinD and Docker Compose. The demo is just another host.
Showcase can run locally on KinD and Docker Compose. The demo is just another host.
Does showcase mean the skywalking-showcase repo? And should I provide an e2e test? I have consulted some e2e tests in OAP. I think my e2e test should integrate with the agent, but I couldn't find a similar test. Maybe I misunderstand the e2e in OAP?
e2e has cases with agents. What do you mean?
Showcase could be done after e2e, because e2e checks the features and gives us confidence to merge. Yes, showcase is the repo.
e2e has cases with agents. What do you mean?
I was confused because I couldn't find the Java agent options in the backend project. Now I think I have found the correct entrance for e2e:
1. JAVA_TOOL_OPTIONS is set in the docker image ghcr.io/apache/skywalking-java/skywalking-java
2. e2e-service-consumer, e2e-service-provider: add Redis test code
3. e2e.yaml (maybe need to extend base-compose.yml) to trigger and verify my test

You may not need to change <3>; instead, you should adjust the metrics verification and service list in the existing e2e. Once <2> is done, the analysis will be done automatically.
You could use an in-memory cache, if a Redis service is why you thought you should change base-compose.yml.
➜ skywalking git:(feature/vitrual-cache) ✗ /Users/HOME/GolandProjects/skywalking-infra-e2e/bin/darwin/e2e run -c test/e2e-v2/cases/meter/e2e.yaml
INFO load the e2e config successfully
Pulling oap (skywalking/oap:latest)...
The image for the service you're trying to recreate has been removed. If you continue, volume data could be lost. Consider backing up your data before continuing.
Continue with the new image? [yN]pull access denied for skywalking/oap, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
ERROR Local Docker compose exited abnormally whilst running docker-compose: [--env-file /Users/HOME/ideaProjects/skywalking/test/e2e-v2/script/env up -d]. exit status 1
I couldn't find skywalking/oap on Docker Hub. I don't understand where the image would be downloaded from.
It is not on docker hub, they are here, https://github.com/orgs/apache/packages?repo_name=skywalking
It looks like it downloads the image from Docker Hub. I just executed the shell command above. Maybe I missed some steps?
No, I mean the image is defined as skywalking/oap:latest
in https://github.com/apache/skywalking/blob/master/test/e2e-v2/script/docker-compose/base-compose.yml#L20
services:
  oap:
    image: skywalking/oap:latest
    expose:
      - 11800
      - 12800
      - 10051
      - 5005
It should be downloaded from Docker Hub unless some shell script (or setup) specifies a Docker registry, so I guess I missed some shell/setup step.
Latest is your local build based on the latest code.
You could use an in-memory cache, if a redis service is why you thought you should change
base-compose.yml
The in-memory cache (e.g. guava-cache, ehcache) plugin is optional. I would add a Redis docker image to base-compose.yml.
Is that OK?
Search before asking
Description
Virtual Cache will analyze cache-related spans. It will generate SLA, response time, and other metrics. In a word, it's similar to Virtual Database. Reference: the virtual-database skywalking demo.
Work to be done:
- Agent (focus on the Java agent): set the span's layer and tags if necessary. The known cache-related plugins (suggestion needed: which plugins are cache-related):
- Backend
Use case
No response
Related issues
No response
Are you willing to submit a PR?
Code of Conduct