deepflowio / deepflow

eBPF Observability - Distributed Tracing and Profiling
https://deepflow.io
Apache License 2.0

[FR] Support filtering request_log by specifying a blacklist of request_resource, endpoint, or other fields #5916

Closed chrisdamon closed 1 month ago

chrisdamon commented 5 months ago

Description

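The data collected by the agent includes many probe health-check endpoints (/health) and Prometheus monitoring endpoints (/metrics). They generate a large volume of data but are of little practical use. Suggestion: let the agent inspect HTTP traffic at collection time and provide a configuration for filtering out data from endpoints such as /health and /metrics. This would reduce data storage costs and the load on the server.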

Use case

No response

Related issues

No response

sharang commented 5 months ago

In practice, this data consumes very little storage.

We will support this capability in a future version. PR contributions are also welcome.

novohool commented 5 months ago

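For now I clean up this data myself. It is not only a matter of resource usage: this data also fills the UI with a lot of records nobody cares about. From what I can see, only custom protocols are currently supported.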

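# Purge low-value rows from the local l7_flow_log table, then cut table TTLs to 24 hours (ClickHouse SQL)
sql=$(cat <<EOF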
ALTER TABLE flow_log.l7_flow_log_local DELETE 
WHERE l7_protocol_str='NATS'
OR l7_protocol_str='Redis'
OR endpoint='/liveness'
OR endpoint='/metrics'
OR endpoint='/readiness'
OR endpoint='/actuator/prometheus'
OR endpoint='/ping/health'
OR endpoint='/actuator/health'
OR endpoint='/-/health'
OR endpoint='/nacos/v1'
OR endpoint='/healthz/ready' 
OR process_kname_1='filebeat'
OR process_kname_0='prometheus'
OR process_kname_0='filebeat'
OR endpoint='/-/ready'
OR request_type='SENTINEL'
OR request_domain='zipkin'
OR request_domain='elasticsearch-master:9200'
OR request_domain='skywalking-oap.istio-system:11800'
OR request_domain='prometheus-operated.monitoring.svc:9090'
OR response_result='ns.dns.cluster.local'
OR response_result='a.root-servers.net'
OR response_result='10.IN-ADDR.ARPA'
OR response_result='10.in-addr.arpa'
OR response_result='prisoner.iana.org'
;
ALTER TABLE flow_log.l7_flow_log_local MODIFY TTL time + toIntervalHour(24);
ALTER TABLE flow_log.l4_flow_log_local MODIFY TTL time + toIntervalHour(24);
ALTER TABLE profile.in_process_local MODIFY TTL time + toIntervalHour(24);
EOF
)

sharang commented 5 months ago

@novohool

  1. NATS and Redis: the agent can be configured to switch parsing of individual application protocols on or off; see https://github.com/deepflowio/deepflow/blob/main/server/controller/model/agent_group_config_example.yaml#L1001

    ## List of Application Protocols
    ## Note: Turning off some protocol identification can reduce deepflow-agent resource consumption.
    #l7-protocol-enabled:
    #- HTTP
    #- HTTP2 ## for both HTTP2 and gRPC
    #- SofaRPC
    #- FastCGI
    #- bRPC
    #- Dubbo
    #- MySQL
    #- PostgreSQL
    #- Redis
    #- MongoDB
    #- Kafka
    #- MQTT
    #- AMQP
    #- OpenWire
    #- NATS
    #- Pulsar
    #- ZMTP
    #- DNS
    #- TLS
    #- Custom ## custom protocol from plugin
  2. endpoint, request_type, request_domain, response_result: we will support filtering on these in a future version.

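  3. process_kname: we will support filtering in a future version. eBPF data is easy to filter, but packet data is not, because we cannot tell which process a packet belongs to.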
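As a minimal sketch, assuming the list is placed under static_config (as in the agent group configuration shared later in this thread) and that protocols left out of it are not parsed, NATS and Redis could be dropped simply by omitting them:

```yaml
static_config:
  # Assumption: only the protocols listed here are parsed by the agent.
  # NATS and Redis are intentionally left out, so no request_logs are produced for them.
  l7-protocol-enabled:
  - HTTP
  - HTTP2   # for both HTTP2 and gRPC
  - MySQL
  - PostgreSQL
  - Kafka
  - DNS
  - TLS
```

Switching a protocol off removes all of its request_logs, so this suits protocols you never want to see, not individual endpoints.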

novohool commented 5 months ago

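Thanks, this is indeed rather tricky; I suppose layer-7 packets have to be parsed first. If both the agent and the server supported filtering, with the agent filtering at layer 3 and the server filtering at layer 7, and with regular-expression support, it would be very convenient. For example, users could configure DNS lookups for internal domains matching .*.svc.cluster.local to be discarded.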

sharang commented 3 months ago

You can now use the configuration below to discard unwanted request_logs:

  ## l7_flow_log Blacklist
  ## Example:
  ##   l7-log-blacklist:
  ##     HTTP:
  ##     - field_name: request_resource  # endpoint, request_type, request_domain, request_resource
  ##       operator: equal  # equal, prefix
  ##       value: somevalue
  ## Note: A l7_flow_log blacklist can be configured for each protocol, preventing request logs matching
  ##   the blacklist from being collected by the agent or included in application performance metrics.
  ##   It's recommended to only place non-business request logs like heartbeats or health checks in this
  ##   blacklist. Including business request logs might lead to breaks in the distributed tracing tree.
  #l7-log-blacklist:
  #  HTTP: []
  #  HTTP2: []
  #  ...
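Applied to the original request, a blacklist for the /health and /metrics endpoints might look like the sketch below. It assumes the option is nested under static_config (as pointed out later in this thread) and writes the key as field-name, the spelling used in the reporter's configuration; the example comment above spells it field_name, so check which one your agent version accepts.

```yaml
static_config:
  l7-log-blacklist:
    HTTP:
    # Prefix match: drops /health as well as /healthz, /health/ready, ...
    - field-name: endpoint
      operator: prefix
      value: /health
    # Exact match: drops Prometheus scrapes of /metrics only
    - field-name: endpoint
      operator: equal
      value: /metrics
```

Only the equal and prefix operators are listed, so a pattern such as .*.svc.cluster.local cannot be expressed with this option.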

You can also use the configuration below so that unwanted DNS NXDOMAIN request_logs are no longer reported as errors:

    ## Unconcerned DNS NXDOMAIN Responses
    ## Note: You might not be concerned about certain DNS NXDOMAIN errors and may wish to ignore
    ##   them. For example, when a K8s Pod tries to resolve an external domain name, it first
    ##   concatenates it with the internal domain suffix of the cluster and attempts to resolve
    ##   it. All these attempts will receive an NXDOMAIN reply before it finally requests the
    ##   original domain name directly, and these errors may not be of concern to you. In such
    ##   cases, you can configure their `response_result` suffix here, so that the corresponding
    ##   `response_status` in the l7_flow_log is forcibly set to `Success`.
    #unconcerned-dns-nxdomain-response-suffixes: []
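A minimal sketch, assuming this option also lives under static_config and takes a list of strings. The suffixes are derived from the response_result values in the cleanup SQL earlier in this thread (ns.dns.cluster.local, a.root-servers.net), which appear to be the SOA names returned with NXDOMAIN replies:

```yaml
static_config:
  # NXDOMAIN replies whose response_result ends with one of these suffixes
  # get response_status forced to Success.
  unconcerned-dns-nxdomain-response-suffixes:
  - cluster.local      # e.g. ns.dns.cluster.local (search-domain lookups inside the cluster)
  - root-servers.net   # e.g. a.root-servers.net
```

According to the note above, this only changes response_status; the matching request_logs are still collected and stored.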

Additionally, there is currently no support for discarding request_logs by matching process_kname_0/1. This is primarily because request_logs collected via AF_PACKET/cBPF do not have this field, which would result in incomplete discards. In the future, we will consider finding an appropriate method to directly support discarding request_logs by configuring the process name or pod name.

novohool commented 3 months ago

Is there something wrong with my configuration? I checked the dashboard and the data was not filtered out. My configuration is as follows:

Version: 6.5.7

$ cat agent-group-config.yaml 
agent_group_id: g-21b735b71d
http_log_trace_id: traceparent,sw8,x-b3-traceid,x-b3-flags,b3
http_log_span_id: traceparent, sw8,x-b3-spanid,x-b3-parentspanid
http_log_x_request_id: X-Request-ID
http_log_proxy_client: X-Forwarded-For
static_config:
  l7-protocol-enabled:
  - HTTP 
  - HTTP2 
  - SofaRPC
  - FastCGI
  - MySQL
  - PostgreSQL
  - MongoDB
  - Kafka
  - DNS
  - TLS
  - AMQP
l7-log-blacklist:
  HTTP:
  - field-name: endpoint
    operator: prefix
    value: /healthz
  - field-name: endpoint
    operator: prefix
    value: /liveness
  - field-name: endpoint
    operator: prefix
    value: /readiness
  - field-name: endpoint
    operator: prefix
    value: /metrics     
  - field-name: endpoint
    operator: prefix
    value: /nacos
  - field-name: endpoint
    operator: prefix
    value: /ping
  - field-name: endpoint
    operator: prefix
    value: /actuator/prometheus
  - field-name: request_domain
    operator: equal
    value: zipkin
  - field-name: request_domain
    operator: equal
    value: elasticsearch-master:9200
  - field-name: request_domain
    operator: equal
    value: skywalking-oap.istio-system:11800
  - field-name: request_domain
    operator: equal
    value: prometheus-operated.monitoring.svc:9090
  - field-name: response_result
    operator: equal
    value: ns.dns.cluster.local
  - field-name: response_result
    operator: equal
    value: a.root-servers.net
  - field-name: response_result
    operator: equal
    value: 10.IN-ADDR.ARPA
  - field-name: response_result
    operator: equal
    value: 10.in-addr.arpa
  - field-name: response_result
    operator: equal
    value: prisoner.iana.org

1473371932 commented 1 month ago

Yes. Your l7-log-blacklist parameter is in the wrong position: it should be nested under static_config. You can refer to the Configuration Example to update its location. In addition, v6.5 now has an LTS release, so you can upgrade by updating the image tag to v6.5.
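Applying that fix to the configuration above gives roughly the following abridged layout, with l7-log-blacklist at the same indentation as l7-protocol-enabled. The response_result rules are left out here because the example comment lists only endpoint, request_type, request_domain and request_resource as blacklist fields, and DNS values placed under HTTP would not match DNS request_logs anyway. Treat this as a sketch, not a verified configuration.

```yaml
agent_group_id: g-21b735b71d
# ... http_log_* settings unchanged ...
static_config:
  l7-protocol-enabled:
  - HTTP
  - HTTP2
  # ... remaining protocols unchanged ...
  # Must be a child of static_config, not a top-level key
  l7-log-blacklist:
    HTTP:
    - field-name: endpoint
      operator: prefix
      value: /healthz
    - field-name: endpoint
      operator: prefix
      value: /actuator/prometheus
    - field-name: request_domain
      operator: equal
      value: zipkin
    # ... remaining endpoint and request_domain rules unchanged ...
```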