deepflowio / deepflow

eBPF Observability - Distributed Tracing and Profiling
https://deepflow.io
Apache License 2.0

After importing Jaeger data into DeepFlow, traces for some services are missing #7617

Open liu1004010308 opened 2 months ago

liu1004010308 commented 2 months ago

Search before asking

DeepFlow Component

Agent

What you expected to happen

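Following the official documentation [https://www.deepflow.io/docs/zh/integration/input/tracing/opentelemetry/], I imported Jaeger data into DeepFlow and found that traces for some services are missing: [image]

The data collected for weboffice-api** does not include any OTel data: [image]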

How to reproduce

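1. Use the otel-agent configuration below to import Jaeger data into DeepFlow.
2. With service --> otel --> jaeger, all services are displayed normally. With service --> otel --> deepflow-agent, some services are missing. The only change I made was switching exporters: [ jaeger ] to exporters: [ otlphttp ].
3. So I suspect the problem is not in otel. I do not know how to investigate further, and enabling debug on deepflow-agent produces no useful logs.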

apiVersion: v1
data:
  otel-collector.yml: |
    receivers:
      jaeger:
        protocols:
          # listens on :14250
          grpc:
          thrift_compact:
          thrift_binary:
    processors:
      k8sattributes:
      resource:
        attributes:
          - key: app.host.ip
            from_attribute: k8s.pod.ip
            action: insert

      tail_sampling:
        # Time to wait, starting from the first span of a trace, before making a sampling decision
        decision_wait: 30s
        # Number of traces kept in memory
        num_traces: 50000
        # Expected number of new traces per second
        expected_new_traces_per_sec: 0
        policies:
          [
            {
              name: probattr_policy,
              type: probabilistic,
              probabilistic: { sampling_percentage: 1 }
            },
            {
              name: status_policy,
              type: status_code,
              status_code: { status_codes: ["ERROR"] }
            },
            {
              name: latency_policy,
              type: latency,
              latency: { threshold_ms: 1500 }
            },
            {
              name: slow-boolattr-policy,
              type: boolean_attribute,
              boolean_attribute: { key: "slow", value: true }
            },
            {
              name: scrape-boolattr-policy,
              type: boolean_attribute,
              boolean_attribute: { key: "scrape", value: true }
            }
          ]

    extensions:
      health_check:
      pprof:
      zpages:

    exporters:
      otlphttp:
        traces_endpoint: 'http://deepflow-agent.deepflow/api/v1/otel/trace'
        tls:
          insecure: true
        retry_on_failure:
          enabled: true
      jaeger:
        endpoint: ip:14250
        tls:
          insecure: true

    service:
      extensions: [pprof, zpages, health_check]
      pipelines:
        traces:
          receivers: [ jaeger ]
          processors: [ k8sattributes, resource, tail_sampling ]
          exporters: [ otlphttp ]
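For readers trying to judge how much data this pipeline forwards at all, the following is a rough Python sketch of how the five tail_sampling policies above combine. It assumes a trace is kept when any one policy matches, and it replaces the collector's real probabilistic sampler with a simple SHA-256 bucket, so the `Trace` class and both functions are hypothetical illustrations rather than the collector's implementation. Under this model a fast, error-free trace without the `slow` or `scrape` attributes survives only about 1% of the time, on both the Jaeger and the DeepFlow export paths.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical, minimal stand-in for a completed trace."""
    trace_id: str
    duration_ms: float
    has_error: bool = False
    bool_attrs: dict = field(default_factory=dict)


def probabilistic(trace_id: str, sampling_percentage: float) -> bool:
    # Assumption: a hash bucket in [0, 10000) stands in for the
    # collector's own trace-ID based sampler.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10000
    return bucket < sampling_percentage * 100


def is_sampled(trace: Trace) -> bool:
    # Simplified model of the config above: keep the trace if ANY policy matches.
    return (
        probabilistic(trace.trace_id, 1)          # probattr_policy
        or trace.has_error                        # status_policy (ERROR)
        or trace.duration_ms > 1500               # latency_policy
        or trace.bool_attrs.get("slow") is True   # slow-boolattr-policy
        or trace.bool_attrs.get("scrape") is True # scrape-boolattr-policy
    )


if __name__ == "__main__":
    fast_traces = [Trace(trace_id=f"t{i}", duration_ms=20.0) for i in range(10000)]
    kept = sum(is_sampled(t) for t in fast_traces)
    print(f"fast, healthy traces kept: {kept} of {len(fast_traces)}")
```

If a service only ever produces fast, successful requests, this sketch suggests it will appear rarely in either backend, which is worth ruling out before comparing the two exporters.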

DeepFlow version

deepflow-agent:v6.5

DeepFlow agent list

kubewps> deepflow-ctl agent-group-config list -o yaml
vtap_group_id: g-a2a719b149
http_log_trace_id: uber-trace-id
ntp_enabled: 1
static_config:
  ntp-max-interval: 30s
  ntp-min-interval: 1s

Kubernetes CNI

flannel

Operation-System/Kernel version

4.4.0-142-generic

Anything else

No response

Are you willing to submit a PR?

Code of Conduct

liu1004010308 commented 2 months ago

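The data is in the database: 85c4eb0a2bacaec9d5e62a2ac4101be After removing these two items: 9a0f54e9facd3c593a1945894fa3276 the data shows up: image On further investigation I found: image This may be related to the search returning nothing here. Is there a way to fix it?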

yujianweilai commented 1 month ago

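> The data is in the database: 85c4eb0a2bacaec9d5e62a2ac4101be After removing these two items: 9a0f54e9facd3c593a1945894fa3276 the data shows up: image On further investigation I found: image This may be related to the search returning nothing here. Is there a way to fix it?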

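Hello, friend: I have run into a similar problem. The visible symptom is this: about 20 microservices send data through otel -> deepflow-agent, but when I query in "Distributed Tracing", only three of them show data from the OTel source. I ran your SQL against ClickHouse and the data has already been stored there. I then restarted the business microservices and found that the set of services that can be queried is different each time. Have you solved this problem yet? I would appreciate it if you could share your experience. Thank you very much.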
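The SQL used in this thread is only visible in the screenshots, so the following Python sketch shows one way such a check might look. It is built on assumptions that should be verified against the deployed schema: that OTel spans land in the `flow_log.l7_flow_log` table, that `signal_source = 4` marks OTel-sourced rows, and that `app_service` holds the service name. Both function names are hypothetical helpers.

```python
def otel_span_count_query(minutes: int = 15) -> str:
    """Build a ClickHouse query counting OTel-sourced spans per service.

    Assumed (not confirmed in this thread): table flow_log.l7_flow_log,
    columns app_service / signal_source / time, and signal_source = 4 for OTel.
    """
    if not isinstance(minutes, int) or minutes <= 0:
        raise ValueError("minutes must be a positive integer")
    return (
        "SELECT app_service, count() AS spans "
        "FROM flow_log.l7_flow_log "
        "WHERE signal_source = 4 "
        f"AND time > now() - INTERVAL {minutes} MINUTE "
        "GROUP BY app_service ORDER BY spans DESC"
    )


def missing_services(expected, rows):
    """Return expected service names that have no row in the query result.

    `rows` is an iterable of (app_service, spans) tuples, for example the
    parsed output of running the query above.
    """
    seen = {service for service, _ in rows}
    return sorted(set(expected) - seen)


if __name__ == "__main__":
    print(otel_span_count_query(30))
    # Made-up sample rows, only to show the comparison step.
    print(missing_services(["svc-a", "svc-b", "svc-c"], [("svc-a", 10), ("svc-c", 3)]))
```

Comparing this per-service list with what the "Distributed Tracing" page returns for the same time window helps separate an ingestion gap from a query or display gap.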

yujianweilai commented 1 month ago

Glad to see that someone has picked this up. If needed, I can help with the relevant verification. @1473371932