apache / skywalking

APM, Application Performance Monitoring System
https://skywalking.apache.org/
Apache License 2.0

jaeger and istio setup. #3447

Closed dbones closed 5 years ago

dbones commented 5 years ago


Question

Do you have an example of setting up SkyWalking with jaeger (for tracing) and istio (for metrics)?

If possible, can you show it using simple k8s deployments? I am new to helm (sorry)

my current progress:

I believe I have set up elasticsearch correctly, going by the compose and helm setups.

I do not think I have understood the Istio instructions. They seem to increase the count of endpoints on the main dashboard, as shown below, but I cannot see any metrics:

image

finally, I'm not quite sure what to do with the jaeger agent.

my attempt so far: https://gist.github.com/dbones/4d87efc1dfa43e2cad38cfd17f219f4f

reason for my interest

as an FYI, I am trying to prepare a demo on how to embrace CNCF (opentracing and istio) and get a great APM experience.

wu-sheng commented 5 years ago

Hi, from your screenshot, I think the most likely reason is that you haven't changed the timezone at the bottom right of the page. In docker/k8s the timezone defaults to UTC, but the UI defaults to your local timezone, so you can only see endpoints, no services, and therefore no other metrics.

as an FYI, I am trying to prepare a demo on how to embrace CNCF (opentracing and istio) and get a great APM experience.

Interesting, where are you planning to present?

dbones commented 5 years ago

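I wanted to present this at an internal company summit (we have people from around the world attending)

Ah, I see about the timezone. The UI is set to UTC+1 (I changed the viewable timespan to show data from yesterday) and I left skywalking enabled for a few hours

From what I can see, it has:

- recognised that I have several services, but we have no data
- not picked up data from my separate jaeger instance (I do not think I have set this correctly)
- not recognised the RabbitMq, Postgres and Redis instances
- shown data for the istio tracing component, but no other dashboard is populated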

Here is a screenshot:

image

dbones commented 5 years ago

if it helps, the app i am running is a simple shop

image

I wanted to use the following setup to show how observability requires Metrics, Tracing and Logging:

image

wu-sheng commented 5 years ago

recognised that I have several services, but we have no data

I think you didn't set up mixer correctly. From what I saw in the UI, the only thing reporting traffic today is mixer itself. You could turn on debug logging, and the OAP log should then show which services' metrics are being sent. I believe you set up the OAP correctly, at least.

not picked up data from my separate jaeger instance (I do not think i have set this correctly)

We use the jaeger gRPC service, so the jaeger agent is required. Did you deploy it and get it working?

has not recognised the RabbitMq, Postgres and Redis instances

That is because Istio mixer only reports HTTP and HTTPS traffic.

shows data for the istio tracing component, but no other dashboard is populated.

What is the istio tracing component?

wu-sheng commented 5 years ago

Our .NET Core agent could give you more info and tracing. If you are interested, you could check it out; I am not sure whether it has all the plugins you require.

But if you just want observability through the mesh, you could skip the agent solution.

dbones commented 5 years ago

thanks for the quick responses :)

i have looked at the following:

We use the jaeger gRPC service, so the jaeger agent is required.

I replaced the jaeger collector with the agent, like the following:

image

this is not working. Do I need to configure something with the OAP server?

"msg":"Could not create collector proxy","error":"could not create collector proxy, address is missing"

if so, how do I configure this in the deployment yaml? (can I pass it in as an env var?)

I think you didn't set up mixer correctly. From what I saw in the UI, the only thing reporting traffic today is mixer itself. You could turn on debug logging, and the OAP log should then show which services' metrics are being sent. I believe you set up the OAP correctly, at least.

I installed Istio via the helm chart, installed Elastic, OAP and the UI (v6.1), and then applied the Istio yaml files as mentioned above (from the docs)

I am getting a warning in the OAP:

graphql.execution.SimpleDataFetcherExceptionHandler -10192306 [qtp583015088-91] WARN [] - Exception while fetching data (/cpmC) : IDs can't be null

it is now showing the app components along with 2 istio components (is there anything I can do to filter these out?)

image

this is what my skywalking namespace looks like:

image

note that I have only added one agent for now, and the collector has been turned off.

Our .NET Core agent could give you more info and tracing. If you are interested, you could check it out; I am not sure whether it has all the plugins you require.

correct, it looks like some plugins are missing

dbones commented 5 years ago

I noticed that I had the agent configured incorrectly; I have now updated it to set oap:14250 via the args.
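
For readers following along, pointing the Jaeger agent at the OAP through container args might look roughly like the fragment below. It is only a sketch: the image tag, the container name, and the Service name `oap` are assumptions rather than values taken from the gist. `--reporter.grpc.host-port` is the Jaeger agent flag that sets the gRPC address spans are forwarded to.

```yaml
# Fragment of a pod spec for the Jaeger agent (sketch, not the gist's manifest).
# Assumed: image tag, container name, and that the OAP Service is named "oap".
containers:
  - name: jaeger-agent
    image: jaegertracing/jaeger-agent:1.14
    args:
      # Forward spans over gRPC to the port the OAP Jaeger receiver listens on.
      - "--reporter.grpc.host-port=oap:14250"
    ports:
      # UDP port that instrumented applications send compact Thrift spans to.
      - containerPort: 6831
        protocol: UDP
```

Without a reporter address the agent has nowhere to forward spans, which is consistent with the earlier "address is missing" message.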

my latest exception from the Jaeger agent is:

{"level":"error","ts":1568135502.7668867,"caller":"grpc/reporter.go:70","msg":"Could not send spans over gRPC","error":"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.43.28.245:14250: connect: connection refused\"","stacktrace":"github.com/jaegertracing/jaeger/cmd/agent/app/reporter/grpc.(*Reporter).send\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/reporter/grpc/reporter.go:70\ngithub.com/jaegertracing/jaeger/cmd/agent/app/reporter/grpc.(*Reporter).EmitBatch\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/reporter/grpc/reporter.go:50\ngithub.com/jaegertracing/jaeger/cmd/agent/app/reporter.(*MetricsReporter).EmitBatch\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/reporter/metrics.go:77\ngithub.com/jaegertracing/jaeger/thrift-gen/jaeger.(*agentProcessorEmitBatch).Process\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/thrift-gen/jaeger/agent.go:138\ngithub.com/jaegertracing/jaeger/thrift-gen/jaeger.(*AgentProcessor).Process\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/thrift-gen/jaeger/agent.go:112\ngithub.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).processBuffer\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:115"}

the connection is refused; I am not sure if I am missing a setup step somewhere

This is the latest setup I have for the SW deployment: https://gist.github.com/dbones/4d87efc1dfa43e2cad38cfd17f219f4f
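
A refused connection on 14250 suggests that nothing is accepting connections on that port behind the Service address. One thing worth checking is whether the OAP Service exposes 14250 at all. A minimal sketch, in which the Service name, namespace, and selector label are assumptions and not taken from the gist:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: oap              # assumed name; must match the host used in the agent args
  namespace: skywalking  # assumed namespace
spec:
  selector:
    app: oap             # assumed pod label on the OAP deployment
  ports:
    - name: jaeger-grpc
      port: 14250        # port the Jaeger agent dials
      targetPort: 14250  # port the OAP Jaeger receiver would listen on
```

Even with the port exposed, connections would still fail if the OAP process itself is not listening on 14250, for example because the Jaeger receiver is not enabled.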

wu-sheng commented 5 years ago

this is not working, do i need to configure something with the OAP server? "msg":"Could not create collector proxy","error":"could not create collector proxy, address is missing"

Ignore this; it is just a UI bug that queries data when it should not. It is harmless and has been removed in the latest release.

it is now showing the app components along with 2 istio components (is there anything I can do to filter these out?)

That is based on what istio mixer sends. We don't support filtering on the OAP side.


From the screenshot, you should have the metrics and topology today, right?

For jaeger, did you enable the jaeger receiver?

#receiver_jaeger:
default:
  gRPCHost: ${SW_RECEIVER_JAEGER_HOST:0.0.0.0}
  gRPCPort: ${SW_RECEIVER_JAEGER_PORT:14250}

We used to have this in the yaml-based deployment, but I think it is missing from the helm one. https://github.com/apache/skywalking-kubernetes/blob/master/archive/6/6.0.0-GA/oap/01-config.yml#L23
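
As a sketch of what enabling the receiver means in the OAP configuration file: the same block as quoted above, with the comment marker removed and the nesting written out. The indentation is an assumption based on how module settings are normally nested; the keys and defaults are the ones quoted above.

```yaml
# OAP application.yml fragment (sketch): Jaeger receiver uncommented.
receiver_jaeger:
  default:
    # Listen on all interfaces unless SW_RECEIVER_JAEGER_HOST overrides it.
    gRPCHost: ${SW_RECEIVER_JAEGER_HOST:0.0.0.0}
    # 14250 is the port the Jaeger agent sends gRPC spans to.
    gRPCPort: ${SW_RECEIVER_JAEGER_PORT:14250}
```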

wu-sheng commented 5 years ago

I found this in the docker config: https://github.com/apache/skywalking-docker/tree/master/6/6.3/oap#xxx_enabled

Please enable SW_RECEIVER_JAEGER_ENABLED.
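
In a plain k8s deployment, setting that switch could be sketched as below. The container name, the image placeholder, and the value `"true"` are assumptions; whether the variable takes effect depends on the entrypoint script of the OAP image in use.

```yaml
# Fragment of the OAP Deployment pod spec (sketch).
# Assumed: container name, image placeholder, and "true" as the enabling value.
containers:
  - name: oap
    image: <your-oap-image>
    env:
      - name: SW_RECEIVER_JAEGER_ENABLED
        value: "true"
    ports:
      # Expose the port the Jaeger receiver listens on.
      - containerPort: 14250
```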

wu-sheng commented 5 years ago

After reading the doc, I submitted this: https://github.com/apache/skywalking/issues/3449. The jaeger receiver will have issues with that docker entrypoint script.

Please read the issue and the documents I reference there; you may need to build a new docker image.

dbones commented 5 years ago

I have done the following:

my last attempt yielded the following error:

org.apache.skywalking.oap.server.starter.OAPServerStartUp -13300 [main] ERROR [] - metrics-name can't be null

if the team could do a hotfix, it would be appreciated.

otherwise, if possible, it would be great to do a video chat to make sure I have not made a silly error. I can provide you with remote access to my test platform (which I will keep up for a little longer before I shut it down)

wu-sheng commented 5 years ago

I could have a video chat with you, but I am not a k8s fan :) I could guide you on how SkyWalking should work with its configuration.

dbones commented 5 years ago

epic, i have sent an email to the email address on your github profile.

wu-sheng commented 5 years ago

Following our video chat, the whole setup should be working now.