facebookincubator / dynolog

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the linux kernel, CPU, disks, Intel PT, GPUs etc. Dynolog also integrates with pytorch and can trigger traces for distributed training applications.
MIT License
260 stars 38 forks source link

Kineto/dynolog IPC + kubernetes #241

Closed jon-chuang closed 2 weeks ago

jon-chuang commented 6 months ago

Hey guys awesome work.

Quick question about k8s compat:

If the kineto instrumented processes are running in K8s containers, are you guys aware if enabling host IPC, and running dynolog sidecar container is sufficient to facilitate the on-demand profiling use case?

Seems like running into k8s compat would be quite likely on production path for you guys so any details here would be really helpful.

Thanks!

briancoutinho commented 4 months ago

@jon-chuang sorry we missed this, if you can have the dynolog process run within the container and then have dynolog's port exported I think it should work.

There is an example docker file here that could be helpful https://pytorch.org/blog/automated-trace-collection/

rtanase commented 4 months ago

+1

briancoutinho commented 2 weeks ago

Let me know if this issue can be closed 👍