cilium / design-cfps

Repo to store Cilium CFP design docs
Apache License 2.0
31 stars 26 forks source link

CFP 33716: Packet Tracing #44

Closed Bigdelle closed 3 months ago

Bigdelle commented 4 months ago

Add PR for CFP: Packet Tracing.

gandro commented 4 months ago

cc @cilium/sig-hubble

Bigdelle commented 4 months ago

cc @cilium/sig-datapath

julianwiedmann commented 4 months ago

On a very high-level - you're proposing that the datapath matches some specific IP option in the packet (containing some trace_id), and transfers the trace_id value to userspace via the trace_notify struct ?

As an alternative thought - afaik the datapath is already punting some(?) packet headers to userspace. If you increase the number of punted bytes (so that it includes the IP options), would that allow you to move all the needed IP option-parsing into hubble?

Bigdelle commented 4 months ago

On a very high-level - you're proposing that the datapath matches some specific IP option in the packet (containing some trace_id), and transfers the trace_id value to userspace via the trace_notify struct ?

As an alternative thought - afaik the datapath is already punting some(?) packet headers to userspace. If you increase the number of punted bytes (so that it includes the IP options), would that allow you to move all the needed IP option-parsing into hubble?

We want the ip_trace_id available at the Kernel level. This PR sets up the infrastructure to expand on this feature as we continue developing it; the next milestone (linked in the Future Milestones section of the CFP) involves restoring the ip_trace_id for reply packets using conntrack.

## Future Milestones

### Deferred Milestone 1: Use conntrack to restore trace ID for reply packets
- When a packet includes a trace ID, the conntrack entry should record the ID along with the connection information.
- For subsequent packets – if a trace ID is not present in the IP Options – use the track ID from conntrack to restore the internal state.
- Note this will not modify the IP option header.

This will ensure that subsequent packets in the same connection maintain their traceability even if they do not carry the ip_trace_id explicitly in their IP options. If we push the IP option parsing to Hubble, we lose this ability.

Also, by pushing this to Hubble, we lose integration with the cilium monitor for events. Considering the future customer use-cases of this feature, we want the ability to see the ip_trace_id in the monitor.

Is there a specific concern you have about including this in the trace_notify and drop_notify structs? (these fields would only be populated if the feature is enabled and there is an embedded ip_trace_id).

bowei commented 3 months ago

CC: @sypakine

joestringer commented 3 months ago

I think this CFP is in a solid state for implementing. There's a couple of details that could maybe be hashed out a bit more, but I don't think they ultimately detract that much from the overall proposal, and may be impacted by the experience of implementing it. The proposal has been out plenty long enough for @cilium/sig-hubble and @cilium/sig-datapath to provide feedback, and there's now committers endorsing the proposal, so I think this is in a good enough state to merge.

As and when the implementation matures and impacts the design, we can always iterate further on this CFP, but we don't have to hold it as a PR during that period. Let's get this merged :rocket: