gchux / cloud-run-tcpdump

Cloud Run packet capturing sidecar
Apache License 2.0

Current PCAPs aren't flushed in time when instance receives SIGTERM #16

Closed: apgiorgi closed this issue 1 month ago

apgiorgi commented 2 months ago

Also, I don't see the log output for "signaled:", as I would expect from https://github.com/gchux/cloud-run-tcpdump/blob/main/tcpdumpw/main.go#L404

Would there be a way to guarantee that tcpdump terminates cleanly and flushes the pcap to GCS before the final instance shutdown?

gchux commented 2 months ago

Hello!

thanks for reaching out, flushing behavior is currently as follows:

Questions:

apgiorgi commented 1 month ago

Thanks @gchux !

The use case is to capture the very last request, the one that results in a SIGTERM, so the timing is tight.

Currently, PCAP_ROTATE_SECS uses the default of 60 seconds. The next test is to set it to 5 seconds (or less) and try to have those files flushed before instance termination, maybe by adding some delay to the application's SIGTERM handler. Nevertheless, it would be nice if tcpdumpw (or pcap-cli?) could handle the SIGTERM and guarantee that the last capture gets flushed correctly, but I'm not sure if that's possible at all (can you send a signal to tcpdump?).

The envs defined:

    env:
    - name: PCAP_IFACE
      value: eth
    - name: PCAP_GCS_BUCKET
      value: pnp-eu-tcpdump
    - name: PCAP_FILTER
      value: tcp

Here are the arguments:

    iface:
    use_cron:false
    cron_exp:-
    timezone:UTC
    timeout:0
    extension:pcap
    directory:/pcap-tmp
    snaplen:0
    filter:TCP
    interval:60
    tcpdump:true
    jsondump:false
    jsonlog:false
    ordered:false
    hc_port:12345
    gcp_gae:false

gchux commented 1 month ago

hello @apgiorgi

I've reviewed this behavior in depth and, together with @thomasmburke, we have defined a path for exporting the remaining PCAP files on SIGTERM:

The issue was that the sidecar's entrypoint was not properly propagating the signal to the supervisor process. In addition, we have enabled concurrent flushing of files when the SIGTERM from Cloud Run is received.

see commit: https://github.com/gchux/cloud-run-tcpdump/commit/e4c4ad10a10395c3371cef2e8aec22f988c728c5

feel free to test it out using: ghcr.io/gchux/cloud-run-tcpdump:v1.0.60-RC4

gchux commented 1 month ago

here's a sample of what it looks like in action:

image

gchux commented 1 month ago

ghcr.io/gchux/cloud-run-tcpdump:v1.0.60-RC5 is now available with improved logging for PCAP file flushing.