tcpdump sidecar

This repository contains the source code to create a container image containing `tcpdump` and `pcap-cli` to perform packet captures in Cloud Run multi-container deployments. Captured packets are optionally translated to JSON and written into Cloud Logging.
During development, it is often useful to capture packets in order to troubleshoot specific network-related issues.
This container image is to be used as a sidecar of the Cloud Run main –ingress– container in order to perform a packet capture with `tcpdump` within the same network namespace.

The sidecar approach decouples packet capturing from the main –ingress– container, so no modifications to it are required; additionally, sidecars use their own resources, so `tcpdump` does not compete with the main app's resource allocation.

NOTE: the main –ingress– container is the one to which all ingress traffic (HTTP requests) is delivered; for Cloud Run services, this is typically your APP container.
Features:

- `HTTP/1.1` and `HTTP/2` analysis: raw `HTTP/1.1` messages and `HTTP/2` frames.
- `.json` and `.pcap` file formats with optional gzip compression.
- `SIGTERM` handling to ensure all completed pcap files are flushed to GCS before the container exits.
- Configurable `tcpdump` filter, interface, snapshot length, and pcap file rotation duration.
- Guided `tcpdump` filter creation by defining FQDNs, ports, and TCP flags.
- Scheduled `tcpdump` executions via `CRON`.
- `tcpdump` installed from Ubuntu's official repository to perform packet captures.
- `gopacket` to perform packet capturing and get a handle on all captured packets.

The sidecar uses:
- `tcpdump`/`pcap-cli` to capture packets in both a Wireshark compatible format and `JSON`. All containers use the same network namespace, so this sidecar captures packets from all containers within the same instance.
- `pcap-cli` to translate packets into Cloud Logging compatible structured `JSON`. It also provides `HTTP/1.1` and `HTTP/2` analysis, including trace context awareness (`X-Cloud-Trace-Context`/`traceparent`) to hydrate structured logs with trace information, which enables rich network data analysis using Cloud Trace.
- `tcpdumpw` to execute `tcpdump`/`pcap-cli` and generate PCAP files; optionally, it schedules `tcpdump`/`pcap-cli` executions.
- `pcap-fsnotify` to listen for newly created PCAP files, optionally compress them (recommended), and move them into the Cloud Storage mount point.
- GCSFuse to mount the Cloud Storage Bucket into which compressed PCAP files are moved.
PCAP files are moved from the sidecar's in-memory filesystem into the mounted Cloud Storage Bucket.
The pcap sidecar has images that are compatible with both Cloud Run execution environments.
[!IMPORTANT]
- The gen1 images are compatible with BOTH the gen1 and gen2 Cloud Run execution environments.
- The gen2 images are compatible with ONLY the gen2 Cloud Run execution environment.
This is because gen1 does not support the newest version of libpcap, whereas gen2 does.
- gen1 (works on both gen1 and gen2): `us-central1-docker.pkg.dev/pcap-sidecar/pcap-sidecar/pcap-sidecar:latest` or `us-central1-docker.pkg.dev/pcap-sidecar/pcap-sidecar/pcap-sidecar:v#.#.#-gen1`
- gen2 only: `us-central1-docker.pkg.dev/pcap-sidecar/pcap-sidecar/pcap-sidecar:newest` or `us-central1-docker.pkg.dev/pcap-sidecar/pcap-sidecar/pcap-sidecar:v#.#.#-gen2`
Define environment variables to be used during Cloud Run service deployment:
```shell
export SERVICE_NAME='...'           # Cloud Run or App Engine Flex service name
export SERVICE_REGION='...'         # GCP region: https://cloud.google.com/about/locations
export SERVICE_ACCOUNT='...'        # Cloud Run service's identity
export INGRESS_CONTAINER_NAME='...' # the name of the ingress container, e.g. `app`
export INGRESS_IMAGE_URI='...'      # the ingress container image URI
export INGRESS_PORT='...'           # the port the ingress container listens on
export TCPDUMP_SIDECAR_NAME='...'   # the name of the pcap sidecar, e.g. `pcap-sidecar`
# public image compatible with both gen1 & gen2; alternatively, build your own
export TCPDUMP_IMAGE_URI='us-central1-docker.pkg.dev/pcap-sidecar/pcap-sidecar/pcap-sidecar:latest'
export PCAP_IFACE='eth'             # prefix of the interface to capture packets from
export PCAP_GCS_BUCKET='...'        # the name of the Cloud Storage Bucket to mount
export PCAP_FILTER='...'            # the BPF filter to use, e.g. `tcp port 443`
export PCAP_JSON_LOG=true           # set to `true` to write structured logs into Cloud Logging
```
Deploy the Cloud Run service including the `tcpdump` sidecar:
[!NOTE]
If adding the `tcpdump` sidecar to a preexisting single-container Cloud Run service, the gcloud command will fail. You will need to make these updates via the Cloud Console or create a new Cloud Run service instead.
```shell
gcloud run deploy ${SERVICE_NAME} \
  --project=${PROJECT_ID} \
  --region=${SERVICE_REGION} \
  --service-account=${SERVICE_ACCOUNT} \
  --container=${INGRESS_CONTAINER_NAME} \
  --image=${INGRESS_IMAGE_URI} \
  --port=${INGRESS_PORT} \
  --container=${TCPDUMP_SIDECAR_NAME} \
  --image=${TCPDUMP_IMAGE_URI} \
  --cpu=1 --memory=1G \
  --set-env-vars="PCAP_IFACE=${PCAP_IFACE},PCAP_GCS_BUCKET=${PCAP_GCS_BUCKET},PCAP_FILTER=${PCAP_FILTER},PCAP_JSON_LOG=${PCAP_JSON_LOG}"
```
See the full list of available flags for `gcloud run deploy` at https://cloud.google.com/sdk/gcloud/reference/run/deploy
It is recommended to make all other containers depend on the `tcpdump` sidecar, but this configuration is not available via gcloud because it requires configuring healthchecks for the sidecar container. To make all containers depend on the `tcpdump` sidecar, edit the Cloud Run service via the Cloud Console, declare the dependency for all other containers, and add the following TCP startup probe healthcheck to the `tcpdump` sidecar:

```yaml
startupProbe:
  timeoutSeconds: 1
  periodSeconds: 10
  failureThreshold: 10
  tcpSocket:
    port: 12345
```
You can optionally choose a different port by setting `PCAP_HC_PORT` as an env var of the `tcpdump` sidecar.
The `tcpdump` sidecar accepts the following environment variables:
- `PCAP_IFACE`: (STRING, required) a prefix for the interface to perform packet capturing on, e.g. `eth`, `ens`. Notice that `PCAP_IFACE` is not the full interface name, nor a regex or a pattern, but a prefix; so `eth0` becomes `eth`, and `ens4` becomes `ens`.
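As a sketch of how the prefix relates to full interface names, the prefix is just the name with its numeric suffix dropped (this helper is illustrative only; it is not part of the sidecar):

```shell
# Illustrative only: derive a PCAP_IFACE-style prefix from a full
# interface name by dropping everything from the first digit onward.
iface_prefix() {
  printf '%s\n' "${1%%[0-9]*}"
}

iface_prefix eth0 # -> eth
iface_prefix ens4 # -> ens
```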
- `PCAP_GCS_BUCKET`: (STRING, required) the name of the Cloud Storage Bucket to be mounted and used to store PCAP files. Ensure that you grant the runtime service account the `roles/storage.admin` role so that it may create objects and read bucket metadata.
- `PCAP_L3_PROTOS`: (STRING, optional) comma separated list of network layer protocols; default value is `ipv4,ipv6`.
- `PCAP_L4_PROTOS`: (STRING, optional) comma separated list of transport layer protocols; default value is `tcp,udp`.
- `PCAP_IPV4`: (STRING, optional) comma separated list of IPv4 addresses or IPv4 networks in CIDR notation; default value is `DISABLED`. Example: `127.0.0.1,127.0.0.1/32`.
- `PCAP_IPV6`: (STRING, optional) comma separated list of IPv6 addresses or IPv6 networks in CIDR notation; default value is `DISABLED`. Example: `::1,::1/128`.
- `PCAP_HOSTS`: (STRING, optional) comma separated list of FQDNs (hosts) to capture traffic to/from; default value is `ALL`. Example: `metadata.google.internal,pubsub.googleapis.com`.
- `PCAP_PORTS`: (STRING, optional) comma separated list of transport layer addresses (UDP or TCP ports) to capture traffic to/from; default value is `ALL`. Example: `80,443`.
- `PCAP_TCP_FLAGS`: (STRING, optional) comma separated list of lowercase TCP flags that a segment must contain for it to be captured; default value is `ANY`. Example: `syn,rst`.
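To illustrate what these variables express, here is a rough hand-written BPF equivalent of `PCAP_PORTS=80,443` combined with `PCAP_TCP_FLAGS=syn,rst`. This is an approximation only: `pcap-cli` generates its own filter, and the helper below is hypothetical.

```shell
# Hypothetical helper: build a BPF expression roughly equivalent to
# PCAP_PORTS=<ports> and PCAP_TCP_FLAGS=<flags>.
build_filter() {
  ports=$(printf '%s' "$1" | tr ',' ' ')
  flags=$(printf '%s' "$2" | tr ',' ' ')
  port_expr=$(printf 'port %s or ' $ports) # "port 80 or port 443 or "
  port_expr="${port_expr% or }"            # drop trailing " or "
  flag_expr=$(printf 'tcp-%s|' $flags)     # "tcp-syn|tcp-rst|"
  flag_expr="${flag_expr%|}"               # drop trailing "|"
  printf '(%s) and (tcp[tcpflags] & (%s) != 0)\n' "$port_expr" "$flag_expr"
}

build_filter "80,443" "syn,rst"
# -> (port 80 or port 443) and (tcp[tcpflags] & (tcp-syn|tcp-rst) != 0)
```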
- `PCAP_SNAPSHOT_LENGTH`: (NUMBER, optional) number of bytes to capture from each packet instead of the default 262144 bytes; default value is `0`. See https://www.tcpdump.org/manpages/tcpdump.1.html#:~:text=%2D%2D-,snapshot%2Dlength,-%3Dsnaplen
- `PCAP_ROTATE_SECS`: (NUMBER, optional) how often to rotate PCAP files created by `tcpdump`; default value is `60` seconds.
- `GCS_MOUNT`: (STRING, optional) where in the sidecar's in-memory filesystem to mount the Cloud Storage Bucket; default value is `/pcap`.
- `PCAP_FILE_EXT`: (STRING, optional) extension to be used for PCAP files; default value is `pcap`.
- `PCAP_COMPRESS`: (BOOLEAN, optional) whether to compress PCAP files or not; default value is `true`.
- `PCAP_TCPDUMP`: (BOOLEAN, required) whether to use `tcpdump` or not (`tcpdump` generates the pcap files; if disabled, `PCAP_JSON` must be enabled) and push those `.pcap` files to GCS; default value is `true`.
- `PCAP_JSON`: (BOOLEAN, optional) whether to dump packets as `JSON` into GCS or not; default value is `false`. `PCAP_TCPDUMP` and `PCAP_JSON` may both be `true` in order to generate both `.pcap` and `.json` PCAP files in GCS.
- `PCAP_JSON_LOG`: (BOOLEAN, optional) whether to write `JSON` translated packets into `stdout` (`PCAP_JSON` does not need to be enabled); default value is `false`. This is useful when Wireshark is not available, as it makes all captured packets available in Cloud Logging.
- `PCAP_ORDERED`: (BOOLEAN, optional) when `PCAP_JSON` or `PCAP_JSON_LOG` are enabled, whether to print packets in captured order (if set to `false`, packets will be written as fast as possible); default value is `false`. In order to improve performance, packets are translated and written concurrently; when `PCAP_ORDERED` is enabled, only translations are performed concurrently. Enabling `PCAP_ORDERED` may cause packet capturing to be slower, so it is recommended to keep it disabled, as all translated packets have a `pcap.num` property to assert order.
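Since every translated packet carries a `pcap.num` property, capture order can be restored offline instead; for example, with `jq` over a file of JSON-translated packets (the file name and sample records below are made up for illustration):

```shell
# Two out-of-order sample records standing in for sidecar output.
printf '%s\n' '{"pcap":{"num":2}}' '{"pcap":{"num":1}}' > packets.json

# Slurp all records into an array and re-sort them by capture order.
jq -cs 'sort_by(.pcap.num)' packets.json
# -> [{"pcap":{"num":1}},{"pcap":{"num":2}}]
```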
- `PCAP_HC_PORT`: (NUMBER, optional) the TCP port that should be used to accept startup probes; connections will only be accepted when packet capturing is ready; default value is `12345`.
More advanced use cases may benefit from scheduling `tcpdump` executions. Use the following environment variables to configure scheduling:
- `PCAP_FILTER`: (STRING, optional) standard `tcpdump` BPF filters to scope the packet capture to specific traffic, e.g. `tcp`; default value is `DISABLED`.
- `PCAP_USE_CRON`: (BOOLEAN, optional) whether to enable scheduling of `tcpdump` executions; default value is `false`.
- `PCAP_CRON_EXP`: (STRING, optional) `cron` expression used to schedule `tcpdump` executions. If `PCAP_USE_CRON` is set to `true`, then `PCAP_CRON_EXP` is required. See https://crontab.cronhub.io/ to get help with `crontab` expressions.
- `PCAP_TIMEZONE`: (STRING, optional) the Timezone ID used when scheduling `tcpdump` executions with `PCAP_CRON_EXP`; default value is `UTC`.
- `PCAP_TIMEOUT_SECS`: (NUMBER, optional) how many seconds a `tcpdump` execution will last; default value is `0`: the execution will not be stopped.

NOTE: if `PCAP_USE_CRON` is set to `true`, you should set this value to less than the time in seconds between scheduled executions.
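For example, a schedule that captures for five minutes at the top of every hour could look like this (the values are illustrative, not recommendations):

```shell
PCAP_USE_CRON=true
PCAP_CRON_EXP='0 * * * *' # run at minute 0 of every hour
PCAP_TIMEZONE='UTC'
PCAP_TIMEOUT_SECS=300     # stop after 5 minutes, well below the 3600s between runs
```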
The Cloud Storage Bucket mounted by the `tcpdump` sidecar is not accessible by the main –ingress– container.

Processes running in the `tcpdump` sidecar are not visible to the main –ingress– container (or any other container); similarly, the `tcpdump` sidecar doesn't have visibility of processes running in other containers.
All PCAP files will be stored within the Cloud Storage Bucket with the following hierarchy: `PROJECT_ID/SERVICE_NAME/GCP_REGION/REVISION_NAME/INSTANCE_STARTUP_TIMESTAMP/INSTANCE_ID`. This hierarchy guarantees that PCAP files are easily indexable and hard to overwrite across multiple deployments/instances. It also simplifies deleting no longer needed PCAPs from specific deployments/instances.
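As a quick sketch with made-up values, the object prefix for a single instance's PCAP files composes like this (the helper and all values below are hypothetical; the sidecar builds this path itself):

```shell
# Illustrative only: compose the full object prefix for one instance's PCAP files.
pcap_prefix() {
  # args: bucket project service region revision startup_ts instance_id
  printf 'gs://%s/%s/%s/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

pcap_prefix my-pcap-bucket my-project my-service us-central1 my-service-00042-abc 20240101T000000 instance-1
# -> gs://my-pcap-bucket/my-project/my-service/us-central1/my-service-00042-abc/20240101T000000/instance-1
```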
When defining `PCAP_ROTATE_SECS`, keep in mind that the current PCAP file is temporarily stored in the sidecar's in-memory filesystem; if your APP is network intensive, prefer shorter rotation periods so that in-flight PCAP files stay small and are moved into Cloud Storage sooner.
When defining `PCAP_SNAPSHOT_LENGTH`, keep in mind that a large value will result in larger PCAP files; additionally, you may not need to inspect the payload data, just the packet headers.
Keep in mind that every Cloud Run instance will produce its own set of PCAP files, so for troubleshooting purposes, it is best to set a low maximum number of instances for the Cloud Run service.
It is equally important to define a well scoped BPF filter in order to capture only the required packets and skip everything else. The `tcpdump` flag `--snapshot-length` is also useful to limit the bytes of data captured from each packet.
Packet capturing is always on while the instance is available, so it is best to roll back to a non packet-capturing revision and delete the packet-capturing one after you have captured all the required traffic.

The full packet capture from a Cloud Run instance will be composed of multiple smaller (optionally compressed) PCAP files. Use a tool like mergecap to combine them into one.
In order to mount the Cloud Storage Bucket and store PCAP files, Cloud Run's identity must have the proper roles/permissions.
The `tcpdump` sidecar is intended to be used for troubleshooting purposes only. While the `tcpdump` sidecar has its own set of resources, storing PCAP files in Cloud Storage and logging packet translations into Cloud Logging introduces additional costs for both storage and networking.
Define a BPF filter to capture just the required packets, and nothing else; examples of bad filters for long running or data intensive tests: `tcp`, `tcp or udp`, `tcp port 443`, etc.
Set `PCAP_COMPRESS` to `true` to store compressed PCAP files and save storage bytes; additionally, use regional Buckets to minimize costs.
Whenever possible, use packet capturing scheduling to avoid running `tcpdump` for 100% of the instance lifetime.
When troubleshooting is complete, deploy a new Revision without the `tcpdump` sidecar to completely disable it.
While Cloud Storage volume mounts are an available built-in feature of Cloud Run, GCSFuse is used instead to minimize the configuration required to deploy a Revision instrumented with the `tcpdump` sidecar.

NOTE: this is also the reason why the base image for the `tcpdump` sidecar is `ubuntu:22.04` and not something lighter like `alpine`: GCSFuse pre-built packages are only available for Debian and RPM based distributions.
While setting `PCAP_ORDERED` to `true` is a good alternative for low traffic scenarios, it is recommended to set it to `false` for most other cases, since the level of concurrency is reduced (only translations run concurrently) in order to guarantee packet order.

NOTE: packet order means the order in which the underlying engine (`gopacket`) delivers captured packets.
Use scheduled packet capturing (`PCAP_USE_CRON` and other advanced flags) if you don't need to capture packets for 100% of the instance runtime, as it will reduce the number of PCAP files.

NOTE: this sidecar is subject to the Cloud Run CPU allocation configuration; if the revision is configured to only allocate CPU during request processing, then CPU will also be throttled for the sidecar. This means that when CPU is only allocated during request processing, no packet capturing will happen outside request processing; the same applies to exporting PCAP files into Cloud Storage.
Use Cloud Logging to look for the entry starting with: `[INFO] - PCAP files available at: gs://...`

It may be useful to use the following filter:

```
resource.type = "cloud_run_revision"
resource.labels.service_name = "<cloud-run-service-name>"
resource.labels.location = "<cloud-run-service-region>"
"<cloud-run-revision-name>"
"PCAP files available at:"
```

This entry contains the exact Cloud Storage path to be used to download all the PCAP files. Copy the full path including the `gs://` prefix, and assign it to the environment variable `GCS_PCAP_PATH`.
Download all PCAP files using:

```shell
mkdir pcap_files
cd pcap_files
gcloud storage cp ${GCS_PCAP_PATH}/*.gz . # use `${GCS_PCAP_PATH}/*.pcap` if `PCAP_COMPRESS` was set to `false`
```
If `PCAP_COMPRESS` was set to `true`, uncompress all the PCAP files: `gunzip ./*.gz`

Merge all PCAP files into a single file:

- for `.pcap` files: `mergecap -w full.pcap -F pcap ./*_part_*.pcap`
- for `.json` files: `cat *_part_*.json | jq -crMs 'sort_by(.pcap.date)' > pcap.json`
See the `mergecap` docs: https://www.wireshark.org/docs/man-pages/mergecap.html

See the `jq` docs: https://jqlang.github.io/jq/manual/. JSON pcaps are particularly useful when Wireshark is not available.
Define the `PROJECT_ID` environment variable, e.g. `export PROJECT_ID='...'`.
Clone this repository:

```shell
git clone --depth=1 --branch=main --single-branch https://github.com/gchux/cloud-run-tcpdump.git
```

[!TIP] If you prefer to let Cloud Build perform all the tasks, go directly to build using Cloud Build

`cd cloud-run-tcpdump` and continue with one of the following alternatives:
Build and push the `tcpdump` sidecar container image:

```shell
export TCPDUMP_IMAGE_URI='...'   # this is usually Artifact Registry, e.g. '${_REPO_LOCATION}-docker.pkg.dev/${PROJECT_ID}/${_REPO_NAME}/${_IMAGE_NAME}'
export RUNTIME_ENVIRONMENT='...' # either 'cloud_run_gen1' or 'cloud_run_gen2'
./docker_build ${RUNTIME_ENVIRONMENT} ${TCPDUMP_IMAGE_URI}
```
This approach assumes that Artifact Registry is available in `PROJECT_ID`.
Define the following environment variables:

```shell
export REPO_LOCATION='...' # Artifact Registry Docker repository location, e.g. us-central1
export REPO_NAME='...'     # Artifact Registry Docker repository name
export IMAGE_NAME='...'    # container image name, e.g. `pcap-sidecar`
```
Build and push the `tcpdump` sidecar container image using Cloud Build:

```shell
gcloud builds submit \
  --project=${PROJECT_ID} \
  --config=$(pwd)/cloudbuild.yaml \
  --substitutions="_REPO_LOCATION=${REPO_LOCATION},_REPO_NAME=${REPO_NAME},_IMAGE_NAME=${IMAGE_NAME}" $(pwd)
```
See the full list of available flags for `gcloud builds submit`: https://cloud.google.com/sdk/gcloud/reference/builds/submit
Enable debug mode on an App Engine Flexible instance: https://cloud.google.com/appengine/docs/flexible/debugging-an-instance#enabling_and_disabling_debug_mode

Connect to the instance using SSH: https://cloud.google.com/appengine/docs/flexible/debugging-an-instance#connecting_to_the_instance

Escalate privileges; execute: `sudo su`
Create an env file named `pcap.env`; use the following sample to define the sidecar variables:
```shell
# $ touch pcap.env
PCAP_GAE=true
PCAP_GCS_BUCKET=the-gcs-bucket # the name of the Cloud Storage bucket used to store PCAP files
GCS_MOUNT=/gae/pcap            # where to mount the Cloud Storage bucket within the container FS
PCAP_IFACE=eth                 # network interface prefix
PCAP_FILTER=tcp or udp         # BPF filter to scope packet capturing to specific network traffic
PCAP_SNAPSHOT_LENGTH=0
PCAP_USE_CRON=false            # do not schedule packet capturing
PCAP_TIMEZONE=America/Los_Angeles
PCAP_TIMEOUT_SECS=60
PCAP_ROTATE_SECS=30
PCAP_TCPDUMP=true
PCAP_JSON=true
PCAP_JSON_LOG=false            # NOT necessary, packet translations are streamed directly to Cloud Logging
PCAP_ORDERED=false
```
Create a directory to store the PCAP files in the host filesystem: `mkdir gae`

Pull the sidecar container image: `docker --config=/etc/docker pull ${TCPDUMP_IMAGE_URI}`

Run the sidecar to start capturing packets:
```shell
docker run --rm --name=pcap -it \
  --cpus=1 --cpuset-cpus=1 \
  --privileged --network=host \
  --env-file=./pcap.env \
  -v ./gae:/gae -v /var/log:/var/log \
  -v /var/run/docker.sock:/docker.sock \
  ${TCPDUMP_IMAGE_URI} nsenter -t 1 -u -n -i /init \
  >/var/log/app_engine/app/STDOUT_pcap.log \
  2>/var/log/app_engine/app/STDERR_pcap.log
```
NOTE: for GAE Flex, it is strongly recommended to not use `PCAP_FILTER=tcp or udp` (or even `tcp port 443`), because packets are streamed into Cloud Logging using its gRPC API; that traffic is HTTP/2 over TCP, so capturing all TCP and UDP traffic also captures everything being exported into Cloud Logging. This causes a write amplification effect that will starve memory, as all your traffic will eventually be stored in the sidecar's memory.
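A safer choice on GAE Flex is a filter scoped to the app's own serving traffic; for example, assuming a hypothetical app serving on port 8080:

```shell
# Hypothetical: capture only the app's serving traffic on port 8080,
# leaving the sidecar's Cloud Logging egress (HTTP/2 over 443) out of the capture.
PCAP_FILTER=tcp port 8080
```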
This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.