cloudfoundry / bosh-cli

BOSH CLI v2+
Apache License 2.0

feat: Implement pcap cmd #627

Closed b1tamara closed 10 months ago

b1tamara commented 11 months ago

This PR is an attempt to implement the pcap-lite option for the bosh pcap command, as described in the RFC: https://github.com/cloudfoundry/community/blob/main/toc/rfc/rfc-0019-pcap-bosh.md

Akin to the BOSH SSH feature, a BOSH pcap feature is proposed with a similar but slightly different workflow. The well-known tcpdump tool is launched on each VM via SSH and transmits its data over the SSH channel. To handle multiple VMs and merge their captures, SSH sessions to the respective targets are opened in parallel; their output is merged into a single stream in the bosh CLI and made available for writing to disk.
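To make the merge step concrete, here is a minimal Go sketch of the fan-in pattern the description implies. It is not the code from this PR: `Packet`, `mergeCaptures` and `fakeCapture` are hypothetical names, and the simulated sources stand in for the tcpdump output that would arrive over each SSH session.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Packet is a hypothetical stand-in for one captured frame.
type Packet struct {
	Instance  string    // which VM the packet came from
	Timestamp time.Time // capture time
	Data      []byte    // raw frame bytes
}

// mergeCaptures fans packets from several per-VM channels into one channel.
// The returned channel is closed once every source has been drained.
func mergeCaptures(sources ...<-chan Packet) <-chan Packet {
	out := make(chan Packet)
	var wg sync.WaitGroup
	for _, src := range sources {
		wg.Add(1)
		go func(src <-chan Packet) {
			defer wg.Done()
			for p := range src {
				out <- p
			}
		}(src)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// fakeCapture simulates one capture session that yields n packets.
// In a real tool this would read tcpdump output from an SSH session.
func fakeCapture(instance string, n int) <-chan Packet {
	ch := make(chan Packet)
	go func() {
		defer close(ch)
		for i := 0; i < n; i++ {
			ch <- Packet{Instance: instance, Timestamp: time.Now(), Data: []byte{byte(i)}}
		}
	}()
	return ch
}

func main() {
	merged := mergeCaptures(fakeCapture("haproxy/0", 3), fakeCapture("haproxy/1", 2))
	count := 0
	for p := range merged {
		count++
		fmt.Printf("%s: %d byte(s)\n", p.Instance, len(p.Data))
	}
	fmt.Println("total packets:", count)
}
```

Packets from different sources interleave in arrival order, so a consumer that needs strict timestamp ordering would have to sort or buffer on top of this.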

An example call: bosh -d haproxy pcap -s=96 -f="host X.X.X.X" -o=test.pcap

Co-authored-by: Maximilian Moehl

b1tamara commented 11 months ago

@rkoster

jpalermo commented 11 months ago

Hey @b1tamara, is this in a state where it could be moved out of draft? If so, it will get picked up and we can assign some reviewers to take a look at it.

peanball commented 11 months ago

Hi @jpalermo, our initial request was for a first and more high level look at this "conceptually" and whether this is abstracted enough already for the BOSH Team or not.

@b1tamara is out for the next couple of days and will be back on Wednesday to continue.

There are still some linter issues that should be fixed before we can consider really reviewing the code.

b1tamara commented 11 months ago

Regarding the linter check errors: they seem to be related to "Unchanged files with check annotations", in other words, to files not touched by this PR.

beyhan commented 11 months ago

@b1tamara is this now ready from your side, or are you still working on it?

b1tamara commented 11 months ago

@beyhan It is ready for review.

domdom82 commented 11 months ago

Regarding the linter issues: I've run golangci-lint version 1.54.2 locally and fixed all issues; however, I don't know why the PR validation keeps failing. It is also failing on completely different errors. It seems like there is something wrong with the setup of the validation, which I can't fix from here.

cunnie commented 11 months ago

I'm not sure this addition to the CLI is necessary.

I think what you're looking to do—capture network traffic for subsequent analysis—can be accomplished with the existing BOSH ssh feature-set. For example, to capture 15 seconds worth of traffic on all the "jammy" BOSH instance groups:

bosh -d $DEPLOYMENT -n ssh jammy -c "sudo tcpdump -w /var/vcap/data/out.pcap & sleep 15 ; kill %1"

The .pcap files generated by tcpdump can be retrieved using bosh scp:

for INSTANCE in $(bosh -d $DEPLOYMENT is --json | jq -r '.Tables[].Rows[] | select(.instance | startswith("jammy")) | .instance'); do
    bosh -d $DEPLOYMENT scp $INSTANCE:/var/vcap/data/out.pcap out-${INSTANCE/\//_}.pcap
done

These files can then be merged and analyzed with Wireshark.

We realize the bosh scp command is somewhat cumbersome; however, we believe that an enhancement to bosh scp might be better than adding a bespoke pcap command to the CLI, since it would facilitate other cases where a user wishes to extract specific files from multiple machines in a deployment.

maxmoehl commented 11 months ago

Thanks for raising your concerns @cunnie!

This PR is the result of RFC0019; you can view the full history of the options considered in the RFC pull request. I'll try to summarise the main arguments:

> I think what you're looking to do—capture network traffic for subsequent analysis—can be accomplished with the existing BOSH ssh feature-set. For example, to capture 15 seconds worth of traffic on all the "jammy" BOSH instance groups:
>
> bosh -d $DEPLOYMENT -n ssh jammy -c "sudo tcpdump -w /var/vcap/data/out.pcap & sleep 15 ; kill %1"
>
> The .pcap files generated by tcpdump can be retrieved using bosh scp:
>
> for INSTANCE in $(bosh -d $DEPLOYMENT is --json | jq -r '.Tables[].Rows[] | select(.instance | startswith("jammy")) | .instance'); do
>     bosh -d $DEPLOYMENT scp $INSTANCE:/var/vcap/data/out.pcap out-${INSTANCE/\//_}.pcap
> done
>
> These files can then be merged and analyzed with Wireshark.

We are well aware of this process. The issue is that we have to do it way too often to do it by hand. Building helper scripts proved to be sub-optimal as they only exposed a subset of the functionality of tcpdump / bosh. To mitigate this issue it seemed reasonable to start working on a more resilient, "official" solution (again: see the RFC PR for the full history).

Arguing like this, you could question the existence of other commands as well (e.g. bosh logs). Of course there is an alternative route you can take, but we want to establish a resilient, maintained tool for something that is often needed in our troubleshooting process.

> We realize the bosh scp command is somewhat cumbersome; however, we believe that an enhancement to bosh scp might be better than adding a bespoke pcap command to the CLI, since it would facilitate other cases where a user wishes to extract specific files from multiple machines in a deployment.

This would be a welcome addition in any case!

cunnie commented 11 months ago

Hi @maxmoehl , thanks for your detailed response.

> We are well aware of this process. The issue is that we have to do it way too often to do it by hand.

Ah—I do packet capture rarely, and usually ssh'ing onto a single VM is adequate. Maybe if I had to do packet capture across multiple VMs frequently I'd feel differently.

rkoster commented 11 months ago

The need for and scope of the bosh pcap command have been discussed in great detail as part of RFC0019. Given that the RFC was approved and consensus was reached, this PR review should focus on the technical implementation, not the validity of the solution.

b1tamara commented 10 months ago

@ystros Thank you very much for your review. Regarding your point about running the pcap feature against a BOSH environment behind a jumpbox using gateway flags: yes, you are right. The current implementation does not support gateway flags. The scenario we want to support in the first place is starting a capture directly from the jumpbox. We connect to the host(s) over SSH using the Go crypto ssh library.

Supporting gateway flags in order to start the capture using BOSH_ALL_PROXY would make the implementation more complicated. A connection to the proxy server would have to be established via the SOCKS5 protocol, and our current SSH connection implementation would have to be wrapped in it. Unfortunately, we do not have experience with the SOCKS5 protocol or SOCKS5 proxies, and we don't have a test environment set up to test such scenarios. Our suggestion is to start with the feature without gateway flag support and treat that support as an add-on feature.

ystros commented 10 months ago

@b1tamara Thanks for the changes!

If the use case is mainly geared towards running on jumpboxes, then this should be fine for now. Support for the gateway flags + BOSH_ALL_PROXY could probably come in a later PR. We should just be sure to clearly document this limitation when pcap is added to the CLI command docs: https://bosh.io/docs/cli-v2/

jpalermo commented 10 months ago

Hey @b1tamara, I think we're ready to merge this in as soon as those linting errors are resolved. Thanks again for this work!

b1tamara commented 10 months ago

Hello @jpalermo @ystros. Sorry for the delayed response. I resolved the linter errors and forwarded DisableSOCKS to boshssh.ClientOpts.

rkoster commented 10 months ago

Thanks @b1tamara! However, there is still one issue left in the linter:

  Error: could not import github.com/gopacket/gopacket/pcap (-: # github.com/gopacket/gopacket/pcap
  Error: vendor/github.com/gopacket/gopacket/pcap/pcap_unix.go:35:10: fatal error: pcap.h: No such file or directory
     35 | #include <pcap.h>
        |          ^~~~~~~~
  compilation terminated.) (typecheck)


b1tamara commented 10 months ago

@rkoster I added a new step to the linter workflow that installs libpcap-dev on the Ubuntu runners. Happily, all linter checks are now green.

b1tamara commented 10 months ago

The build pipeline might need to be adjusted to install libpcap; otherwise, compilation may fail.

beyhan commented 10 months ago

Thank you for this contribution!

jpalermo commented 9 months ago

This was released in 7.5.0