buger / goreplay

GoReplay is an open-source tool for capturing and replaying live HTTP traffic into a test environment in order to continuously test your system with real data. It can be used to increase confidence in code deployments, configuration changes and infrastructure changes.
https://goreplay.org

Lots of missing HTTP responses. How to debug? #894

Open StanleyP opened 3 years ago

StanleyP commented 3 years ago

Hello, with goreplay 1.2.0 I have a lot of missing HTTP responses for some payloads (specific clients, but nothing extraordinary in the HTTP protocol).

Most other payloads are recorded OK, with a response missing only occasionally.

I tried v1.1.0, 1.3.RC1 and a build compiled from master, but these versions behave even worse: almost no response is captured for the specific payload. The best-behaving version for me is still the official 1.2.0 from GitHub.

I have a sample pcap file that captures the specific problematic payload, but it contains security tokens and other sensitive data, so I have a problem sharing it.

How can I debug?

Increasing verbosity does not help with diagnostics. All I can see are [EMITTER] lines printed for captured traffic, but no information about dropped packets or the reason they were dropped.

When I discovered goreplay and tested it initially I was amazed; it really is a great piece of software (thank you for that).

But for my use case I need a rock-solid HTTP traffic capture setup, and I don't know why goreplay behaves in such a weird way.

The specific problematic payload is actually a single request and a single response, with no heavy-load/parallel scenario, but goreplay fails to decode the response for no obvious reason.

Any help on how to diagnose missing responses in more detail would be really appreciated.

StanleyP commented 3 years ago

Additional info:

$ uname -a
Linux dih-dev-vmss-fe-dmz-000005 5.4.0-1031-azure #32~18.04.1-Ubuntu SMP Tue Oct 6 10:03:22 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

I tried all the buffer and snaplen options, and basically every option that might have affected the capture, but no luck.

StanleyP commented 3 years ago

I have also taken the "problematic" payload (request + response) from the pcap and served it locally.

The setup was:

With this setup the local capture worked perfectly, so it seems the issue is related to some network/packet fragmentation.

Overall I ran some stats queries, and they show that 10% of all response payloads are missing.

buger commented 3 years ago

How big is your request? Also, I wonder whether tcpdump will work better in this case; goreplay can actually read pcap files: gor --input-raw-engine pcap_file --input-raw ./file.pcap

You can also try running with --input-raw-engine raw_socket

Let me know how it works!

StanleyP commented 3 years ago

The specific "problematic" request and response that I managed to capture are both small (request ~3kB, response ~2kB), but the response to this type of call can be quite big.

I tried to run gor --input-raw-engine pcap_file --input-raw ./traffic.pcap but the "problematic" response is not decoded from the pcap file by gor. This pcap file contains two other requests, and these are decoded OK.

I will try --input-raw-engine raw_socket, thank you very much for the hint!!

StanleyP commented 3 years ago

I just tried --input-raw-engine raw_socket and the result is the same: lots of missing responses.

When I capture the traffic with tcpdump, the responses are present in the pcap file, but gor fails to decode them for some reason.

StanleyP commented 3 years ago

I inspected the pcap files with Wireshark and I can see that all the requests and responses are there.

buger commented 3 years ago

If you are able to share the pcap file, that would be perfect.

StanleyP commented 3 years ago

While investigating the issue I found that the socket I am listening on with goreplay uses the HAProxy PROXY v1 protocol.

https://www.haproxy.com/blog/haproxy/proxy-protocol/ http://www.haproxy.org/download/1.8/doc/proxy-protocol.txt

Basically the first line on the new connection starts with

PROXY TCP4 <src_ip> <dst_ip> <src_port> <dst_port>\r\n
GET / HTTP/1.1\r\n
...

Could this be a problem when matching requests and responses, or generally when treating the payload as HTTP?

The first line of the first request on a new connection does not conform to the HTTP protocol, because it is the PROXY ... line. Maybe this is the real reason why my responses get discarded.

What would it take for goreplay to ignore this first line if it starts with PROXY keyword?
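
A minimal Go sketch of what skipping that line could look like before the payload reaches an HTTP parser. It is not goreplay's code and not the patch discussed in this thread. The name stripProxyV1 is hypothetical, and the sketch assumes the whole PROXY line arrives at the start of the payload. It handles only the text-based v1 format, which the proxy-protocol specification linked above limits to 107 bytes including the trailing CRLF.

```go
package main

import (
	"bytes"
	"fmt"
)

// maxProxyV1Len is the longest PROXY v1 line allowed by the spec,
// counting the trailing CRLF.
const maxProxyV1Len = 107

var (
	proxyV1Prefix = []byte("PROXY ")
	crlf          = []byte("\r\n")
)

// stripProxyV1 (hypothetical helper) returns payload without a leading
// PROXY v1 line. Payloads that do not start with "PROXY ", or whose
// first line is not terminated within the spec limit, are returned
// unchanged.
func stripProxyV1(payload []byte) []byte {
	if !bytes.HasPrefix(payload, proxyV1Prefix) {
		return payload
	}
	limit := len(payload)
	if limit > maxProxyV1Len {
		limit = maxProxyV1Len
	}
	end := bytes.Index(payload[:limit], crlf)
	if end < 0 {
		return payload // incomplete or malformed header: leave untouched
	}
	return payload[end+len(crlf):]
}

func main() {
	raw := []byte("PROXY TCP4 192.0.2.1 192.0.2.2 56324 443\r\n" +
		"GET / HTTP/1.1\r\nHost: example.test\r\n\r\n")
	fmt.Printf("%q\n", stripProxyV1(raw))
}
```

Only the first payload on a connection carries the PROXY line, so a check like this would need to run once per connection rather than on every message.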

urbanishimwe commented 3 years ago

@StanleyP on the latest version, you can use -input-raw-protocol=binary to capture other protocols, but it will merge messages sent over HTTP keep-alive together, and if the first TCP packet (with the SYN flag) is missed, the entire one-way session will be ignored. (This approach tends to reduce memory and CPU usage.)

StanleyP commented 3 years ago

Thanks for suggestion, but I need both requests and responses as separate messages.

urbanishimwe commented 3 years ago

Yes, it will separate them when the Connection: Close header is used!

StanleyP commented 3 years ago

Unfortunately most clients communicate over HTTP/1.1, which uses keep-alive by default, and keep-alive is actively utilized.
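
To illustrate why keep-alive matters here, a small Go sketch that splits several requests sharing one connection's byte stream and reports which of them ask for the connection to be closed. It uses only the standard library (net/http.ReadRequest) and is not how goreplay parses traffic. The function name splitRequests and the sample stream are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// splitRequests (hypothetical helper) parses every HTTP request found in
// one connection's byte stream. For each request it records whether the
// connection should be closed afterwards (req.Close).
func splitRequests(stream string) ([]bool, error) {
	r := bufio.NewReader(strings.NewReader(stream))
	var closes []bool
	for {
		// Stop cleanly once the stream is exhausted.
		if _, err := r.Peek(1); err == io.EOF {
			return closes, nil
		}
		req, err := http.ReadRequest(r)
		if err != nil {
			return closes, err
		}
		// Drain the body so the reader is positioned at the next request.
		io.Copy(io.Discard, req.Body)
		req.Body.Close()
		closes = append(closes, req.Close)
	}
}

func main() {
	stream := "GET /a HTTP/1.1\r\nHost: example.test\r\n\r\n" +
		"GET /b HTTP/1.1\r\nHost: example.test\r\nConnection: close\r\n\r\n"
	closes, err := splitRequests(stream)
	fmt.Println(closes, err)
}
```

An HTTP/1.1 request without a Connection: close header leaves the connection open, so message boundaries have to come from parsing the HTTP framing rather than from the TCP session ending.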

StanleyP commented 3 years ago

I was finally authorized by management to send you a sample traffic capture in pcap format for analysis. I sent an email to support@goreplay.org with the issue URL in the subject. Thanks for any help or hints.

StanleyP commented 3 years ago

Have you managed to reproduce the issue on your side from the pcap file?

urbanishimwe commented 3 years ago

@StanleyP with gor v1.3rc1 and the command ./gor --input-raw sample.pcap:0 -input-raw-engine pcap_file -output-stdout -verbose 10, I saw one request and one response from the sample file!

StanleyP commented 3 years ago

Of course I would like to use the latest and greatest version of GoReplay. I tried gor v1.3rc1 before, but this version does not work for me with the middleware. As soon as a request is captured, gor v1.3rc1 crashes like this:

2021/02/08 11:24:07 [PPID 749 and PID 930] Version:1.2.0
panic: runtime error: index out of range [296] with length 296

goroutine 13 [running]:
encoding/hex.Decode(0xc0002f6000, 0x128, 0x128, 0xc0002f4000, 0x252, 0x253, 0x0, 0x0, 0x0)
        /usr/local/go/src/encoding/hex/hex.go:69 +0x22b
main.(*Middleware).read(0xc00007cdc0, 0x10fe040, 0xc000010178)
        /go/src/github.com/buger/goreplay/middleware.go:107 +0x159
created by main.NewMiddleware
        /go/src/github.com/buger/goreplay/middleware.go:42 +0x2e3

My middleware is written in Node.js and I tried to strip it down as much as possible, but the crash still occurs. The crash seems to be in gor itself. My current stripped-down middleware is:

var gor = require("goreplay_middleware");

gor.init();

gor.on('message', function(m) {
  return m;
})

The command that I use to invoke gor is:

exec $gor_bin \
  -input-raw 127.0.0.1:8008 \
  -input-raw-protocol http \
  -input-raw-track-response \
  -middleware $wd/mwfilter.sh \
  -output-stdout

Should I open a separate issue for this?
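
One observation from the panic trace above: encoding/hex.Decode is called with a destination slice of length 0x128 (296) and a source of length 0x252 (594). 594 hex characters decode to 297 bytes, which matches "index out of range [296] with length 296". The Go sketch below shows the general pattern of sizing the destination with hex.DecodedLen after trimming the line terminator. It is not goreplay's middleware.go and not the fix for this crash. The name decodeHexLine is hypothetical, and it assumes each message is one hex-encoded, newline-terminated line.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// decodeHexLine (hypothetical helper) decodes one hex-encoded line.
// The destination is sized from the trimmed input, so hex.Decode never
// writes past the end of dst.
func decodeHexLine(line []byte) ([]byte, error) {
	line = bytes.TrimRight(line, "\r\n")
	dst := make([]byte, hex.DecodedLen(len(line)))
	n, err := hex.Decode(dst, line)
	if err != nil {
		return nil, err // invalid hex digit or odd-length input
	}
	return dst[:n], nil
}

func main() {
	msg, err := decodeHexLine([]byte("474554202f\n"))
	fmt.Printf("%q %v\n", msg, err)
}
```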

urbanishimwe commented 3 years ago

@StanleyP I think it should be a separate issue!

StanleyP commented 3 years ago

https://github.com/buger/goreplay/issues/900

StanleyP commented 3 years ago

I just want to post a quick update on my issue: today I deployed a "hybrid" workaround that finally seems to work, and so far all responses are showing up OK!

I took the goreplay master branch and modified proto/proto.go to skip the PROXY protocol row. I am no Go expert and I am not proud of my patch, but at least it seems to work ;) I can share the patch if you want (though I doubt it will interest you). This seems to work with regard to response capture, but it still crashes with the middleware.

So I removed the middleware from this instance, added -output-tcp and forwarded the traffic to another goreplay instance (official v1.2.0). The second instance has working middleware, so I do the middleware processing in the second goreplay instance.

Far from ideal, but it works.

Looking forward to a fix for #900 so I can use only one instance :)

Kind regards, Stanislav Pavlíček

buger commented 3 years ago

Proxy protocol! It makes sense now. Will be happy to accept your patch.

StanleyP commented 3 years ago

Guys, a few minutes ago I deployed my patched version that uses a single gor instance with middleware, and everything works OK! Looking forward to the next gor RC or stable version! I will tidy up my "HAProxy PROXY protocol" patch and send a pull request. Thank you very much for the support!!! We can close this issue now.

StanleyP commented 3 years ago

I created a pull request with the PROXY v1 protocol patch: https://github.com/buger/goreplay/pull/902