fmadio / pcap2json

High Speed PCAP to JSON conversion utility

ES HTTP 1.1 persistent connections #22

Closed fmadio closed 4 years ago

fmadio commented 5 years ago

ES uses Netty, which supports HTTP/1.1:

https://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html

In theory we should be able to use persistent connections for the ES push side of things. This requires some investigation into what's possible.
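
For anyone following along, here is a minimal sketch of what connection reuse looks like from the client side. It is written in Python against a local stand-in server, not in pcap2json's own code, and every name in it (`StubBulkHandler`, `CountingServer`, `push_batches`, `demo`) is hypothetical. The stub only counts accepted TCP connections so that reuse can be observed.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubBulkHandler(BaseHTTPRequestHandler):
    """Stand-in for an ES node (assumption: not real Elasticsearch)."""
    protocol_version = "HTTP/1.1"  # lets http.server keep the socket open

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the request body
        body = b'{"errors":false}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


class CountingServer(HTTPServer):
    """Counts accepted TCP connections."""
    accepted = 0

    def get_request(self):
        request = super().get_request()
        self.accepted += 1
        return request


def push_batches(host, port, batches):
    """POST every batch over one HTTP/1.1 connection; return the statuses."""
    conn = http.client.HTTPConnection(host, port)
    statuses = []
    try:
        for payload in batches:
            conn.request("POST", "/_bulk", body=payload,
                         headers={"Content-Type": "application/x-ndjson"})
            resp = conn.getresponse()
            resp.read()  # the response must be drained before the socket is reused
            statuses.append(resp.status)
    finally:
        conn.close()
    return statuses


def demo():
    """Send three batches to the stub and report (statuses, connections accepted)."""
    server = CountingServer(("127.0.0.1", 0), StubBulkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        batch = b'{"index":{}}\n{"hello":"world"}\n'
        statuses = push_batches("127.0.0.1", server.server_port, [batch] * 3)
        return statuses, server.accepted
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` sends three bulk-style POSTs and the stub should see a single accepted connection. Opening a fresh `HTTPConnection` per batch would show one accept per batch instead, which is the setup and teardown cost this issue is about.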

nanji-fmad commented 5 years ago

Hi, I made some changes for persistent connections. Here is my analysis of the time taken for the ES push with one server.

Command : lz4 -d -c /interop17_hotstage_20170609_133953.717.953.280.pcap.lz4 | time ./pcap2json --config ./pcap2json.config

With Persistent connection changes:

Trial 1 (1 ES server):

PCAPWall time: 308.99 sec ProcessTime 488.03 sec (1.579)
Total Time: 488.03 sec RawInput[Wire 1.616 Gbps Capture 1.616 Gbps 0.178 Mpps] Output[0.033 Gbps] TotalLine:2168820 4444 Line/Sec
real 8m 8.08s
user 1m 9.98s
sys 1m 10.45s

Doc: 1760720 | 777.9mb

Trial 2 (1 ES server):

PCAPWall time: 308.99 sec ProcessTime 457.03 sec (1.479)
Total Time: 457.03 sec RawInput[Wire 1.726 Gbps Capture 1.726 Gbps 0.190 Mpps] Output[0.035 Gbps] TotalLine:2168820 4745 Line/Sec
real 7m 37.08s
user 1m 3.13s
sys 0m 59.97s

Doc: 1778047 | 786.3mb

Trial 3 (3 ES servers, containers):

PCAPWall time: 308.99 sec ProcessTime 491.38 sec (1.590)
Total Time: 491.38 sec RawInput[Wire 1.605 Gbps Capture 1.605 Gbps 0.177 Mpps] Output[0.033 Gbps] TotalLine:2168820 4414 Line/Sec
real 8m 11.48s
user 1m 4.55s
sys 1m 1.31s

Without Persistent connection changes:

Trial 1 (1 ES server):

PCAPWall time: 308.99 sec ProcessTime 857.23 sec (2.774)
Total Time: 857.23 sec RawInput[Wire 0.920 Gbps Capture 0.920 Gbps 0.101 Mpps] Output[0.019 Gbps] TotalLine:2168820 2530 Line/Sec
real 14m 17.28s
user 3m 5.47s
sys 4m 22.40s

Doc: 1719511 | 805.2mb

Trial 2 (1 ES server):

PCAPWall time: 308.99 sec ProcessTime 855.94 sec (2.770)
Total Time: 855.94 sec RawInput[Wire 0.922 Gbps Capture 0.922 Gbps 0.101 Mpps] Output[0.019 Gbps] TotalLine:2168820 2534 Line/Sec
real 14m 15.99s
user 3m 9.28s
sys 4m 19.49s

Doc: 1718485 | 801.9mb

Trial 3 (3 ES servers, containers):

PCAPWall time: 308.99 sec ProcessTime 860.84 sec (2.786)
Total Time: 860.84 sec RawInput[Wire 0.916 Gbps Capture 0.916 Gbps 0.101 Mpps] Output[0.019 Gbps] TotalLine:2168820 2519 Line/Sec
real 14m 20.89s
user 3m 10.81s
sys 4m 18.79s

Persistent connections can be seen on the ES server by listing its connections and filtering on the IP address of the pcap2json server (192.168.1.195 here):

netstat -anep | grep 192.168.1.195

fmadio commented 5 years ago

1 ES server, no persistent connection: 14 min
1 ES server, with persistent connection: 8 min

About a 43% reduction in run time (roughly 1.75x faster), excellent

fmadio commented 5 years ago

Need to try filtering out the response for the bulk uploads: https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#common-options-response-filtering

fmadio commented 5 years ago

Indeed, using filter_path does remove all the per-item index data from the response: https://stackoverflow.com/questions/26171971/elasticsearch-bulk-operation-omit-response
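
As a rough illustration of the idea, the sketch below builds a bulk request path with a `filter_path` query parameter plus the NDJSON body that goes with it. The helper names (`bulk_path`, `bulk_body`), the index name and the filter value `items.*.error` are assumptions chosen for the example; the value pcap2json actually sends is not stated in this thread.

```python
import json
from urllib.parse import urlencode


def bulk_path(filter_path=None):
    """Return the _bulk request path, optionally with response filtering."""
    if not filter_path:
        return "/_bulk"
    # Keep '*' and ',' unescaped so the filter stays readable in logs.
    return "/_bulk?" + urlencode({"filter_path": filter_path}, safe="*,")


def bulk_body(index, docs):
    """Encode documents as NDJSON: one action line, then one source line, per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")
```

With a filter like this, ES only returns the fields that match, so a successful bulk upload yields a very small response instead of one result entry per indexed document.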

fmadio commented 5 years ago

Not sure if it's ES related or not, but keepalive seems to be significantly slower, even with the filter_path response optimization.

With --output-keepalive ~ 90sec

Total Time: 89.47 sec RawInput[Wire 4.227 Gbps Capture 0.658 Gbps 0.465 Mpps] Output[0.056 Gbps] TotalLine:700403 7829 Line/Sec 
Total Time: 90.12 sec RawInput[Wire 4.197 Gbps Capture 0.653 Gbps 0.462 Mpps] Output[0.056 Gbps] TotalLine:700403 7772 Line/Sec   

Without --output-keepalive ~ 52sec

Total Time: 50.31 sec RawInput[Wire 7.518 Gbps Capture 1.170 Gbps 0.827 Mpps] Output[0.100 Gbps] TotalLine:700403 13922 Line/Sec
Total Time: 53.66 sec RawInput[Wire 7.049 Gbps Capture 1.097 Gbps 0.775 Mpps] Output[0.094 Gbps] TotalLine:700403 13054 Line/Sec

fmadio commented 5 years ago

Without filter_path, and just dropping the connection after 16K of Rx data: ~31 sec vs ~52 sec (with filter_path)

Total Time: 31.28 sec RawInput[Wire 25.221 Gbps Capture 3.925 Gbps 2.774 Mpps] Output[0.322 Gbps] TotalLine:1404685 44908 Line/Sec
Total Time: 30.14 sec RawInput[Wire 26.170 Gbps Capture 4.073 Gbps 2.879 Mpps] Output[0.334 Gbps] TotalLine:1405947 46640 Line/Sec

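
One possible reading of "dropping the connection after 16K Rx data" is sketched below: read at most 16 KiB of whatever the server sends back, then close the socket without waiting for the rest. This is an interpretation, not the pcap2json code; `RX_LIMIT`, `read_then_drop` and `demo` are hypothetical names, and a local socket pair stands in for the ES connection.

```python
import socket
import threading

RX_LIMIT = 16 * 1024  # assumed meaning of "16K Rx data"


def read_then_drop(sock, limit=RX_LIMIT):
    """Read at most `limit` bytes, then close the socket.

    Returns the bytes that were read. Anything the peer sends beyond
    `limit` is never read; the connection is simply torn down.
    """
    received = bytearray()
    try:
        while len(received) < limit:
            chunk = sock.recv(min(4096, limit - len(received)))
            if not chunk:  # peer closed first
                break
            received += chunk
    finally:
        sock.close()
    return bytes(received)


def demo():
    """Feed an oversized fake response through a socket pair; return bytes kept."""
    ours, peer = socket.socketpair()

    def fake_response():
        try:
            peer.sendall(b"x" * (RX_LIMIT + 4096))
        except OSError:
            pass  # we hung up before the peer finished sending
        finally:
            peer.close()

    sender = threading.Thread(target=fake_response)
    sender.start()
    data = read_then_drop(ours)
    sender.join()
    return len(data)
```

The trade-off is that any per-item errors further down the response are never seen, which is presumably why the thread also explores filter_path as an alternative.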
fmadio commented 5 years ago

added --output-filterpath to allow testing different configurations.

nanji-fmad commented 5 years ago

I ran the test again with the latest code...

With keepalive | With filter_output | 7:29 min
With keepalive | Without filter_output | 8:10 min
Without keepalive | With filter_output | 8:40 min
Without keepalive | Without filter_output | 14:20 min
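
To put the four runs on a common scale, this small sketch converts the reported mm:ss times to seconds and expresses each as a reduction against the run with neither option enabled. The times are the ones reported above; the dictionary keys and function names are only labels made up for the example.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '7:29' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Run times as reported in the comment above.
RUNS = {
    ("keepalive", "filter"): "7:29",
    ("keepalive", "no filter"): "8:10",
    ("no keepalive", "filter"): "8:40",
    ("no keepalive", "no filter"): "14:20",
}


def reduction_vs_baseline(runs, baseline=("no keepalive", "no filter")):
    """Percent reduction in run time of every configuration versus the baseline."""
    base = to_seconds(runs[baseline])
    return {
        config: round(100 * (1 - to_seconds(elapsed) / base), 1)
        for config, elapsed in runs.items()
    }
```

On these numbers either option alone recovers most of the gain (roughly 43% for keepalive only and 40% for the filter only), and combining them gets to about 48%.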

fmadio commented 4 years ago

updated and working, closing the issue