cisco / joy

A package for capturing and analyzing network flow data and intraflow data, for network research, forensics, and security monitoring.

Question about parsing TLS data #229

Closed Applenice closed 5 years ago

Applenice commented 5 years ago

Hello! I am having problems parsing TLS data, and I don't know where the problem is. Version: 4.0.0. Operating systems: CentOS Linux release 7.5.1804 (Core) and Ubuntu 16.04. Configuration file:

output=output/gz
bidir=1
dist=1
classify=1
tls=1
entropy=1
verbosity=1
logfile=output/log/20190129_01.log

I removed the pcap_setfilter() call from the function process_pcap_file() to avoid the "no VLAN support for data link type" error.

Command used:

bin/joy -x output/option_config.txt ../DATA/PCAP/

Some data parsing errors were found when viewing the parsing results:

"bytes_out":0,
"packets":[],
"byte_dist":[0,0,0.....],
or
"tls":{"error":"no role"},

However, when I process this PCAP file on its own, TLS parsing is normal. The command in that case:

bin/joy -x output/option_config.txt ../DATA/PCAP/9956.pcap > output/gz/9956.gz

I have reproduced this issue many times. The HTTPS data in the PCAP folder is 6 GB in size. I don't know why this problem occurs. Is there any solution?

Looking forward to reply, thank you.

bhudson33 commented 5 years ago

"tls":{"error":"no role"} means that the code could not determine whether the flow was a client or a server flow. Therefore some of the TLS data will not be output, because it was not collected properly. Could you send along the config file and pcap file you are using? Also, please update to the latest software, as there were recently some bug fixes around parsing when options are turned on.

Applenice commented 5 years ago

I share the PCAP file and configuration file here: https://drive.google.com/drive/folders/11V-eaHVeetmsxscCkPay6DJ-hHNFAFNm

When I use the configuration file config.txt with the PCAP directory as input, the command is:

bin/joy -x output/config.txt ../DATA/PCAP/

When I view the output file with the zcat command, there are many errors; most TLS parsing results show: "tls": {"error": "no role"}

When I use the configuration file config_single.txt with a single pcap file as input, the command is:
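To quantify how many flows are affected rather than eyeballing zcat output, the records can be tallied by TLS outcome. The following is a minimal sketch, not part of joy: it assumes the output file is gzip-compressed with one JSON object per line, and the function name tally_tls_errors and the "ok" label are made up for illustration.

```python
import gzip
import json
from collections import Counter


def tally_tls_errors(path):
    """Count flow records in one gzipped output file by TLS parsing outcome.

    Assumes one JSON object per line. Records without a "tls" object
    (for example a metadata line) are skipped.
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                counts["unparseable line"] += 1
                continue
            tls = record.get("tls") if isinstance(record, dict) else None
            if not isinstance(tls, dict):
                continue
            # "ok" is a hypothetical label for TLS objects without an error key.
            counts[str(tls.get("error", "ok"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, dict(tally_tls_errors(name)))
```

Running it on the directory-mode output and on the single-file output gives two tallies that can be compared directly.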

bin/joy -x output/config_single.txt ../DATA/PCAP/douban.pcap > output/gz/douban_single.gz

TLS data parsing is normal, no error occurs.

I will upgrade the version to see if this issue still exists.

Applenice commented 5 years ago

After upgrading to the latest version, I found that the problem still exists. I seem to have found the reason for this problem.

In the function process_pcap_file(): https://github.com/cisco/joy/blob/79d925ef605a00de9cf48645ec2e3e9ac8f919a8/src/joy.c#L1726

https://github.com/cisco/joy/blob/79d925ef605a00de9cf48645ec2e3e9ac8f919a8/src/joy.c#L111

NUM_PACKETS_IN_LOOP is passed as the value of cnt. I changed the value of cnt to -1, and with the configuration file config.txt and the PCAP directory as input, TLS data parsing is then normal.

bhudson33 commented 5 years ago

Why would you change the value of NUM_PACKETS_IN_LOOP to -1?

Applenice commented 5 years ago

I saw the relevant content in the documentation: http://www.tcpdump.org/manpages/pcap.3pcap.html https://www.tcpdump.org/manpages/pcap_loop.3pcap.html

A value of -1 or 0 for cnt causes all the packets received in one buffer to be processed when reading a live capture, and causes all the packets in the file to be processed when reading a "savefile".

bhudson33 commented 5 years ago

So it is unnecessary to modify that parameter with joy. We set the packet loop to 5, but we continue to loop over packets until you stop the program (live) or the pcap file is exhausted (pcap). The pcap value of 5 allows joy to break out of libpcap and do some data analysis on the packets that have been processed (expired flows, classification, etc). So with joy, you do not need to modify that parameter.
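The batching idea described above can be illustrated with a small simulation. This is only a sketch of the control flow, not joy's actual code and not libpcap: the function process_capture and its counters are hypothetical, and the per-packet and housekeeping work are reduced to counters.

```python
from itertools import islice

BATCH_SIZE = 5  # mirrors the packet loop of 5 described above


def process_capture(packets, batch_size=BATCH_SIZE):
    """Consume every packet in batches, with housekeeping between batches.

    Returns (packets processed, number of housekeeping passes).
    """
    source = iter(packets)
    processed = 0
    housekeeping_runs = 0
    while True:
        batch = list(islice(source, batch_size))
        if not batch:
            break  # capture exhausted
        processed += len(batch)  # stand-in for per-packet flow updates
        housekeeping_runs += 1   # stand-in for expiring flows, classification, etc.
    return processed, housekeeping_runs


if __name__ == "__main__":
    print(process_capture(range(12)))                 # batches of 5
    print(process_capture(range(12), batch_size=12))  # one large batch
```

In this model the batch size changes only how often the housekeeping step runs; every packet is still consumed once. That is the behavior the explanation above expects, so a difference in results between batch sizes would point to a bug elsewhere.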

Applenice commented 5 years ago

I understand what you mean. But, as I mentioned before, parsing the PCAP directory and parsing a single file can produce different results.

Before changing the NUM_PACKETS_IN_LOOP value, I parsed every PCAP file in the directory one file at a time, and then parsed the PCAP directory as a whole. I used the grep command to count the occurrences of three keys (error, c_version, s_version) in the JSON output produced by the two methods. The results are very different, which is what confuses me. I think the results of the two methods should be similar or identical. After changing the NUM_PACKETS_IN_LOOP value, I compared the two outputs again to verify this idea.
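The comparison described here can be scripted so that others can repeat it. The sketch below is an assumption-laden stand-in for the grep step: it counts raw substring occurrences of the three quoted key names in gzip-compressed output files, and the function names count_keys and diff_counts are hypothetical.

```python
import gzip

# Quoted so that a key name is not matched inside a longer word.
KEYS = ('"error"', '"c_version"', '"s_version"')


def count_keys(paths, keys=KEYS):
    """Sum the occurrences of each key across a list of gzipped output files."""
    totals = {key: 0 for key in keys}
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                for key in keys:
                    totals[key] += line.count(key)
    return totals


def diff_counts(directory_run, single_file_runs):
    """Per-key difference: directory-mode count minus single-file count."""
    return {key: directory_run[key] - single_file_runs[key] for key in directory_run}
```

A typical use would be to pass the one directory-mode output file to count_keys, pass the list of per-file outputs to a second call, and feed both results to diff_counts. If the two methods agree, every difference is zero.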

Can you reproduce my situation based on the data and configuration files I provided?

bhudson33 commented 5 years ago

We did find a bug in processing a directory of files versus a single file. A fix for that will go in shortly.