cisco / mercury

Mercury: network metadata capture and analysis

Extracting payload hash from network traffic #12

Open arunppsg opened 3 years ago

arunppsg commented 3 years ago

I was wondering whether it would be possible to extract the payload, or a hash of the payload, from network traffic along with the fingerprints using mercury. Are there any options for it? We can do it with tcpdump, but that does not give fingerprints. Any pointers will be helpful. Thanks.

davidmcgrew commented 3 years ago

Hi Arun, we did experiment with hashing the TCP or UDP Data field of a packet as a way to detect retransmissions and duplicated packets. In some branch or another, I think there is code to print out the data field as a hex number. Is this the sort of thing you had in mind? Thx!

arunppsg commented 3 years ago

Exactly, that is what I was looking for. If I could get the data field, then I could compute a hash of it. In my case, a SHA-256 hash of the payload will suffice.
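For reference, hashing the data field is a few lines once it is available as a hex string. This is only a sketch: it assumes the field is printed as plain hexadecimal, and the function name `payload_sha256` is hypothetical, not something mercury provides.

```python
import hashlib


def payload_sha256(data_hex: str) -> str:
    """Return the SHA-256 digest (as hex) of a payload given as a hex string.

    Hypothetical helper: assumes the TCP/UDP data field was printed as
    plain hexadecimal with no separators.
    """
    payload = bytes.fromhex(data_hex)  # raises ValueError on malformed hex
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Illustrative payload only: the start of an HTTP request, hex-encoded.
    example_hex = b"GET / HTTP/1.1\r\n".hex()
    print(payload_sha256(example_hex))
```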

davidmcgrew commented 3 years ago

Since there is no need for cryptographic collision resistance, and there is a need for speed, I had used the xxhash library https://github.com/Cyan4973/xxHash. It performed quite well in tests. I can't find the code that I had experimented with; I think it was never committed into the git repo. It added a new JSON element that holds the xxhash of the entire TCP data field of a packet, something like this:

{"tcp":{"data_hash":"474554202f20485454502"}, "src_ip":"192.168.113.237", "dst_ip":"35.224.99.156", "protocol":6, "src_port":53560, "dst_port":80, "event_start":1565200503.658237}

The hash provides a practical way to detect duplicated packets, which seem to happen all the time in network capture environments, by detecting duplicate data_hash values in whatever JSON processing is being done. I think the data_hash output could be a useful aid in debugging network capture systems, especially ones with multiple capture interfaces. However, what I'd personally find more useful would be a mercury option that detected duplicate packets and ignored them (by only processing and reporting on the first packet, and ignoring any following ones). Does that line up with your thinking, or do you have some other use cases in mind?
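A minimal sketch of that kind of JSON post-processing is below. It assumes one JSON record per line and a `tcp.data_hash` element shaped like the experimental output above, which was never committed, so the field name is an assumption. The function name `drop_duplicates` is hypothetical. Records without a `data_hash` are passed through unchanged, and the hash is combined with the addresses and ports so that identical payloads on different flows are not treated as duplicates.

```python
import json
from typing import Iterable, Iterator


def drop_duplicates(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each record once, skipping later records that repeat the
    same flow and data_hash (hypothetical field, see the note above)."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        data_hash = record.get("tcp", {}).get("data_hash")
        if data_hash is None:
            yield record  # nothing to compare on, so keep the record
            continue
        key = (
            record.get("src_ip"),
            record.get("dst_ip"),
            record.get("src_port"),
            record.get("dst_port"),
            data_hash,
        )
        if key in seen:
            continue  # duplicate: report only the first occurrence
        seen.add(key)
        yield record
```

The `seen` set grows without bound here, which is fine for offline processing of a capture file but not for a long-running process.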

Thanks!

arunppsg commented 3 years ago

Yes, that is my requirement: to detect duplicate packets based on the payload hash value. One reason for using mercury is that it is able to handle a high volume of traffic. Is there any way I could help or contribute to integrating that feature in mercury?

Thanks!

davidmcgrew commented 3 years ago

Thanks for the offer to help. I have a bunch of other changes in progress. After those are done, how about I add a hash-based deduplicator as a compile-time option, and you can build it with that option and test it out in your environment?
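To make the idea concrete, here is a rough sketch of a hash-based deduplicator with bounded memory. It is not mercury's code. The class name `RecentHashDeduplicator` and the `capacity` parameter are hypothetical, and `hashlib.blake2b` with an 8-byte digest stands in for xxhash so the example needs only the Python standard library.

```python
import hashlib
from collections import OrderedDict


class RecentHashDeduplicator:
    """Remember the hashes of the most recently seen payloads and flag
    repeats. Hypothetical sketch; blake2b is a stand-in for xxhash."""

    def __init__(self, capacity: int = 4096) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._recent = OrderedDict()  # digest -> None, oldest entry first

    def is_duplicate(self, payload: bytes) -> bool:
        """Return True if this payload's hash is still in the recent window."""
        digest = hashlib.blake2b(payload, digest_size=8).digest()
        if digest in self._recent:
            self._recent.move_to_end(digest)  # refresh its position
            return True
        self._recent[digest] = None
        if len(self._recent) > self.capacity:
            self._recent.popitem(last=False)  # evict the oldest hash
        return False
```

A caller would process a packet only when `is_duplicate` returns False. The fixed capacity trades completeness for bounded memory: a duplicate that arrives after its hash has been evicted is reported again.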

arunppsg commented 3 years ago

Sure, that will be great. Thanks for your help. In the meantime, I will also work on it.