FoxIO-LLC / ja4

JA4+ is a suite of network fingerprinting standards
https://foxio.io

rust ja4h not sorting cookies right #58

Closed awick closed 7 months ago

awick commented 8 months ago

wireshark & arkime agree for 7th session of https://github.com/arkime/arkime/raw/main/tests/pcap/single-packets.pcap

rust:

arkime:
"ja4h":["ge11cr06enus_8c2f9ef95269_d23bf79698dc_69e42fa741fe"]
"ja4h_r":["ge11cr06enus_Accept,Accept-Language,User-Agent,Accept-Encoding,Host,Connection_pardot,visitor_id413862,visitor_id413862-hash_pardot=tee2foreb3fefpgvk8u1056vt3,visitor_id413862=286585660,visitor_id413862-hash=1f00bdb076b5fb707c70254849819ec1797d3e27cef91a61a9488cb7ca0ebf77f226caa4075591b2591bf9a1ccdf29432c67379b"]

wireshark:
JA4H: ge11cr06enus_8c2f9ef95269_d23bf79698dc_69e42fa741fe
JA4H Raw [truncated]: ge11cr06enus_Accept,Accept-Language,User-Agent,Accept-Encoding,Host,Connection_pardot,visitor_id413862,visitor_id413862-hash_pardot=tee2foreb3fefpgvk8u1056vt3,visitor_id413862=286585660,visitor_id413862-hash=1f00bdb07

vvv commented 7 months ago

@awick Thanks for reporting this! 🙏🏻

  1. Indeed, the Rust app used to generate the JA4H_c chunk incorrectly. Not any more; see #69.
  2. The JA4H_d chunk produced by Wireshark and Python is wrong. The JA4H for TCP stream 7 of single-packets.pcap should end with d23bf79698dc_c1eaa758c543.

JA4H calculation

cookie-string:

❯ cookie-string() { tshark -J http -r pcap/single-packets.pcap -T fields -e http.cookie 'tcp.stream == 7'; }

❯ cookie-string
visitor_id413862=286585660; visitor_id413862-hash=1f00bdb076b5fb707c70254849819ec1797d3e27cef91a61a9488cb7ca0ebf77f226caa4075591b2591bf9a1ccdf29432c67379b; pardot=tee2foreb3fefpgvk8u1056vt3

cookie-pairs:

❯ cookie-pairs() { cookie-string | sed 's/; /\n/g'; }

❯ cookie-pairs
visitor_id413862=286585660
visitor_id413862-hash=1f00bdb076b5fb707c70254849819ec1797d3e27cef91a61a9488cb7ca0ebf77f226caa4075591b2591bf9a1ccdf29432c67379b
pardot=tee2foreb3fefpgvk8u1056vt3

cookie-names:

❯ cookie-names() { cookie-pairs | cut -d= -f1; }

❯ cookie-names
visitor_id413862
visitor_id413862-hash
pardot

Comma-separated list of sorted cookie-names:

❯ cookie-names | LC_COLLATE=C sort | paste -s -d, -
pardot,visitor_id413862,visitor_id413862-hash

Helper function:

❯ comma_join() {
    # HACK: `tr -d \\n` removes the newline character
    # that `paste` appends to its output
    paste -s -d, - |
    tr -d \\n
}

JA4H_c:

❯ JA4H_c() { cookie-names | LC_COLLATE=C sort | comma_join | sha256sum | head -c 12; echo; }

❯ JA4H_c
d23bf79698dc

Comma-separated list of sorted cookie-pairs:

❯ cookie-pairs | LC_COLLATE=C sort | paste -s -d, -
pardot=tee2foreb3fefpgvk8u1056vt3,visitor_id413862-hash=1f00bdb076b5fb707c70254849819ec1797d3e27cef91a61a9488cb7ca0ebf77f226caa4075591b2591bf9a1ccdf29432c67379b,visitor_id413862=286585660
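Note that this order differs from the sorted cookie-names list above: once the `=value` part is included in the sort key, the `-` in `visitor_id413862-hash` is compared against the `=` that follows `visitor_id413862`, and `-` has the lower byte value. A small sketch of that effect, using hypothetical cookie names `a` and `a-b` (not taken from the capture):

```shell
# Byte values of the two characters that decide the order in the C locale.
printf '%d %d\n' "'-" "'="   # prints: 45 61

# Reduced example with hypothetical cookie names "a" and "a-b".
names_sorted() { printf '%s\n' 'a' 'a-b' | LC_ALL=C sort | paste -s -d, -; }
pairs_sorted() { printf '%s\n' 'a=1' 'a-b=2' | LC_ALL=C sort | paste -s -d, -; }

names_sorted   # a,a-b       (the shorter name is a prefix, so it sorts first)
pairs_sorted   # a-b=2,a=1   ('-' sorts before '=', so the order flips)
```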

JA4H_d:

❯ JA4H_d() { cookie-pairs | LC_COLLATE=C sort | comma_join | sha256sum | head -c12; echo; }

❯ JA4H_d
c1eaa758c543

awick commented 7 months ago

We should double-check with John, but I think _d is RIGHT in wireshark and arkime, because _d should use the same cookie order as _c. You can NOT sort the cookie pairs directly; you have to walk through the sorted _c list and build the _d list from it, because - sorts before = in byte order, so sorting whole pairs puts visitor_id413862-hash=... ahead of visitor_id413862=... That particular capture is a good test case.

So everyone agrees C is: pardot visitor_id413862 visitor_id413862-hash

That means D should be pardot=tee2foreb3fefpgvk8u1056vt3 visitor_id413862=286585660 visitor_id413862-hash=1f00bdb076b5fb707c70254849819ec1797d3e27cef91a61a9488cb7ca0ebf77f226caa4075591b2591bf9a1ccdf29432c67379b

NOT pardot=tee2foreb3fefpgvk8u1056vt3 visitor_id413862-hash=1f00bdb076b5fb707c70254849819ec1797d3e27cef91a61a9488cb7ca0ebf77f226caa4075591b2591bf9a1ccdf29432c67379b visitor_id413862=286585660
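
The two orderings can be compared side by side. This is only a sketch, assuming the three cookies quoted in this thread; the helper names (`pairs`, `pairs_in_name_order`, `pairs_sorted_whole`, `hash12`) are hypothetical and not part of any JA4 tool. Restricting the sort key to the text before the first `=` (`sort -t= -k1,1`) keeps the pairs in the same order as the _c list, whereas a plain sort compares whole pairs.

```shell
# Cookie pairs from TCP stream 7, in wire order (values copied from the thread).
pairs() {
    printf '%s\n' \
        'visitor_id413862=286585660' \
        'visitor_id413862-hash=1f00bdb076b5fb707c70254849819ec1797d3e27cef91a61a9488cb7ca0ebf77f226caa4075591b2591bf9a1ccdf29432c67379b' \
        'pardot=tee2foreb3fefpgvk8u1056vt3'
}

# Hypothetical helper: order pairs by cookie name only, i.e. the _c order.
# -t= -k1,1 restricts the sort key to the text before the first '=';
# -s keeps wire order for duplicate names (an assumption, not confirmed here).
pairs_in_name_order() { pairs | LC_ALL=C sort -s -t= -k1,1; }

# Whole-pair sort, as in the calculation earlier in the thread.
pairs_sorted_whole() { pairs | LC_ALL=C sort; }

# First 12 hex characters of SHA-256 over the comma-joined lines.
hash12() { paste -s -d, - | tr -d '\n' | sha256sum | head -c 12; echo; }

pairs_in_name_order | hash12   # candidate JA4H_d following the _c order
pairs_sorted_whole  | hash12   # candidate JA4H_d from sorting whole pairs
```

If the reading above is right, the first hash should be the one Wireshark and Arkime report and the second the one from the earlier calculation, but that is worth confirming by running it against the capture.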