FoxIO-LLC / ja4

JA4+ is a suite of network fingerprinting standards
https://foxio.io

JA4H.md documentation contains inconsistent details and problematic delimiter #80

Open eugeneturk opened 7 months ago

eugeneturk commented 7 months ago

The technical_details document describing the JA4H specification contains minor inconsistencies and a problematic delimiter. The document was previously located at technical_details/JA4H.md and was removed in this commit:

https://github.com/FoxIO-LLC/ja4/commit/b6f3ff4c779d05da92e7263b2e5ab7287a2245ac#diff-aeca2ef7c4beaff2ccd0f42a618a6c85d23ba0e625fa735fda15332bf4d629c6

Issue 1, line 54: "99 = anything > than 100 headers". As written, a request with exactly 100 headers is not covered. This should likely be either "99 = anything > than 99 headers" or "99 = anything >= than 100 headers".
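The boundary in Issue 1 can be made concrete with a small sketch. It assumes the header-count field is a two-digit, zero-padded number that saturates at 99; the function name `header_count_field` is hypothetical and is not taken from the JA4 codebase.

```python
def header_count_field(count: int) -> str:
    """Render a header count as a two-digit field, saturating at 99.

    Assumption: every count of 99 or more maps to "99".
    """
    if count < 0:
        raise ValueError("header count cannot be negative")
    return f"{min(count, 99):02d}"


# Values on both sides of the boundary discussed in Issue 1.
for n in (11, 98, 99, 100, 250):
    print(n, "->", header_count_field(n))
```

Under this reading, 99, 100, and 250 headers all produce "99", so neither 99 nor 100 is left undefined.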

Issue 2, lines 138-180, under "## JA4H Example:". The example request contains 13 headers if Cookie and Referer are included, but only 11 once those two are left out of the count, so the first section of the JA4H should be ge20cr11enus, not ge20cr13enus as it appears on line 179. The hash of the sorted, delimited cookie names (third section) is listed as b66fa821d02c; it should be 0f2659b474bf. The incorrect value appears on lines 175 and 179. The hash of the sorted, delimited cookie names+values (fourth section) is listed as e97928733c74; it should be 161698816dab. The incorrect value appears on lines 177 and 179.
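The header count in Issue 2 can be checked with a minimal sketch. The eleven non-excluded header names are the ones used in the script in this issue; the positions of Cookie and Referer within the request are assumed for illustration (they do not affect the count), and `count_headers` is a hypothetical helper.

```python
# Assumption: Cookie and Referer are left out of the JA4H header count.
EXCLUDED = {"cookie", "referer"}


def count_headers(names):
    """Count header names, ignoring Cookie and Referer (case-insensitive)."""
    return sum(1 for name in names if name.lower() not in EXCLUDED)


request_headers = [
    "Host", "Cookie", "Sec-Ch-Ua", "Sec-Ch-Ua-Mobile", "User-Agent",
    "Sec-Ch-Ua-Platform", "Accept", "Sec-Fetch-Site", "Sec-Fetch-Mode",
    "Sec-Fetch-Dest", "Referer", "Accept-Encoding", "Accept-Language",
]

print(len(request_headers))            # 13 headers in total
print(count_headers(request_headers))  # 11 after excluding Cookie and Referer
```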

Here is a script showing the calculation of correct values using the strings copied directly from the technical document:

#!/usr/bin/perl

use 5.16.0;
use warnings;

use Digest::SHA qw(sha256_hex);

my $headers = 'Host,Sec-Ch-Ua,Sec-Ch-Ua-Mobile,User-Agent,Sec-Ch-Ua-Platform,Accept,Sec-Fetch-Site,Sec-Fetch-Mode,Sec-Fetch-Dest,Accept-Encoding,Accept-Language';
my $cookies = 'FastAB,_dd_s,countryCode,geoData,sato,stateCode,umto,usprivacy';
my $cookies_values = 'FastAB=0=6859,1=8174,2=4183,3=3319,4=3917,5=2557,6=4259,7=6070,8=0804,9=6453,10=1942,11=4435,12=4143,13=9445,14=6957,15=8682,16=1885,17=1825,18=3760,19=0929,_dd_s=logs=1&id=b5c2d770-eaba-4847-8202-390c4552ff9a&created=1686159462724&expire=1686160422726,countryCode=US,geoData=purcellville|VA|20132|US|NA|-400|broadband|39.160|-77.700|511,sato=1,stateCode=VA,umto=1,usprivacy=1---';

my $headers_hash = substr(sha256_hex($headers), 0, 12);
my $cookies_hash = substr(sha256_hex($cookies), 0, 12);
my $cv_hash = substr(sha256_hex($cookies_values), 0, 12);

say "headers [$headers_hash]";
say "cookies [$cookies_hash]";
say "cookies_values [$cv_hash]";

Output:

headers [974ebe531c03]
cookies [0f2659b474bf]
cookies_values [161698816dab]
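The script above hashes strings that are already sorted. A Python sketch of the same truncated-hash step with the sort made explicit follows. It assumes cookie names are sorted in plain code-point order and joined with commas, which reproduces the cookie string used in the script; the input order shown here is arbitrary and `truncated_sha256` is a hypothetical helper.

```python
import hashlib


def truncated_sha256(text: str, length: int = 12) -> str:
    """Return the first `length` hex characters of the SHA-256 of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]


# Cookie names from the example, deliberately shuffled.
names = ["usprivacy", "geoData", "_dd_s", "sato", "FastAB",
         "countryCode", "umto", "stateCode"]

# Code-point order puts uppercase letters before "_" and "_" before lowercase.
joined = ",".join(sorted(names))
print(joined)
print(truncated_sha256(joined))
```

Because the sort is by code point, `_dd_s` lands between `FastAB` and `countryCode`, matching the order in the technical document's example.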

Issue 3: the delimiter between JA4H sections is '_', but underscores may appear in header names, cookie names, and cookie values. The truncated sha256 hashes used for sections 2, 3, and 4 are hexadecimal and will never contain an underscore, but the raw and original versions of those sections may. This complicates splitting the JA4H_r and JA4H_o strings back into their sections. If recovering the sections is not a design goal for these versions of the JA4H, this is not a problem; if it is useful to extract the original sections from the JA4H_r and JA4H_o, it may help to choose a different delimiter or to escape the delimiter wherever it appears within a section.
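The ambiguity in Issue 3 can be shown with a short sketch. The four sections are shortened, illustrative values rather than a real capture, and the percent-style escaping is only one hypothetical option; it is not part of the JA4H specification.

```python
def escape(section: str) -> str:
    """Hypothetical escaping: encode '%' first, then '_'."""
    return section.replace("%", "%25").replace("_", "%5F")


def unescape(section: str) -> str:
    """Reverse escape(): decode '_' first, then '%'."""
    return section.replace("%5F", "_").replace("%25", "%")


# Shortened, illustrative sections (not from a real capture).
sections = [
    "ge20cr11enus",
    "Host,User-Agent",
    "FastAB,_dd_s",
    "FastAB=1,_dd_s=logs=1",
]

# Joining as-is: the cookie name "_dd_s" adds extra underscores.
naive = "_".join(sections)
print(len(naive.split("_")))  # 8 parts instead of the expected 4

# Escaping each section first keeps '_' unambiguous as the delimiter.
escaped = "_".join(escape(s) for s in sections)
recovered = [unescape(p) for p in escaped.split("_")]
print(recovered == sections)  # True
```

Escaping changes the raw string that consumers see, so a different delimiter that cannot occur in header or cookie data may be the simpler choice if one exists.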