Thanks for the question, unixfox!
Please see the technical details behind JA4H here: https://github.com/FoxIO-LLC/ja4/blob/main/technical_details/JA4H.md. There you will see that "JA4H captures all HTTP header fields, case-sensitive", so JA4H_b, the hash of the observed HTTP headers, is case-sensitive. JA4H does not normalize HTTP content to any case. If you see that happening, it's a bug; please let us know so we can fix it!
That said, JA4H does not explicitly indicate whether the headers are all lowercase; it just provides a hash. To see clearly whether the headers are camel case, you would want to look at JA4H_r (the raw, unhashed fingerprint). Does that answer your question?
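To illustrate why a case-sensitive hash separates the two spellings, here is a minimal sketch. It assumes JA4H_b can be modeled as the first 12 hex characters of the SHA-256 of the header names, in observed order, joined by commas, with Cookie and Referer left out; the function name `ja4h_b_sketch` is hypothetical, and JA4H.md remains the authority on the exact rules.

```python
import hashlib

# Assumption: headers excluded from the JA4H_b input (matched case-insensitively here).
EXCLUDED = {"cookie", "referer"}


def ja4h_b_sketch(header_names):
    """Hypothetical JA4H_b-style hash: returns (raw, hashed), preserving name case."""
    kept = [name for name in header_names if name.lower() not in EXCLUDED]
    raw = ",".join(kept)  # roughly what the header part of JA4H_r would show
    hashed = hashlib.sha256(raw.encode("ascii")).hexdigest()[:12]  # truncated digest
    return raw, hashed


title_case = ["Host", "User-Agent", "Accept-Encoding", "Cookie"]
lower_case = ["host", "user-agent", "accept-encoding", "cookie"]

for names in (title_case, lower_case):
    raw, hashed = ja4h_b_sketch(names)
    print(raw, hashed)  # same headers, different case -> different hash
```

An implementation that lowercased the names before joining them would collapse both lists into one hash, which matches the behavior reported for the Python version.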
Thank you for your reply!
Indeed, I think that is an issue with the Python implementation: https://github.com/FoxIO-LLC/ja4/issues/18
Sorry for not spotting earlier that the Rust version worked fine.
Very nice job engineering the JA4H method.
While reading the paper, I was surprised to see that you do not take into account whether the HTTP client sends each header name with the first letter of every word capitalized or not, for example:
Accept-Encoding vs accept-encoding
I know it's rare for the major HTTP client libraries and browsers to send the second, lowercase form by default. But there are some examples where an HTTP header could be sent in lowercase instead of capitalized:
Or, even worse (mistakes can happen):
The same goes for the value of each HTTP header: while testing the Python program, I got the same hash for all three of these different snippets:
Would this be worth adding to your method? Or is it even feasible?
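One way to picture the capitalization distinction asked about here is a small classifier over header names. This is purely hypothetical and not part of JA4H: `casing_style` and `casing_profile` are illustrative names, and the four labels (lower, title, upper, mixed) are an assumed categorization.

```python
# Hypothetical helpers, not part of JA4H: label how a client capitalizes header names.


def casing_style(name):
    """Classify one header name as 'lower', 'title', 'upper' or 'mixed'."""
    if name == name.lower():
        return "lower"  # e.g. accept-encoding
    words = name.split("-")
    if all(w[:1].isupper() and w[1:] == w[1:].lower() for w in words):
        return "title"  # e.g. Accept-Encoding
    if name == name.upper():
        return "upper"  # e.g. ACCEPT-ENCODING
    return "mixed"  # e.g. accept-Encoding


def casing_profile(header_names):
    """Compact per-request summary: one letter (l/t/u/m) per header, in order."""
    return "".join(casing_style(n)[0] for n in header_names)


print(casing_profile(["Host", "User-Agent", "accept-encoding"]))  # ttl
```

Since JA4H_r already keeps the names exactly as sent, a summary like this would only make the casing pattern easier to read; it does not add information the raw fingerprint lacks.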