leafcutterant opened this issue 6 years ago.
Hi @leafcutterant, good question!
The hashing operations all treat the input as UTF-8. The following recipe demonstrates this. Try disabling and enabling the 'Encode text' operation and you'll see that the hash output doesn't change.
The UTF BOM is not included.
UTF-8 is used by default for all CyberChef operations. There may be some edge cases where it is deliberately not used, but there are normally good reasons for that. Certainly for the hashing and encryption operations, UTF-8 should be assumed.
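To make "UTF-8, no BOM" concrete outside CyberChef, here is a minimal Python sketch (not CyberChef's code) that hashes the same character with and without a UTF-8 byte order mark. It only shows that the two byte sequences, and therefore their digests, differ. Under the description above, CyberChef's MD5 of typed text would be expected to match the first, BOM-less case. The helper name `md5_hex` is made up for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of raw bytes as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


text = "á"  # U+00E1, LATIN SMALL LETTER A WITH ACUTE

utf8 = text.encode("utf-8")          # b'\xc3\xa1', no BOM
utf8_bom = text.encode("utf-8-sig")  # b'\xef\xbb\xbf\xc3\xa1', BOM prepended

# Same character, different input bytes, so the hashes differ.
print(utf8.hex(), md5_hex(utf8))
print(utf8_bom.hex(), md5_hex(utf8_bom))
```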
@n1474335, thanks for the answer!
You gave me an idea, so I ran some tests, and now I'm not sure it's UTF-8.
As a simple reference, I took the lowercase letter á (a-acute).
https://gchq.github.io/CyberChef/#recipe=Encode_text(%27UTF-8%20(65001)%27)MD5()&input=4Q
Encoding it to UTF-8 gives a different hash.
Also, encoding it to hex with CyberChef gives e1, which, according to Wikipedia, is the representation in Unicode, NCR and ISO 8859-1/2/3/4/9/10/14/15/16.
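A short Python sketch of the two candidate byte encodings of á, using only the standard library. The recipe URL above passes its input as base64 (`input=4Q`). Adding the standard `==` padding lets Python decode it, and it turns out to be the single byte e1, i.e. the ISO 8859-1 (Latin-1) form rather than the two-byte UTF-8 form. The e1 also equals the Unicode code point U+00E1, which is why several encodings list the same value.

```python
import base64

# Input taken from the recipe URL ("input=4Q"), padded for Python's decoder.
raw = base64.b64decode("4Q==")

char = "á"
latin1 = char.encode("latin-1")  # one byte per code point <= 0xFF
utf8 = char.encode("utf-8")      # two bytes for U+00E1

print(raw.hex())                 # e1
print(latin1.hex(), utf8.hex())  # e1 c3a1
print(hex(ord(char)))            # 0xe1 (the Unicode code point)
```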
On the other hand, encoding the (right-to-left) first letter of your text (ا, Arabic alif) to hex gives d8 a7, which is the UTF-8 representation of the letter.
اá (Arabic alif + a-acute) gives d8 a7 c3 a1. The last two octets, c3 a1, are UTF-8 for á, so it seems the default encoding is Unicode / NCR / ISO 8859-1/2/3/4/9/10/14/15/16, and it switches to UTF-8 when the input contains a character that falls outside the default encoding's range.
I experienced the same with わá (Japanese hiragana wa + a-acute).
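The hypothesis above can be written as a small rule and checked against all three test strings. This is a hypothetical Python sketch of the behaviour inferred from these experiments, not CyberChef's actual source; the function name `guessed_input_bytes` is made up for illustration. It assumes one byte per character when every code point is at most 0xFF (equivalent to Latin-1), and UTF-8 for the whole string otherwise.

```python
def guessed_input_bytes(text: str) -> bytes:
    """Hypothetical rule inferred from the tests in this thread.

    If every character's code point fits in one byte (<= 0xFF), emit one
    byte per character, which is the same as Latin-1. Otherwise encode
    the whole string as UTF-8, so even the "small" characters become
    multi-byte.
    """
    if all(ord(ch) <= 0xFF for ch in text):
        return text.encode("latin-1")
    return text.encode("utf-8")


for sample in ["á", "اá", "わá"]:
    print(sample, guessed_input_bytes(sample).hex(" "))
# á  -> e1
# اá -> d8 a7 c3 a1
# わá -> e3 82 8f c3 a1
```

Under this rule, adding a single non-Latin-1 character changes how á is hashed, which matches the observations above.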
Could you confirm this?
Hey @n1474335, I believe this is an important aspect of CyberChef as a tool. Did you happen to have time to investigate this behavior?
Any progress or update on this?
This is not really an issue, so apologies in advance.
When I enter text manually into the Input field and all I have is a hashing operation in the Recipe, what text encoding is used to interpret the text? And is a byte order mark used or not?
Also, is the encoding the same with other operations as well?