Problem Statement
Phishing attacks and text manipulation deceive the reader of a text into thinking it is original when it is not. Left undetected, they can lead to non-searchable text, parallax errors and many other problems. As more and more functions are digitized and represented in systems, it is important to have tooling that detects these manipulations and that developers can plug into their own systems.
Solution
The requirement is a Python / Node.js library that developers can consume and that provides a phishing detector / hackability score for Tamil text. The sample scenarios listed here are a starting point and can be expanded.
The library should be able to detect instances of
visual-similarity-based phishing in online text,
Unicode hacks / old typewriter layout hacks that introduce non-searchable characters,
hackability of offline text
Text visual similarity / hackability score -
ச. சரவணன் -- க. கரவணன் -- The hackability score measures how easily the former text can be altered into the latter, especially in handwritten text.
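A minimal sketch of such a pairwise score, assuming the two texts are compared cluster by cluster (a base character plus its combining signs) and that each differing pair contributes a similarity looked up in a table. The function names and the 0.8 value for க / ச are illustrative assumptions, not measured scores.

```python
import unicodedata

# Hypothetical similarity values in [0, 1]; placeholders, not measured data.
SIMILARITY = {
    frozenset(("க", "ச")): 0.8,
}

def clusters(text):
    """Split text into base characters, each with its trailing combining marks."""
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch  # vowel sign or pulli attaches to the previous base
        else:
            out.append(ch)
    return out

def visual_similarity(a, b):
    """Return a score in [0, 1]: 1.0 for identical texts, 0.0 if any
    differing pair of clusters is not listed as related."""
    ca, cb = clusters(a), clusters(b)
    if len(ca) != len(cb):
        return 0.0
    score = 1.0
    for x, y in zip(ca, cb):
        if x != y:
            score *= SIMILARITY.get(frozenset((x, y)), 0.0)
    return score

print(visual_similarity("ச. சரவணன்", "க. கரவணன்"))
```

With the placeholder table, the two names differ in two clusters, so the score is roughly 0.8 × 0.8. The lookup here is on whole clusters, so a real implementation would also need entries (or a rule) for the same base letters carrying vowel signs.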
Unicode typos -
Given any text, the library must check for standalone glyphs entered through typewriter layouts, ZWNJ (zero-width non-joiner) and similar invisible characters.
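One way this check could look, as a rough sketch: flag zero-width characters, and flag any Tamil dependent sign (vowel sign or pulli) that does not directly follow a Tamil consonant. The function name `find_suspicious` and the labels are assumptions made for illustration.

```python
import unicodedata

# Invisible characters that can break search while leaving the text looking unchanged.
ZERO_WIDTH = {"\u200b": "ZWSP", "\u200c": "ZWNJ", "\u200d": "ZWJ"}

def is_tamil_consonant(ch):
    # Tamil consonants occupy U+0B95..U+0BB9.
    return "\u0b95" <= ch <= "\u0bb9"

def is_tamil_sign(ch):
    # Dependent signs in the Tamil block have a Mark category (Mn or Mc).
    return "\u0b80" <= ch <= "\u0bff" and unicodedata.category(ch).startswith("M")

def find_suspicious(text):
    """Return (index, label) pairs for invisible or orphaned characters."""
    findings = []
    prev = None
    for i, ch in enumerate(text):
        if ch in ZERO_WIDTH:
            findings.append((i, ZERO_WIDTH[ch]))
        elif is_tamil_sign(ch) and (prev is None or not is_tamil_consonant(prev)):
            findings.append((i, "standalone sign"))
        prev = ch
    return findings

print(find_suspicious("க\u200cா"))
```

This rule is deliberately strict: it also flags a sign that follows another sign, which catches two-part vowel signs typed as separate pieces. A fuller version might first compare the text against its NFC-normalized form.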
Related Glyphs :-
Build a dictionary of related glyphs and assign each pair a similarity score that has a consistent technical basis. Use this dictionary in the visual similarity / hackability score calculator for a given text or set of texts.
க ச - ச can be altered to look like க
ஜ ஐ - Visually too similar
ஔ(au) - ஒள(oLa) - Visually too similar
பப் ய் - Visually too similar, particularly in handwritten text
ல வ - Visually too similar, particularly in handwritten text
மு ழு - Visually too similar, particularly in handwritten text
கு சூ - Visually too similar, particularly in handwritten text
கர் கா - Visually related in offline text, where the only difference is a pulli
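The related-glyph dictionary could be sketched as a symmetric lookup built from some of the pairs listed above, with a scanner that reports which parts of a text have a look-alike. The scores below are placeholder assumptions (the technical basis for real scores is still to be defined), and `build_index`, `hackable_spans` and `hackability_score` are hypothetical names.

```python
# Pairs from the list above; the scores are placeholders, not measured values.
RELATED_PAIRS = [
    ("க", "ச", 0.8),
    ("ஜ", "ஐ", 0.9),
    ("ஔ", "ஒள", 0.95),
    ("ல", "வ", 0.7),
    ("மு", "ழு", 0.7),
    ("கர்", "கா", 0.6),
]

def build_index(pairs):
    """Build a symmetric lookup: glyph -> {related glyph: score}."""
    index = {}
    for a, b, score in pairs:
        index.setdefault(a, {})[b] = score
        index.setdefault(b, {})[a] = score
    return index

def hackable_spans(text, index):
    """Scan left to right, preferring the longest dictionary entry at each position.
    Returns (offset, matched glyph, {look-alike: score}) tuples."""
    keys = sorted(index, key=len, reverse=True)
    spans, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                spans.append((i, key, index[key]))
                i += len(key)
                break
        else:
            i += 1  # no entry starts here; move on
    return spans

def hackability_score(text, index):
    """Highest look-alike score found anywhere in the text, 0.0 if none."""
    spans = hackable_spans(text, index)
    return max((max(alts.values()) for _, _, alts in spans), default=0.0)

index = build_index(RELATED_PAIRS)
print(hackable_spans("ஒளவை", index))
print(hackability_score("ஒளவை", index))
```

Matching on substrings rather than single letters lets multi-character entries such as ஒள participate. Taking the maximum is only one possible aggregation, and a sum or average over the text may suit other uses better.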