Problem Statement

Phishing attacks and text manipulation aim to deceive the reader into thinking a text is original when it is not. Failing to detect them can lead to non-searchable text, parallax errors and more. As more functions are digitized and represented in software systems, it is important to have tooling that detects such manipulation and that developers can plug into their applications.

Solution

The requirement is a Python / Node.js library that developers can consume to get a phishing-detection / hackability score for Tamil text. I am providing a list of sample scenarios below; these can be expanded further.

The library should be able to detect:

visual-similarity-based phishing in online text,
Unicode hacks / old typewriter-layout hacks that introduce non-searchable characters, and
the hackability of offline text.

Text visual similarity / hackability score -

ச. சரவணன் -- க. கரவணன் -- Hackability score: how easily the former text can be converted into the latter, especially in written text.
ஜெயராமன் - ஜிராமன் -- Visual Similarity score
பா இராஜேந்திரன் - சுபா இராஜேந்திரன் -- Visual Similarity score
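One possible way to turn the examples above into numbers, sketched here under stated assumptions: split each text into Tamil syllables (a base character plus its vowel sign or pulli) and compute a weighted edit distance, where swapping two glyphs from the related-glyph list costs less than swapping unrelated ones. The names `RELATED`, `clusters` and `visual_similarity` are hypothetical, and the similarity weights are placeholders, not measured values.

```python
import unicodedata

# Hypothetical confusion table: similarity in [0, 1] between clusters that the
# "Related Glyphs" list marks as visually close. The numbers are placeholders.
RELATED = {
    frozenset(("க", "ச")): 0.8,
    frozenset(("ஜ", "ஐ")): 0.9,
    frozenset(("ல", "வ")): 0.7,
    frozenset(("மு", "ழு")): 0.7,
    frozenset(("ஸா", "லா")): 0.6,
}


def clusters(text: str) -> list[str]:
    """Rough syllable split: a base character plus any following combining
    marks (vowel signs, pulli). Not a full UAX #29 segmenter."""
    out: list[str] = []
    for ch in unicodedata.normalize("NFC", text):
        if out and unicodedata.category(ch) in ("Mn", "Mc"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def substitution_cost(a: str, b: str) -> float:
    # Identical clusters cost nothing, related clusters cost 1 - similarity,
    # unrelated clusters cost a full edit.
    if a == b:
        return 0.0
    return 1.0 - RELATED.get(frozenset((a, b)), 0.0)


def visual_similarity(s: str, t: str) -> float:
    """1.0 for identical texts, lower as more (or less similar) edits are needed."""
    a, b = clusters(s), clusters(t)
    if not a and not b:
        return 1.0
    # Weighted Levenshtein distance, keeping one row of the DP table at a time.
    prev = [float(j) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [float(i)] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(
                prev[j] + 1.0,      # delete a[i-1]
                cur[j - 1] + 1.0,   # insert b[j-1]
                prev[j - 1] + substitution_cost(a[i - 1], b[j - 1]),
            )
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


if __name__ == "__main__":
    for s, t in [("ச. சரவணன்", "க. கரவணன்"),
                 ("ஜெயராமன்", "ஜிராமன்"),
                 ("பா இராஜேந்திரன்", "சுபா இராஜேந்திரன்")]:
        print(s, "|", t, "|", round(visual_similarity(s, t), 3))
```

With these placeholder weights the three pairs above come out at roughly 0.95, 0.6 and 0.9. ஜெ and ஜி count as a full edit because the table has no entry for them, which shows how much the result depends on the glyph dictionary.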

Unicode typos -

Given any text, the library must check for standalone glyphs entered through typewriter layouts, ZWNJ (zero-width non-joiner) characters, etc.
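A minimal sketch of that check, assuming Python's standard `unicodedata` module: normalize to NFC, then flag any invisible joiner and any Tamil combining mark (vowel sign, pulli, AU length mark) that does not follow a consonant, since such a mark renders as a standalone glyph. Treating ZWJ and ZERO WIDTH SPACE as suspicious alongside ZWNJ is an assumption, as are the names `find_unicode_typos` and `Finding`.

```python
import unicodedata
from dataclasses import dataclass

# Invisible format characters that can sit unnoticed inside Tamil text.
# ZWNJ is named in the issue; ZWJ and ZERO WIDTH SPACE are added as assumptions.
INVISIBLE = {"\u200c", "\u200d", "\u200b"}


@dataclass
class Finding:
    index: int   # offset in the NFC-normalized text
    char: str
    reason: str


def is_tamil_consonant(ch: str) -> bool:
    # Tamil consonants lie in U+0B95..U+0BB9 (the range has a few unassigned gaps).
    return "\u0b95" <= ch <= "\u0bb9"


def is_tamil_mark(ch: str) -> bool:
    # Vowel signs, pulli (virama), anusvara and the AU length mark are
    # combining marks (Mn/Mc) inside the Tamil block U+0B80..U+0BFF.
    return "\u0b80" <= ch <= "\u0bff" and unicodedata.category(ch) in ("Mn", "Mc")


def find_unicode_typos(text: str) -> list[Finding]:
    # NFC folds legitimate split forms such as க + ெ + ா into கொ, so only
    # malformed sequences are left to flag.
    text = unicodedata.normalize("NFC", text)
    findings: list[Finding] = []
    for i, ch in enumerate(text):
        if ch in INVISIBLE:
            findings.append(Finding(i, ch, unicodedata.name(ch)))
        elif is_tamil_mark(ch) and not (i > 0 and is_tamil_consonant(text[i - 1])):
            # A mark with no consonant before it renders as a standalone glyph,
            # e.g. ெ typed before க in a visual-order typewriter layout.
            findings.append(Finding(i, ch, "standalone " + unicodedata.name(ch)))
    return findings


if __name__ == "__main__":
    for sample in ["\u0bc6க", "சர\u200cவணன்", "ச. சரவணன்"]:
        print(repr(sample), find_unicode_typos(sample))
```

For example, ெ typed before க (the visual order some legacy typewriter layouts produce) is reported as a standalone TAMIL VOWEL SIGN E at index 0, and a ZWNJ hidden inside சரவணன் is reported by name. Any intentional ZWNJ or ZWJ would also be flagged, so findings are best treated as warnings.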

Related Glyphs :-

Build a dictionary of related glyphs and assign each pair a similarity score that has a consistent technical basis. Use this dictionary in the visual similarity / hackability score calculator for the given text(s).

க ச - ச can be made to க
ஜ ஐ - Visually too similar
ஔ(au) - ஒள(oLa) - Visually too similar
ப் ய் - Visually too similar, particularly in handwritten text
ல வ - Visually too similar, particularly in handwritten text
மு ழு - Visually too similar, particularly in handwritten text
கு சூ - Visually too similar, particularly in handwritten text
கர் கா - Visually related in offline text, where the only difference is a pulli
ஸா லா - Visually related
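The dictionary could be sketched as a list of glyph relations, each with a score and a direction flag, since some alterations are described one way only (ச can be made into க). The sketch below, under those assumptions, matches whole syllables so that க is never found inside கு, and combines per-spot scores into a text-level hackability score by treating each score as an independent probability; that combination rule is a simplifying assumption, not the consistent technical basis the issue asks for. `GlyphRelation`, `build_index`, `hackable_spots` and `hackability_score` are hypothetical names, and every score is a placeholder.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class GlyphRelation:
    source: str
    target: str
    score: float   # placeholder similarity in [0, 1], not a measured value
    one_way: bool  # True when only source -> target is described as easy


# Pairs from the list above; the scores are illustrative placeholders.
RELATIONS = [
    GlyphRelation("ச", "க", 0.8, True),  # "ச can be made to க"
    GlyphRelation("ஜ", "ஐ", 0.9, False),
    GlyphRelation("ஔ", "ஒள", 0.9, False),
    GlyphRelation("ப்", "ய்", 0.7, False),
    GlyphRelation("ல", "வ", 0.7, False),
    GlyphRelation("மு", "ழு", 0.7, False),
    GlyphRelation("கு", "சூ", 0.6, False),
    GlyphRelation("கர்", "கா", 0.8, False),
    GlyphRelation("ஸா", "லா", 0.6, False),
]

Index = dict[tuple[str, ...], dict[str, float]]


def clusters(text: str) -> list[str]:
    # Rough syllable split: base character plus following combining marks.
    out: list[str] = []
    for ch in unicodedata.normalize("NFC", text):
        if out and unicodedata.category(ch) in ("Mn", "Mc"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def build_index(relations: list[GlyphRelation]) -> Index:
    # Key each glyph sequence by its clusters so that "க" never matches inside "கு".
    index: Index = {}
    for r in relations:
        index.setdefault(tuple(clusters(r.source)), {})[r.target] = r.score
        if not r.one_way:
            index.setdefault(tuple(clusters(r.target)), {})[r.source] = r.score
    return index


def hackable_spots(text: str, index: Index) -> list[tuple[int, str, str, float]]:
    """Return (cluster offset, glyphs, easiest replacement, score) per match."""
    cl = clusters(text)
    longest = max((len(k) for k in index), default=0)
    spots = []
    i = 0
    while i < len(cl):
        # Prefer the longest related sequence starting at cluster i.
        for n in range(min(longest, len(cl) - i), 0, -1):
            key = tuple(cl[i:i + n])
            if key in index:
                target, score = max(index[key].items(), key=lambda kv: kv[1])
                spots.append((i, "".join(key), target, score))
                i += n
                break
        else:
            i += 1
    return spots


def hackability_score(text: str, index: Index) -> float:
    # Assumption: each score is read as an independent chance that the spot can be
    # altered unnoticed; the text score is the chance that at least one spot can.
    intact = 1.0
    for *_, score in hackable_spots(text, index):
        intact *= 1.0 - score
    return 1.0 - intact
```

With these placeholder scores, ச. சரவணன் has three alterable spots (both ச and the வ) and scores about 0.988, while க. கரவணன் scores 0.7 because the ச to க entry is one-way. The ordinary word ஒளி is not flagged, whereas ஒளவை (ஔ typed as ஒ + ள) is.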


மின்-தூண்டிலிடல் கண்டறிதல் - Phishing Detector / Hackability score #151
