boinkor-net / chars

cha(rs) is a command-line tool that displays information about Unicode characters
https://github.com/boinkor-net/chars
MIT License

Difficulty searching for small triangles #84

Open Jayman2000 opened 2 years ago

Jayman2000 commented 2 years ago

I can see that several small triangles exist:

$ chars 'DOWN-POINTING TRIANGLE'
U+0001F783, 🞃 0x0001F783, \0373603, UTF-8: f0 9f 9e 83, UTF-16BE: d83ddf83
Width: 1, prints as 🞃
Quotes as \u{1f783}
Unicode name: BLACK DOWN-POINTING ISOSCELES RIGHT TRIANGLE

U+0001F53D, 🔽 0x0001F53D, \0372475, UTF-8: f0 9f 94 bd, UTF-16BE: d83ddd3d
Width: 2, prints as 🔽
Quotes as \u{1f53d}
Unicode name: DOWN-POINTING SMALL RED TRIANGLE

U+0001F53B, 🔻 0x0001F53B, \0372473, UTF-8: f0 9f 94 bb, UTF-16BE: d83ddd3b
Width: 2, prints as 🔻
Quotes as \u{1f53b}
Unicode name: DOWN-POINTING RED TRIANGLE

U+2BC6, ⯆ 0x2BC6, \025706, UTF-8: e2 af 86, UTF-16BE: 2bc6
Width: 1, prints as ⯆
Quotes as \u{2bc6}
Unicode name: BLACK MEDIUM DOWN-POINTING TRIANGLE CENTRED

U+29E9, ⧩ 0x29E9, \024751, UTF-8: e2 a7 a9, UTF-16BE: 29e9
Width: 1, prints as ⧩
Quotes as \u{29e9}
Unicode name: DOWN-POINTING TRIANGLE WITH RIGHT HALF BLACK

U+29E8, ⧨ 0x29E8, \024750, UTF-8: e2 a7 a8, UTF-16BE: 29e8
Width: 1, prints as ⧨
Quotes as \u{29e8}
Unicode name: DOWN-POINTING TRIANGLE WITH LEFT HALF BLACK

U+26DB, ⛛ 0x26DB, \023333, UTF-8: e2 9b 9b, UTF-16BE: 26db
Width: 1 (2 in CJK context), prints as ⛛
Quotes as \u{26db}
Unicode name: HEAVY WHITE DOWN-POINTING TRIANGLE

U+25BF, ▿ 0x25BF, \022677, UTF-8: e2 96 bf, UTF-16BE: 25bf
Width: 1, prints as ▿
Quotes as \u{25bf}
Unicode name: WHITE DOWN-POINTING SMALL TRIANGLE

U+25BE, ▾ 0x25BE, \022676, UTF-8: e2 96 be, UTF-16BE: 25be
Width: 1, prints as ▾
Quotes as \u{25be}
Unicode name: BLACK DOWN-POINTING SMALL TRIANGLE

U+25BD, ▽ 0x25BD, \022675, UTF-8: e2 96 bd, UTF-16BE: 25bd
Width: 1 (2 in CJK context), prints as ▽
Quotes as \u{25bd}
Unicode name: WHITE DOWN-POINTING TRIANGLE

U+25BC, ▼ 0x25BC, \022674, UTF-8: e2 96 bc, UTF-16BE: 25bc
Width: 1 (2 in CJK context), prints as ▼
Quotes as \u{25bc}
Unicode name: BLACK DOWN-POINTING TRIANGLE

U+23F7, ⏷ 0x23F7, \021767, UTF-8: e2 8f b7, UTF-16BE: 23f7
Width: 1, prints as ⏷
Quotes as \u{23f7}
Unicode name: BLACK MEDIUM DOWN-POINTING TRIANGLE

U+23EC, ⏬ 0x23EC, \021754, UTF-8: e2 8f ac, UTF-16BE: 23ec
Width: 2, prints as ⏬
Quotes as \u{23ec}
Unicode name: BLACK DOWN-POINTING DOUBLE TRIANGLE

$ 

But when I search for only the small triangles:

$ chars 'SMALL TRIANGLE'
$ 

I get nothing. If I search for medium triangles:

$ chars 'MEDIUM TRIANGLE'
U+0001F827, 🠧 0x0001F827, \0374047, UTF-8: f0 9f a0 a7, UTF-16BE: d83edc27
Width: 1, prints as 🠧
Quotes as \u{1f827}
Unicode name: DOWNWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT

U+0001F826, 🠦 0x0001F826, \0374046, UTF-8: f0 9f a0 a6, UTF-16BE: d83edc26
Width: 1, prints as 🠦
Quotes as \u{1f826}
Unicode name: RIGHTWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT

U+0001F825, 🠥 0x0001F825, \0374045, UTF-8: f0 9f a0 a5, UTF-16BE: d83edc25
Width: 1, prints as 🠥
Quotes as \u{1f825}
Unicode name: UPWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT

U+0001F824, 🠤 0x0001F824, \0374044, UTF-8: f0 9f a0 a4, UTF-16BE: d83edc24
Width: 1, prints as 🠤
Quotes as \u{1f824}
Unicode name: LEFTWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT

U+0001F807, 🠇 0x0001F807, \0374007, UTF-8: f0 9f a0 87, UTF-16BE: d83edc07
Width: 1, prints as 🠇
Quotes as \u{1f807}
Unicode name: DOWNWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD

U+0001F806, 🠆 0x0001F806, \0374006, UTF-8: f0 9f a0 86, UTF-16BE: d83edc06
Width: 1, prints as 🠆
Quotes as \u{1f806}
Unicode name: RIGHTWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD

U+0001F805, 🠅 0x0001F805, \0374005, UTF-8: f0 9f a0 85, UTF-16BE: d83edc05
Width: 1, prints as 🠅
Quotes as \u{1f805}
Unicode name: UPWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD

U+0001F804, 🠄 0x0001F804, \0374004, UTF-8: f0 9f a0 84, UTF-16BE: d83edc04
Width: 1, prints as 🠄
Quotes as \u{1f804}
Unicode name: LEFTWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD

U+2BC8, ⯈ 0x2BC8, \025710, UTF-8: e2 af 88, UTF-16BE: 2bc8
Width: 1, prints as ⯈
Quotes as \u{2bc8}
Unicode name: BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED

U+2BC7, ⯇ 0x2BC7, \025707, UTF-8: e2 af 87, UTF-16BE: 2bc7
Width: 1, prints as ⯇
Quotes as \u{2bc7}
Unicode name: BLACK MEDIUM LEFT-POINTING TRIANGLE CENTRED

U+2BC6, ⯆ 0x2BC6, \025706, UTF-8: e2 af 86, UTF-16BE: 2bc6
Width: 1, prints as ⯆
Quotes as \u{2bc6}
Unicode name: BLACK MEDIUM DOWN-POINTING TRIANGLE CENTRED

U+2BC5, ⯅ 0x2BC5, \025705, UTF-8: e2 af 85, UTF-16BE: 2bc5
Width: 1, prints as ⯅
Quotes as \u{2bc5}
Unicode name: BLACK MEDIUM UP-POINTING TRIANGLE CENTRED

U+23F7, ⏷ 0x23F7, \021767, UTF-8: e2 8f b7, UTF-16BE: 23f7
Width: 1, prints as ⏷
Quotes as \u{23f7}
Unicode name: BLACK MEDIUM DOWN-POINTING TRIANGLE

U+23F6, ⏶ 0x23F6, \021766, UTF-8: e2 8f b6, UTF-16BE: 23f6
Width: 1, prints as ⏶
Quotes as \u{23f6}
Unicode name: BLACK MEDIUM UP-POINTING TRIANGLE

U+23F5, ⏵ 0x23F5, \021765, UTF-8: e2 8f b5, UTF-16BE: 23f5
Width: 1, prints as ⏵
Quotes as \u{23f5}
Unicode name: BLACK MEDIUM RIGHT-POINTING TRIANGLE

U+23F4, ⏴ 0x23F4, \021764, UTF-8: e2 8f b4, UTF-16BE: 23f4
Width: 1, prints as ⏴
Quotes as \u{23f4}
Unicode name: BLACK MEDIUM LEFT-POINTING TRIANGLE

$ 

I still get plenty of results.

antifuchs commented 2 years ago

Huh, I suspect that the way we're querying the fst is wrong. Maybe there's a better way to make these queries, but I'm not sure at the moment.
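
For reference, the results the report expects can be sketched independently of how chars queries its index: treat the query as a set of words and return every name that contains all of them as whole words, in any order. This is only an illustration of the expected behaviour, not chars' actual lookup code. The function names `words` and `search` are hypothetical, and the short name list is copied from the output in this issue rather than from the full Unicode name table.

```python
import re

# A handful of names copied from the output in this issue; a real lookup
# would run against the full Unicode name table.
NAMES = [
    "WHITE DOWN-POINTING SMALL TRIANGLE",
    "BLACK DOWN-POINTING SMALL TRIANGLE",
    "DOWN-POINTING SMALL RED TRIANGLE",
    "BLACK MEDIUM DOWN-POINTING TRIANGLE",
    "DOWNWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD",
    "SIGN OF THE HORNS",
    "SMILING FACE WITH HORNS",
]


def words(text):
    """Split on anything that is not a letter or digit, so that
    'DOWN-POINTING' yields the two words DOWN and POINTING."""
    return {w for w in re.split(r"[^A-Z0-9]+", text.upper()) if w}


def search(query, names=NAMES):
    """Return the names that contain every word of the query, in any order.

    Hypothetical helper for illustration; not part of chars.
    """
    wanted = words(query)
    return [name for name in names if wanted <= words(name)]
```

With this sketch, `search("SMALL TRIANGLE")` returns the three SMALL ... TRIANGLE names even though the two words are not adjacent in one of them, and `search("SIGN")` returns SIGN OF THE HORNS. Note that an empty query matches every name, which a real tool would probably want to reject.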

Jayman2000 commented 2 years ago

Something similar happens when I search for “SIGN”. Searching for “HORNS” finds SIGN OF THE HORNS, but searching for “SIGN” finds nothing:

$ chars HORNS
U+0001F918, 🤘 0x0001F918, \0374430, UTF-8: f0 9f a4 98, UTF-16BE: d83edd18
Width: 2, prints as 🤘
Quotes as \u{1f918}
Unicode name: SIGN OF THE HORNS

U+0001F608, 😈 0x0001F608, \0373010, UTF-8: f0 9f 98 88, UTF-16BE: d83dde08
Width: 2, prints as 😈
Quotes as \u{1f608}
Unicode name: SMILING FACE WITH HORNS

$ chars SIGN
$
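
To check other queries for the same symptom, a small harness along these lines may help. It is a sketch that assumes `chars` is on the PATH and is invoked exactly as in the transcripts above (`chars <query>`, no extra flags). The helper names `run_chars` and `empty_queries` are made up for this example.

```python
import subprocess

# The queries tried in this issue.
QUERIES = [
    "DOWN-POINTING TRIANGLE",
    "SMALL TRIANGLE",
    "MEDIUM TRIANGLE",
    "HORNS",
    "SIGN",
]


def run_chars(query):
    """Run `chars <query>` and return its standard output.

    Assumes the chars binary is installed and on the PATH.
    """
    result = subprocess.run(["chars", query], capture_output=True, text=True)
    return result.stdout


def empty_queries(queries, run=run_chars):
    """Return the queries for which the runner printed nothing.

    `run` is injectable so the function can be exercised without chars.
    """
    return [q for q in queries if not run(q).strip()]
```

Going by the transcripts in this issue, `empty_queries(QUERIES)` would be expected to report SMALL TRIANGLE and SIGN on the affected version.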