Open craigsapp opened 9 months ago
Searching for the last problem character:
grep -rl "" *-*
Results in identifying this file:
pl-wumfc/kern/pl-wumfc--n2100--001-012_dobrzynski-ignacy-feliks--monbar-czyli-flibustierowie-partytura-fortepianowa-akt-ii-11-coro.krn
Editing the file(s):
vi $(grep -rl "" *-*)
Problem character in score, searching for character:
Looking at online score:
https://polishscores.org/?id=pl-wumfc--n2100--001-012
Loading into VHV:
https://verovio.humdrum.org/?file=https%3A%2F%2Fhumdrum.nifc.pl%2Fpl-wumfc--n2100--001-012.krn
Original score:
https://polish.musicsources.pl/en/lokalizacje/galeria/druki-muzyczne/11863/99#nomenu
This particular character should be encoded as [segno], which will be replaced with the segno sign.
Although text-encoded segnos should be made smaller:
Perhaps the * in this case should have a similar text code invented for it, such as [starsegno].
(NB: updates should be made to gitea and not this repository, which is downstream of gitea).
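The replacement described above, encoding the problem character as [segno], can be sketched in a few lines of Python. This is only an illustration: the actual problem character is not reproduced in this thread, so the code point U+E000 used below is a hypothetical stand-in, and the function name encode_segno is invented for this example.

```python
# Text code that the renderer is expected to replace with the segno sign.
SEGNO_CODE = "[segno]"

# Hypothetical stand-in for the problem character (an assumption; the real
# character is not shown in the issue text).
PROBLEM_CHAR = "\ue000"


def encode_segno(line, problem_char=PROBLEM_CHAR):
    """Replace every occurrence of the problem character with the text code."""
    return line.replace(problem_char, SEGNO_CODE)


if __name__ == "__main__":
    print(encode_segno("Dal \ue000 al Fine"))
```

Any such change would be applied to the files in gitea, since this repository is downstream of it.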
Here are the Unicode characters found in **text spines. The listUnicodeCharacters script has been updated to add the -t option, which restricts the search to lyrics (**text). Run in the base directory of humdrum-polish-scores:
make unicode-text
Results:
count UTF-32 encoded as glyph name
68543 00017F C5 BF ſ LATIN SMALL LETTER LONG S
20829 000142 C5 82 ł LATIN SMALL LETTER L WITH STROKE
14866 000119 C4 99 ę LATIN SMALL LETTER E WITH OGONEK
9956 00015B C5 9B ś LATIN SMALL LETTER S WITH ACUTE
9939 000105 C4 85 ą LATIN SMALL LETTER A WITH OGONEK
7747 00017C C5 BC ż LATIN SMALL LETTER Z WITH DOT ABOVE
7598 0000A0 C2 A0 NO-BREAK SPACE
7066 0000F3 C3 B3 ó LATIN SMALL LETTER O WITH ACUTE
6077 000107 C4 87 ć LATIN SMALL LETTER C WITH ACUTE
3313 0000E1 C3 A1 á LATIN SMALL LETTER A WITH ACUTE
2987 000144 C5 84 ń LATIN SMALL LETTER N WITH ACUTE
2579 0000E9 C3 A9 é LATIN SMALL LETTER E WITH ACUTE
1796 0000FF C3 BF ÿ LATIN SMALL LETTER Y WITH DIAERESIS
1575 000292 CA 92 ʒ LATIN SMALL LETTER EZH
1371 000283 CA 83 ʃ LATIN SMALL LETTER ESH
1337 0004E1 D3 A1 ӡ CYRILLIC SMALL LETTER ABKHASIAN DZE
1262 0000FA C3 BA ú LATIN SMALL LETTER U WITH ACUTE
1193 00017A C5 BA ź LATIN SMALL LETTER Z WITH ACUTE
1143 0000E0 C3 A0 à LATIN SMALL LETTER A WITH GRAVE
1126 0000E8 C3 A8 è LATIN SMALL LETTER E WITH GRAVE
678 0000FC C3 BC ü LATIN SMALL LETTER U WITH DIAERESIS
666 000165 C5 A5 ť LATIN SMALL LETTER T WITH CARON
635 002019 E2 80 99 ’ RIGHT SINGLE QUOTATION MARK
590 000161 C5 A1 š LATIN SMALL LETTER S WITH CARON
497 001E8F E1 BA 8F ẏ LATIN SMALL LETTER Y WITH DOT ABOVE
484 0000F2 C3 B2 ò LATIN SMALL LETTER O WITH GRAVE
410 0000F9 C3 B9 ù LATIN SMALL LETTER U WITH GRAVE
382 0000E4 C3 A4 ä LATIN SMALL LETTER A WITH DIAERESIS
379 00016D C5 AD ŭ LATIN SMALL LETTER U WITH BREVE
357 0000F6 C3 B6 ö LATIN SMALL LETTER O WITH DIAERESIS
292 00017B C5 BB Ż LATIN CAPITAL LETTER Z WITH DOT ABOVE
291 001D23 E1 B4 A3 ᴣ LATIN LETTER SMALL CAPITAL EZH
277 00015A C5 9A Ś LATIN CAPITAL LETTER S WITH ACUTE
275 0000E2 C3 A2 â LATIN SMALL LETTER A WITH CIRCUMFLEX
268 0000E6 C3 A6 æ LATIN SMALL LETTER AE
240 0000DF C3 9F ß LATIN SMALL LETTER SHARP S
239 000307 CC 87 ̇ COMBINING DOT ABOVE
230 0000EA C3 AA ê LATIN SMALL LETTER E WITH CIRCUMFLEX
201 0000F4 C3 B4 ô LATIN SMALL LETTER O WITH CIRCUMFLEX
156 0000E3 C3 A3 ã LATIN SMALL LETTER A WITH TILDE
150 0000F5 C3 B5 õ LATIN SMALL LETTER O WITH TILDE
150 0004F1 D3 B1 ӱ CYRILLIC SMALL LETTER U WITH DIAERESIS
129 000141 C5 81 Ł LATIN CAPITAL LETTER L WITH STROKE
127 0001D4 C7 94 ǔ LATIN SMALL LETTER U WITH CARON
120 000301 CC 81 ́ COMBINING ACUTE ACCENT
105 0000EC C3 AC ì LATIN SMALL LETTER I WITH GRAVE
102 001EBD E1 BA BD ẽ LATIN SMALL LETTER E WITH TILDE
94 000115 C4 95 ĕ LATIN SMALL LETTER E WITH BREVE
85 000188 C6 88 ƈ LATIN SMALL LETTER C WITH HOOK
75 000169 C5 A9 ũ LATIN SMALL LETTER U WITH TILDE
74 00010D C4 8D č LATIN SMALL LETTER C WITH CARON
74 000233 C8 B3 ȳ LATIN SMALL LETTER Y WITH MACRON
68 0001B7 C6 B7 Ʒ LATIN CAPITAL LETTER EZH
61 00017E C5 BE ž LATIN SMALL LETTER Z WITH CARON
56 00201E E2 80 9E „ DOUBLE LOW-9 QUOTATION MARK
52 0004E0 D3 A0 Ӡ CYRILLIC CAPITAL LETTER ABKHASIAN DZE
48 0000FB C3 BB û LATIN SMALL LETTER U WITH CIRCUMFLEX
47 0000AD C2 AD SOFT HYPHEN
44 0000D4 C3 94 Ô LATIN CAPITAL LETTER O WITH CIRCUMFLEX
41 0000EB C3 AB ë LATIN SMALL LETTER E WITH DIAERESIS
40 00044A D1 8A ъ CYRILLIC SMALL LETTER HARD SIGN
40 000153 C5 93 œ LATIN SMALL LIGATURE OE
38 00014F C5 8F ŏ LATIN SMALL LETTER O WITH BREVE
30 00201D E2 80 9D ” RIGHT DOUBLE QUOTATION MARK
28 000442 D1 82 т CYRILLIC SMALL LETTER TE
22 00044E D1 8E ю CYRILLIC SMALL LETTER YU
22 000117 C4 97 ė LATIN SMALL LETTER E WITH DOT ABOVE
22 001E83 E1 BA 83 ẃ LATIN SMALL LETTER W WITH ACUTE
22 0000EE C3 AE î LATIN SMALL LETTER I WITH CIRCUMFLEX
22 00044B D1 8B ы CYRILLIC SMALL LETTER YERU
21 000178 C5 B8 Ÿ LATIN CAPITAL LETTER Y WITH DIAERESIS
21 002013 E2 80 93 – EN DASH
20 0001EF C7 AF ǯ LATIN SMALL LETTER EZH WITH CARON
20 000293 CA 93 ʓ LATIN SMALL LETTER EZH WITH CURL
18 000440 D1 80 р CYRILLIC SMALL LETTER ER
17 000151 C5 91 ő LATIN SMALL LETTER O WITH DOUBLE ACUTE
17 0001D0 C7 90 ǐ LATIN SMALL LETTER I WITH CARON
16 00A729 EA 9C A9 ꜩ LATIN SMALL LETTER TZ
16 000101 C4 81 ā LATIN SMALL LETTER A WITH MACRON
15 0001B5 C6 B5 Ƶ LATIN CAPITAL LETTER Z WITH STROKE
15 001E79 E1 B9 B9 ṹ LATIN SMALL LETTER U WITH TILDE AND ACUTE
15 000431 D0 B1 б CYRILLIC SMALL LETTER BE
14 00043C D0 BC м CYRILLIC SMALL LETTER EM
12 00043B D0 BB л CYRILLIC SMALL LETTER EL
12 000304 CC 84 ̄ COMBINING MACRON
12 000106 C4 86 Ć LATIN CAPITAL LETTER C WITH ACUTE
12 000187 C6 87 Ƈ LATIN CAPITAL LETTER C WITH HOOK
11 000438 D0 B8 и CYRILLIC SMALL LETTER I
11 000103 C4 83 ă LATIN SMALL LETTER A WITH BREVE
11 0000D3 C3 93 Ó LATIN CAPITAL LETTER O WITH ACUTE
11 000113 C4 93 ē LATIN SMALL LETTER E WITH MACRON
11 00043D D0 BD н CYRILLIC SMALL LETTER EN
10 0000FD C3 BD ý LATIN SMALL LETTER Y WITH ACUTE
10 002026 E2 80 A6 … HORIZONTAL ELLIPSIS
9 0000E7 C3 A7 ç LATIN SMALL LETTER C WITH CEDILLA
9 000227 C8 A7 ȧ LATIN SMALL LETTER A WITH DOT ABOVE
9 0000C4 C3 84 Ä LATIN CAPITAL LETTER A WITH DIAERESIS
8 000434 D0 B4 д CYRILLIC SMALL LETTER DE
8 0001CE C7 8E ǎ LATIN SMALL LETTER A WITH CARON
7 0000C8 C3 88 È LATIN CAPITAL LETTER E WITH GRAVE
7 00014D C5 8D ō LATIN SMALL LETTER O WITH MACRON
6 000432 D0 B2 в CYRILLIC SMALL LETTER VE
6 002014 E2 80 94 — EM DASH
5 000437 D0 B7 з CYRILLIC SMALL LETTER ZE
5 00043F D0 BF п CYRILLIC SMALL LETTER PE
5 0000ED C3 AD í LATIN SMALL LETTER I WITH ACUTE
5 0000F1 C3 B1 ñ LATIN SMALL LETTER N WITH TILDE
5 001EE7 E1 BB A7 ủ LATIN SMALL LETTER U WITH HOOK ABOVE
5 000179 C5 B9 Ź LATIN CAPITAL LETTER Z WITH ACUTE
4 0000DC C3 9C Ü LATIN CAPITAL LETTER U WITH DIAERESIS
4 00AB33 EA AC B3 ꬳ Character in undefined range
4 0000EF C3 AF ï LATIN SMALL LETTER I WITH DIAERESIS
4 000267 C9 A7 ɧ LATIN SMALL LETTER HENG WITH HOOK
4 0001B6 C6 B6 ƶ LATIN SMALL LETTER Z WITH STROKE
4 0004E7 D3 A7 ӧ CYRILLIC SMALL LETTER O WITH DIAERESIS
4 00FEFF EF BB BF ZERO WIDTH NO-BREAK SPACE
3 0000B7 C2 B7 · MIDDLE DOT
3 00044F D1 8F я CYRILLIC SMALL LETTER YA
3 000463 D1 A3 ѣ CYRILLIC SMALL LETTER YAT
3 000203 C8 83 ȃ LATIN SMALL LETTER A WITH INVERTED BREVE
3 00016F C5 AF ů LATIN SMALL LETTER U WITH RING ABOVE
3 0003B1 CE B1 α GREEK SMALL LETTER ALPHA
2 0001F0 C7 B0 ǰ LATIN SMALL LETTER J WITH CARON
2 000130 C4 B0 İ LATIN CAPITAL LETTER I WITH DOT ABOVE
2 000114 C4 94 Ĕ LATIN CAPITAL LETTER E WITH BREVE
2 00AB4D EA AD 8D ꭍ Character in undefined range
2 001E3F E1 B8 BF ḿ LATIN SMALL LETTER M WITH ACUTE
2 00016B C5 AB ū LATIN SMALL LETTER U WITH MACRON
2 000443 D1 83 у CYRILLIC SMALL LETTER U
2 00043A D0 BA к CYRILLIC SMALL LETTER KA
2 001F77 E1 BD B7 ί GREEK SMALL LETTER IOTA WITH OXIA
2 0003BD CE BD ν GREEK SMALL LETTER NU
2 000433 D0 B3 г CYRILLIC SMALL LETTER GHE
1 00045D D1 9D ѝ CYRILLIC SMALL LETTER I WITH GRAVE
1 000436 D0 B6 ж CYRILLIC SMALL LETTER ZHE
1 0025D8 E2 97 98 ◘ INVERSE BULLET
1 00022F C8 AF ȯ LATIN SMALL LETTER O WITH DOT ABOVE
1 000173 C5 B3 ų LATIN SMALL LETTER U WITH OGONEK
1 0000C0 C3 80 À LATIN CAPITAL LETTER A WITH GRAVE
1 000109 C4 89 ĉ LATIN SMALL LETTER C WITH CIRCUMFLEX
1 000190 C6 90 Ɛ LATIN CAPITAL LETTER OPEN E
1 0000C9 C3 89 É LATIN CAPITAL LETTER E WITH ACUTE
1 001EF3 E1 BB B3 ỳ LATIN SMALL LETTER Y WITH GRAVE
1 0002BC CA BC ʼ MODIFIER LETTER APOSTROPHE
1 0000C2 C3 82 Â LATIN CAPITAL LETTER A WITH CIRCUMFLEX
1 00010B C4 8B ċ LATIN SMALL LETTER C WITH DOT ABOVE
1 001E9E E1 BA 9E ẞ LATIN CAPITAL LETTER SHARP S
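For readers who want to reproduce rows in the format of the table above (count, UTF-32 code point, UTF-8 bytes, glyph, name), here is a minimal Python sketch. It is not the listUnicodeCharacters script: it takes character names from Python's unicodedata module rather than from uniname, it does not implement the -t restriction to **text spines, and its tie-breaking order for equal counts (by code point) is an assumption.

```python
import unicodedata
from collections import Counter


def count_non_ascii(text):
    """Count every character above the ASCII range in the given text."""
    return Counter(ch for ch in text if ord(ch) > 127)


def format_row(ch, count):
    """Build one row: count, UTF-32 (6 hex digits), UTF-8 bytes, glyph, name."""
    utf32 = f"{ord(ch):06X}"
    utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    # Names come from Python's bundled Unicode database, which may differ
    # from what the uniname program reports.
    name = unicodedata.name(ch, "(no name in this Unicode database)")
    return f"{count} {utf32} {utf8} {ch} {name}"


def report(text):
    """Return rows sorted by descending count, then by code point."""
    counts = count_non_ascii(text)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], ord(kv[0])))
    return [format_row(ch, n) for ch, n in ordered]


if __name__ == "__main__":
    for row in report("łąka łza"):
        print(row)
```

Because the name lookup depends on the Unicode version bundled with the Python interpreter, characters that uniname reports as being in an undefined range may receive a proper name here.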
It is better not to use characters like this:
120 000301 CC 81 ́ COMBINING ACUTE ACCENT
Combining characters like this might cause problems when searching the text; a single precomposed character that already includes the acute accent is better.
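One common way to merge a base letter and a following combining mark into a single character is Unicode NFC normalization. The Python sketch below shows the idea; it is an illustration and not part of the project's tooling, and the function names are invented for this example. Note that NFC can only merge sequences for which Unicode defines a precomposed character, so the second helper lists any combining marks that remain afterwards.

```python
import unicodedata
from typing import List


def to_precomposed(text: str) -> str:
    """Return the text in NFC form (base letter + mark merged where possible)."""
    return unicodedata.normalize("NFC", text)


def remaining_combining_marks(text: str) -> List[str]:
    """List combining marks that are still present after NFC normalization."""
    return [ch for ch in to_precomposed(text) if unicodedata.combining(ch)]


if __name__ == "__main__":
    decomposed = "o\u0301"  # 'o' followed by COMBINING ACUTE ACCENT
    composed = to_precomposed(decomposed)
    # A search for the precomposed letter only matches after normalization.
    print("\u00f3" in decomposed, "\u00f3" in composed)
```

Marks reported by remaining_combining_marks would need a manual decision, since no single-character replacement exists for them.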
Related to #261
Here is a Perl script that uses the uniname command-line program to count each non-ASCII character in all POPC2 files (both lyrics and elsewhere):
It is installed at https://github.com/pl-wnifc/humdrum-polish-scores/blob/main/bin/listUnicodeCharacters (and could be copied into the gitea repository as well).
Run with the command
make unicode
The results:
The "Unknown character in range Private Use Area" and "character in undefined range" characters should be removed from the files.
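To locate such characters before removing them, one option is to check each character's Unicode general category: Co marks Private Use code points and Cn marks unassigned ones. The Python sketch below is an illustration under that assumption and is not the project's script. Which code points count as unassigned depends on the Unicode version bundled with Python, so the result may not match uniname's "undefined range" report exactly.

```python
import unicodedata


def is_suspect(ch):
    """True for Private Use (Co) or unassigned (Cn) code points."""
    return unicodedata.category(ch) in ("Co", "Cn")


def find_suspects(text):
    """Return (line number, column, code point) for each suspect character.

    Line and column numbers are 1-based; columns count characters, not bytes.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if is_suspect(ch):
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits


if __name__ == "__main__":
    sample = "**text\n*\t\ue000 x\n"
    for lineno, col, codepoint in find_suspects(sample):
        print(f"line {lineno}, column {col}: {codepoint}")
```

To scan a file, read it as UTF-8 and pass its contents to find_suspects; the reported line numbers can then be used to open the file at the right place in an editor.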