Sefaria / Sefaria-Export

Structured Jewish texts and metadata exported from Sefaria's database.

Texts not using appropriate Unicode punctuation #28

Open pseudomonas opened 2 years ago

pseudomonas commented 2 years ago

Some Sefaria texts use similar-looking glyphs where semantically more appropriate characters exist in the Unicode standard:

e.g.

- U+0022 QUOTATION MARK in place of U+05F4 HEBREW PUNCTUATION GERSHAYIM
- U+0027 APOSTROPHE in place of U+05F3 HEBREW PUNCTUATION GERESH
- U+003A COLON in place of U+05C3 HEBREW PUNCTUATION SOF PASUQ

For an example, see https://raw.githubusercontent.com/Sefaria/Sefaria-Export/master/cltk-flat/Midrash/Aggadic%20Midrash/Midrash%20Tehillim/Hebrew/merged.json
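
To make the proposed substitutions concrete, here is a minimal Python sketch of a context-sensitive replacement. It is not Sefaria's code, and the function name `normalize_hebrew_punctuation` and the contextual rules are assumptions made for illustration: a quotation mark is treated as gershayim only between Hebrew letters, an apostrophe as geresh only directly after a Hebrew letter, and a colon as sof pasuq only after a Hebrew letter and before whitespace or the end of the string. These heuristics can misfire (for example on a closing single quote after a Hebrew word, or a genuine colon in Hebrew prose), so any real conversion would need review.

```python
import re

GERSHAYIM = "\u05F4"  # HEBREW PUNCTUATION GERSHAYIM
GERESH = "\u05F3"     # HEBREW PUNCTUATION GERESH
SOF_PASUQ = "\u05C3"  # HEBREW PUNCTUATION SOF PASUQ

# A Hebrew letter (alef..tav), optionally followed by vowel points or
# cantillation marks. The mark class deliberately skips the punctuation
# code points that sit in the same range (maqaf, paseq, sof pasuq, nun hafukha).
LETTER = r"[\u05D0-\u05EA]"
MARKS = r"[\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]*"
HEB = LETTER + MARKS

# Assumed heuristics (not taken from the issue):
GERSHAYIM_RE = re.compile(rf'({HEB})"(?={LETTER})')   # " between Hebrew letters
GERESH_RE = re.compile(rf"({HEB})'")                  # ' right after a Hebrew letter
SOF_PASUQ_RE = re.compile(rf"({HEB}):(?=\s|$)")       # : ending a Hebrew word


def normalize_hebrew_punctuation(text: str) -> str:
    """Replace ASCII look-alikes with Hebrew punctuation in Hebrew contexts only."""
    text = GERSHAYIM_RE.sub(lambda m: m.group(1) + GERSHAYIM, text)
    text = GERESH_RE.sub(lambda m: m.group(1) + GERESH, text)
    text = SOF_PASUQ_RE.sub(lambda m: m.group(1) + SOF_PASUQ, text)
    return text


if __name__ == "__main__":
    # Resh, shin, ASCII quotation mark, yod (the acronym "Rashi").
    sample = "\u05E8\u05E9\"\u05D9"
    print(normalize_hebrew_punctuation(sample) == "\u05E8\u05E9\u05F4\u05D9")
```

Non-Hebrew text such as `say "hello"` is left unchanged, because every pattern requires a preceding Hebrew letter.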