be5invis / Iosevka

Versatile typeface for code, from code.
http://be5invis.github.io/Iosevka
SIL Open Font License 1.1

Hebrew glyph support #690

Open yuvadm opened 3 years ago

yuvadm commented 3 years ago

There aren't many monospaced fonts that have proper Hebrew glyphs, and there seems to be no good reason for that. (This is the only one that shows up on Google Fonts for the monospace / Hebrew intersection.)

Hebrew has only 22 letters, with 5 additional glyphs used as the "ending form" of letters at ends of words. Hebrew doesn't have any complicated typesetting as opposed to other RTL languages such as Arabic or Farsi.

It would be really nice for Iosevka to be able to support Hebrew characters.

I would also personally be interested in helping develop Hebrew support but would appreciate some assistance and guidance on the correct process for working on the Iosevka code base.

be5invis commented 3 years ago

Complex scripts (Hebrew is one of them) are for the far future. I'll leave this open, and may investigate this alongside Arabic.

be5invis commented 3 years ago

(ps. Pragmata Pro has Hebrew too, though it is commercial.)

yuvadm commented 3 years ago

@be5invis Can you briefly explain why you consider Hebrew a complex script? To the best of my understanding it doesn't have any complex typesetting rules. While it's true that Hebrew can optionally have diacritic marks which complicate things, the basic glyphs have no such requirement, and most day-to-day uses of Hebrew can get by just fine without them.

clsn commented 3 years ago

I have some experience in making Hebrew fonts; I might be able to help in a small way.

Basic Hebrew support is dead simple. Opposite-of-complex. 27 ordinary spacing letter glyphs, plus maybe three or four spacing punctuation marks (Modern Hebrew requires U+05F3 HEBREW PUNCTUATION GERESH and U+05F4 HEBREW PUNCTUATION GERSHAYIM, at least, though people often make do with ' and "), and you're done.
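To make the "27 glyphs" count concrete, here is a minimal sketch using only Python's standard `unicodedata` module. It assumes the basic letters are the contiguous Unicode range U+05D0 through U+05EA and picks out the final forms by their character names; the variable names are illustrative only and have nothing to do with the Iosevka code base.

```python
import unicodedata

# Assumption: the basic letters are the contiguous range U+05D0..U+05EA.
LETTERS = [chr(cp) for cp in range(0x05D0, 0x05EA + 1)]

# Final ("ending") forms are identified here by their Unicode character names.
FINALS = [ch for ch in LETTERS if "FINAL" in unicodedata.name(ch)]
BASE = [ch for ch in LETTERS if ch not in FINALS]

# The two spacing punctuation marks named in the comment above.
PUNCTUATION = [0x05F3, 0x05F4]

if __name__ == "__main__":
    print(f"{len(LETTERS)} letters = {len(BASE)} base + {len(FINALS)} final")
    for ch in FINALS:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    for cp in PUNCTUATION:
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```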

To do Hebrew well (but still not completely) requires handling a bunch of combining vowel-marks, but on the whole they're not really complicated as these things go. Some anchors to handle the finer points will do the job. The vowels are supposedly optional, but really, in practice, you find a vowel-mark thrown in once or twice(?) per page of text, even for grown-ups, for words that might otherwise be misread, so it's good if you can handle them, even crudely. Some of the letter+vowel combinations (all cases of Letter plus U+05BC HEBREW POINT DAGESH OR MAPIQ, combinations involving SHIN DOT and SIN DOT, and a few selected Letter plus vowel combinations that are important in Yiddish) have their own precomposed code-points in Unicode so they need to go into a ccmp table, but that's standard.
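The precomposed letter+mark code points do not have to be listed by hand: they carry canonical decompositions in the Unicode data. The sketch below, again using only `unicodedata`, collects them from the Alphabetic Presentation Forms range into a "sequence to precomposed code point" table, which is the sort of data a ccmp lookup could be generated from. The function name and the range bounds are assumptions made for illustration, not anything taken from Iosevka's build.

```python
import unicodedata

def precomposed_hebrew(first=0xFB1D, last=0xFB4F):
    """Map each canonical (base, mark, ...) sequence to its precomposed code point.

    Code points without a canonical decomposition (for example the wide
    letters, which only have compatibility mappings) are left out, because
    NFD leaves them unchanged.
    """
    table = {}
    for cp in range(first, last + 1):
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFD", ch)
        if decomposed != ch:
            table[tuple(ord(c) for c in decomposed)] = cp
    return table

if __name__ == "__main__":
    for seq, cp in sorted(precomposed_hebrew().items(), key=lambda kv: kv[1]):
        parts = " ".join(f"U+{c:04X}" for c in seq)
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {parts}")
```

One practical consequence: these presentation forms are excluded from Unicode composition, so NFC-normalized text arrives as letter plus combining mark rather than as the precomposed code point. The anchors on the plain letters therefore matter even when the precomposed glyphs exist.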

If you want to go all-out, you can support the cantillations, which are ONLY used in typesetting Biblical text, and even then not most of the time. The complicated thing with them is handling how they interact with the vowels when they are on the same letter, etc. Sane people don't generally bother with supporting the cantillations (naturally, this means that I sometimes do.)

Some small OpenType details can be added for extra-special excellence, but are rarely worth the bother, especially in a monospace font. Hebrew fonts by John Hudson (e.g. SBL Hebrew) are IMO examples of particularly fine Hebrew fonts.

TL;DR: Hebrew support is trivial and you shouldn't consider it complex or worry about adding it.

be5invis commented 3 years ago

@clsn If Hebrew is introduced, it will at least support vowels. This will be a V4.x or 5.x goal, perhaps slightly before Arabic?

be5invis commented 3 years ago

Also punctuation and symbols. @clsn, does Hebrew have mirrored punctuation like Arabic? Could you please provide a list?

clsn commented 3 years ago

Generally, from what I've seen, Hebrew is printed (sometimes even handwritten) with ordinary Latin-style LTR-looking punctuations. So the question-mark is the ascii ? and it faces away from the question and not back towards it as it does in English, or the Arabic ؟‎ does. Even commas are normally written the same way we do in English. There might be some other styles, but really that's what I've mainly seen: plain old LTR general punctuation, maybe sometimes U+05C3 HEBREW PUNCTUATION SOF PASUQ if you're quoting a verse or something, and a U+05BE HEBREW PUNCTUATION MAQAF because it matches better than an ordinary hyphen. Modern Hebrew also uses ordinary ASCII Arabic numerals (1,2,3,4), and not the "Arabic" Arabic numerals ("ARABIC-INDIC DIGIT ONE" etc in Unicode).
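The mirroring question can also be checked against the Unicode character properties. This is only a sketch with a hypothetical helper name, `describe`: it reports each character's bidirectional class and whether it has the Bidi_Mirrored property, which is what makes paired brackets flip in RTL runs. The question mark and comma do not have that property, and the Arabic question mark is a separate code point rather than a mirrored rendering of `?`.

```python
import unicodedata

def describe(ch):
    """Return (bidi class, is_mirrored) for a single character."""
    return unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch))

# ASCII punctuation and a digit, then maqaf, sof pasuq, alef,
# and the Arabic question mark for comparison.
SAMPLES = "?,()1" + "\u05BE\u05C3\u05D0" + "\u061F"

if __name__ == "__main__":
    for ch in SAMPLES:
        bidi, mirrored = describe(ch)
        print(f"U+{ord(ch):04X} {unicodedata.name(ch):<32} {bidi:<3} mirrored={mirrored}")
```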

clsn commented 3 years ago

Just for some more information, here's what I think good Hebrew support would entail, at minimum:

  1. From the Hebrew block (0590), you absolutely do not need to handle U+0591 through U+05AE. These are only used in Biblical typesetting, and most modern Hebrew fonts don't have them.
  2. What you do need is the letters, U+05D0 through U+05EA. Other spacing characters you should have: U+05F3 and U+05F4 (they look like prime and double-prime) and U+05BE (a short hyphen at the same height as the tops of most letters). U+05C3 might be good (it can be the same as a colon, or maybe a little heavier). It might make Yiddish-printers happy if you had the ligatures U+05F0 through U+05F2, but these needn't really look different from just the letters (and they're narrow letters anyway.)
  3. Non-spacing characters: you need to have U+05B0 through U+05BC. All (well, most of) the letters must be capable of handling any one (no need to support more than one) of the "lower" vowel-marks (U+05B0 through U+05B8 plus U+05BB) underneath; this may entail some anchor-placement to shift things over a little so as not to interfere with the descender of U+05E7, and things like U+05D3 and U+05E8 usually have their vowels underneath their leg, not centered under the letter. Other details of point-placement can be worked out later.
  4. U+05BC is a combining character, but all its combinations with letters (all that occur) are also precomposed characters, starting at U+FB30. U+05C1 and U+05C2 only go on U+05E9, and there are also precomposed characters encoded for those, both with and without U+05BC as well.
  5. You totally do NOT need U+05C4 and U+05C5, nor U+05C6; you can use the same glyph for U+05C7 and U+05B8 if you want, or make the stem of the former a little longer.
  6. Any letter must also be able to take U+05B9, which goes on the upper left, except that it combines with U+05D5 to form a precomposed character U+FB4B. Only U+05D5 needs to worry about U+05BA; it goes on the upper left.
  7. U+05BF is not really crucial for Hebrew, but Yiddish-printers would like it. It theoretically can go on many different letters (in some scribal traditions) but for the most part you only have to deal with it forming precomposed characters U+FB4C et al.
  8. That's all you need from the Hebrew block. Down in alphabetic presentation forms, you absolutely do NOT need U+FB21 through U+FB28. You can use the same glyph for U+FB20 as U+05E2, if the vowels can go underneath it.
  9. U+FB1D and U+FB1F are mainly Yiddish, but are probably useful enough to keep.
  10. U+FB4F is hardly used in modern Hebrew typography, but has historical usage and is also important for writing certain non-Hebrew languages (like forms of Judeo-Arabic.) You don't need U+FB29. I've never seen U+FB1E, but that's just my experience.
  11. Everything else is a precomposed form, letter plus U+05BC and/or other dots or a vowel or U+05BF.
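The list above can be turned into a rough coverage check. The sketch below is one reading of it, not a definitive specification: the split into `REQUIRED`, `RECOMMENDED`, and `SKIPPED` is my interpretation of the wording, U+05BF (rafe) is included on the assumption that it is the mark item 7 describes, and `missing` is a hypothetical helper that takes any collection of supported code points (for example, the keys of a font's character map).

```python
def span(first, last):
    """Inclusive range of code points as a set."""
    return set(range(first, last + 1))

REQUIRED = (
    span(0x05D0, 0x05EA)          # letters (item 2)
    | {0x05F3, 0x05F4, 0x05BE}    # geresh, gershayim, maqaf (item 2)
    | span(0x05B0, 0x05BC)        # vowel points and dagesh (items 3, 4, 6)
    | {0x05C1, 0x05C2}            # shin dot and sin dot (item 4)
)

RECOMMENDED = (
    {0x05C3}                      # sof pasuq (item 2)
    | span(0x05F0, 0x05F2)        # Yiddish ligatures (item 2)
    | {0x05C7}                    # may share a glyph with U+05B8 (item 5)
    | {0x05BF}                    # rafe; assumed to be the mark in item 7
    | {0xFB1D, 0xFB1F, 0xFB20, 0xFB4F}  # items 8 to 10
)

SKIPPED = (
    span(0x0591, 0x05AE)          # Biblical-only marks (item 1)
    | {0x05C4, 0x05C5, 0x05C6}    # item 5
    | span(0xFB21, 0xFB28)        # wide letters (item 8)
    | {0xFB29}                    # item 10
)

def missing(supported, wanted=REQUIRED):
    """Return the wanted code points absent from `supported`, sorted."""
    return sorted(wanted - set(supported))

if __name__ == "__main__":
    # Example: a font that so far only has the 27 letters.
    letters_only = span(0x05D0, 0x05EA)
    print(" ".join(f"U+{cp:04X}" for cp in missing(letters_only)))
```

The precomposed forms from item 11 are deliberately left out of these sets, since they can be derived from the letters and marks rather than listed by hand.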

I'm pretty sure providing what I've listed above would be considered "good Hebrew support" by pretty much anyone. No special punctuation or numbers are needed, as I mentioned above. You may determine if that counts as "complex": twiddling non-spacing marks and stuff can be complicated. It's probably simpler than Arabic, though, which has all the joining and whatnot as well as vowels and other diacritics (but there are more Arabic speakers, so a bigger user-base.)

Not intended as a nudge or impatience; just felt like rambling and I thought I could help clarify this a little by doing so. Naturally, I have Opinions about many of the details involving the above, which I would be happy to share if you don't shut me up quickly^W^W^W^W^W^W^W if it would help. Hope this helps!

mcookly commented 1 year ago

Whenever Hebrew gets implemented, Miriam Mono CLM might be a good reference since it includes both Niqqud and T'amim.

clsn commented 1 year ago

Sure, that would be nice. But as I said above, te`amim are pretty much the lowest priority, and they're low enough that it's perfectly respectable to release the font without support for them. That is, they shouldn't hold things up if the other stuff is done. be5invis has stated that Iosevka Hebrew should support vowels (niqqud) at least, which is already a step beyond the minimum necessary (there are not a few Hebrew fonts out there with only the consonants. Though they're mostly display fonts, while Iosevka is a text font and would probably feel the lack of vowels more keenly.) The website of the Culmus project (which makes Miriam CLM) has a page of their fonts with te`amim, though Miriam is missing for some reason, and the others are not monospaced. Anyway. Yeah, it's as good a place to model from as any...

mcookly commented 1 year ago

After some digging into what it takes to add t'amim, I completely agree that it's far too complicated, and too minor a feature, to implement right now, especially since Hebrew isn't added yet.