BenTalagan / glaemscribe

Glaemscribe, the tolkienian languages/writings transcription engine.
https://glaemscrafu.jrrvf.com/english/glaemscribe.html

[Discussion/Specification] Relayout/Remap legacy fonts ? #15

Closed laicasaane closed 5 years ago

laicasaane commented 6 years ago

I've just found out that the non-breaking space is missing from the Annatar, Eldamar and Sindarin fonts. If I use that space, it falls back to the default font and the spacing is incorrect.

Also, the Glaemscribe processor seems to ignore the non-breaking space. I've added a definition for it in the charset and used it in the mode, but the output text contains only normal spaces.
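As an illustration only (this is not Glaemscribe's code, and the actual cause is not identified in this thread), here is a minimal Python sketch of one common way a non-breaking space can be lost during processing: a generic whitespace split treats U+00A0 like any other space. The function names are hypothetical.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE

def tokenize_collapsing(text):
    # Generic whitespace split: U+00A0 counts as whitespace in Python,
    # so the information "this space was non-breaking" is discarded.
    return text.split()

def tokenize_preserving(text):
    # Split on ordinary spaces and NBSP, but keep NBSP as its own token
    # so that a later stage can still emit it.
    parts = re.split(r"( |\u00a0)", text)
    return [tok for tok in parts if tok not in ("", " ")]

sample = "a" + NBSP + "b c"
print(tokenize_collapsing(sample))
print(tokenize_preserving(sample))
```

If a processor tokenizes the first way, the non-breaking space never reaches the charset lookup, whatever the charset defines for it.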

laicasaane commented 6 years ago

Sorry for being reluctant :) , but I'm not convinced.

No, I understand you. Because I've been seduced by the PUA layout for a long time, even before I discovered Glaemscribe.

Do you have any concrete examples

For the time being, the main limitation I've met first-hand with both the current layout and the FTF layout is that I cannot use page numbering in any word processor, because there is no way to define custom numeric characters, as far as I know.

we can really focus on the layout debate.

Once we have an automated tool, can we just run some quick tests?

BenTalagan commented 6 years ago

For the time being, the main limitation I've met first-hand with both the current layout and the FTF layout is that I cannot use page numbering in any word processor, because there is no way to define custom numeric characters, as far as I know.

I think I see the problem for page numbering. I've just tested to get a better impression, and these are the problems I've seen (with Open Office).

As far as I can see, even with a correct Latin mapping there are things you would not be able to achieve: base 12, right-to-left order, a most-significant-digit dot. None of these Elvish features would be possible. So the best you can do at the moment, and you're right about this, is Latin decimal numbering with tengwar numerals replacing the Latin digits. Clearly (and I'm sure you agree), we're facing a limitation of the software here, not of the layout. So what you describe is clearly a hack, because a Latin digit is a Latin digit, a Japanese digit is a Japanese digit, and a tengwar digit is a tengwar digit.
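To make the gap concrete, here is a minimal Python sketch of the digit sequence that such page numbering would need, which a word processor's built-in decimal numbering cannot produce. It assumes that "right to left" means the least significant digit is written first. The marking dot is left out, and the function name is hypothetical.

```python
def to_base12_lsd_first(n):
    """Return the base-12 digits of n, least significant digit first."""
    if n < 0:
        raise ValueError("page numbers are expected to be non-negative")
    if n == 0:
        return [0]
    digits = []
    while n:
        n, remainder = divmod(n, 12)
        digits.append(remainder)  # each value is in the range 0..11
    return digits

# Page 27 is 2*12 + 3, so the digits come out as 3 then 2.
print(to_base12_lsd_first(27))
```

Each value would then still have to be mapped to a glyph, and twelve distinct digit glyphs are needed, which is more than the ten Latin digit slots can carry.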

So I think we should stick to the FTF/Everson norm BUT with additions, especially hacks for dealing with these kinds of backward-compatibility/legacy/whatever problems. For example, I could remap a legacy font to the FTF/Everson norm, and duplicate the tengwar decimal digits into the Latin digit slots (and, why not, duplicate some punctuation signs into Latin slots too). In addition, I would add some ranges in the PUA for mapping legacy versions of the tehtar. Such a font would solve all your problems: word spacing and breaking, and page numbering. It would be clean because it is aligned with the norm. And it would not interfere with anything (except for the hacks).
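The digit duplication described above can be sketched on a plain code-point-to-glyph-name table. This is a Python illustration, not the tooling actually used for the fonts. The base code point U+E070 for the tengwar digit zero, the glyph names, and the function name are all assumptions made for the example.

```python
def duplicate_digits_to_latin(cmap, tengwar_zero=0xE070):
    """Copy the ten decimal tengwar digits onto U+0030..U+0039.

    cmap maps code points to glyph names. The PUA entries are kept,
    so the norm-aligned mapping is untouched; only duplicates are added.
    tengwar_zero is an ASSUMED code point for the tengwar digit zero.
    """
    result = dict(cmap)  # do not mutate the caller's table
    for value in range(10):
        source = tengwar_zero + value
        target = ord("0") + value
        if source not in cmap:
            raise KeyError(f"no glyph mapped at U+{source:04X}")
        if target in cmap:
            # Refuse to overwrite a real Latin digit already in the font.
            raise ValueError(f"U+{target:04X} is already mapped")
        result[target] = cmap[source]
    return result

# Hypothetical font with twelve digit glyphs in the PUA and no Latin glyphs.
legacy = {0xE070 + i: f"tengwar_digit_{i}" for i in range(12)}
remapped = duplicate_digits_to_latin(legacy)
print(remapped[ord("7")])
```

Refusing to overwrite existing Latin slots keeps the hack from interfering with a font that does ship real Latin digits.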

Once we have an automated tool, can we just run some quick tests?

The word 'quick' might be a bit too optimistic 😄 What do you need for your test? Does the hypothetical font that I've described suit you?

It might imply a new font, and a new cst (charset file) at minimum. The mode files should not change, we're at a lower level here. So there's still a little bit of work to do in all cases.

laicasaane commented 6 years ago

Such a font would solve all your problems

"all" is too optimistic a word, 😂 I agree with you, this problem is really a software limitation, we can never solve it by ourselves. Having some hacks can help only a little, but that's sufficient enough for now.

machsna commented 6 years ago

Thanks Talagan for pointing me to this discussion.

It appears both of you are now sympathizing with using an expanded FTF layout, mapping the tengwar numerals over our Arabic numerals, thus violating the Unicode standard. This would be justified by solving the problem of tengwar page numbers in certain applications. I have several reservations against such a solution:

This last reservation brings me to a broader point: Our knowledge of numerals in tengwar is very sketchy. Up to now, we know of at least seven different numeral systems to be used with tengwar, some of them only partly or indirectly attested (I am leaving aside the various options of marking numerals with dots or bars):

  1. J.R.R. Tolkien’s numerals for 1, 3, 4, and 6 in DTS 49.
  2. J.R.R. Tolkien’s tengwar numerals in DTS 87 – 1: parma, 2: tinco, 3: calma, 4: quesse, 5: umbar, 6: ando, 7: anga, 8: ungwe, 9: unque, 0: stemless vilya.
  3. J.R.R. Tolkien’s Rúmilian numerals in DTS 87, to be used with tengwar – nota bene: these are different from the various Rúmilian numerals we know of, cf. Helios’s analysis Rúmilian Numerals.
  4. J.R.R. Tolkien’s Arabic numerals in PE 20 Q10h, Q11j, to be used with tengwar – they have a style closely resembling the tengwar, thus 1 looks like a short carrier with a dot above, 3 like alda, 6 like esse, 0 like úre; optionally, 9 may look like rómen (so there you may have an eighth system).
  5. J.R.R. Tolkien’s tengwar numerals in PE 20 Q10h, Q11j – 1: long carrier with a dot above, 2: tinco, 3: ando, 4: vilya, 5: esse, 6: silme, 7: calma, 8: anga, 9: rómen, 0: úre.
  6. The primary letters according to Christopher Tolkien in Quettar 13.
  7. Christopher Tolkien’s numerals in Quettar 13 – they look similar to J.R.R. Tolkien’s numerals from DTS 49.

In my opinion, Christopher’s numerals are certainly not the best choice. They differ from J.R.R. Tolkien’s numerals in DTS 49. The difference is that the DTS 49 numerals look much more like actual tengwar than Christopher’s numerals. Unless the primary material Christopher’s numerals are based on is published, we cannot know for sure whether Christopher’s numerals truly represent the shapes J.R.R. Tolkien had intended. I believe the only reason why they have become popular on the internet is that they were there before the internet.

Long story short: I do not think that a possible solution for tengwar page numbering is sufficient reason for mapping some system of tengwar numerals over our Arabic numerals. Instead, I believe the way to go is as follows:

And now for something completely different:

BenTalagan commented 6 years ago

Hi Mach, and thanks a lot for taking the time to participate in this discussion, with such a neat and well-documented post! I appreciate it all the more because Glaemscribe was designed with the FTF Project in mind from the start. I'm totally convinced that the FTFP is, evidently, the most advanced milestone and the most serious work on the matter of bringing and adapting Tolkien's writing systems to modern technologies. The FTF Project is the achievement of one or two decades of work by various dedicated people and should be, in my opinion, the obligatory basis and bedrock for future work.

I don't really feel like I'm sympathizing with the idea of violating the Unicode standard ^^; (this sounds a bit too harsh), but I totally follow your argument nonetheless about trying to respect (in a perfect world) the segmentation of things and the meaning of characters (hence my own argument above in that same direction). Your remarks on numerals are all pertinent and convincing, and they only confirm my first impression that such a mapping (from tengwar digits to Latin ones) would be clearly identified as a dirty hack (in the meantime, we have to cope with the fact that having all digits in the PUA will prevent us from assigning them the Unicode 'NU' numeric class, but I don't think it's a major problem since in that configuration, line-breaking matters should be solved anyway). [Please note nonetheless that in that scope, the aim of the hack is just to duplicate the tengwar digits (while still keeping them in the PUA) to solve that very specialized problem, not to alter the FTF mapping]
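A small Python check of the point about PUA digits, under stated assumptions: the standard library does not expose the line-breaking class (where 'NU' lives), so this sketch looks at the related general category and decimal value instead. U+E070 is an assumed location for a tengwar digit, and the helper name is hypothetical.

```python
import unicodedata

def describe(ch):
    """Summarize the numeric properties Unicode assigns to a character."""
    return {
        "category": unicodedata.category(ch),       # e.g. 'Nd' or 'Co'
        "decimal": unicodedata.decimal(ch, None),   # None if not a decimal digit
        "isdigit": ch.isdigit(),
    }

ASSUMED_TENGWAR_DIGIT = "\ue070"  # assumption: a PUA slot used for a digit

print(describe("7"))
print(describe(ASSUMED_TENGWAR_DIGIT))
```

A private-use character is reported as category 'Co' with no decimal value, so software that relies on these properties has no way to know that the glyph is meant to be a numeral.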

That being said, we are not (at the moment) aiming to port the legacy fonts to a full OpenType solution. Laicasaane regularly reports very specific issues regarding tengwar transcriptions, and that makes the project evolve in the right direction. Because the vast majority of existing tengwar fonts are not equipped with clever OpenType features and still use the old Dan Smith mapping, I designed Glaemscribe to handle all that variety (the fonts coming with their flaws). But now that the project is getting stronger and stronger, it looks like people enjoy that variety and at the same time want more: more features, more precision, more cleanliness. Thus, we're in the middle of the ford. There's a need for better fonts, but (I may be wrong) imho there has been a lack of momentum in that field for a few years. My guess is that for a designer, it's quite complicated to deal with all these Unicode and OpenType features (which are the daily bread of us poor engineers). The Glaemscribe project started with a challenge proposed by my friend Didier Willis concerning the transcription of Sarati, and the first step was to port Måns Björkman's Sarati Eldamar font to OpenType so that it could handle the diacritics well. I did it using GPOS tables and it was not that easy; and that was only dealing with Sarati, which are probably easier to handle because the diacritics are put on the other side of the main direction line. Tengwar may offer the most complex cases, because both bearers and diacritics could hypothetically change their appearance when combined (size, rotation, even shape), so we have to deal with all kinds of combinations, ligatures, stacking and so on.

I consider these font design tasks out of my scope (or let's say, tangential) with regard to my work on Glaemscribe, for several reasons: first, I'm not a font designer, so there are probably real artists out there who would do far better; and secondly, I'd like to focus on the engine and the modes, and I'm already overloaded with work in that domain (sometimes I feel like I don't even actually). But in the meantime, I'm forced to recognize that fonts are a prerequisite :D (obviously), and I always end up torn between the two.

So what interests me in Laicasaane's requests is that I see an opportunity to keep our variety of fonts and renderings while slowly transitioning to the next stage. It would be too complicated and take too long (and might not even be feasible for me!) to bring the legacy fonts up to the point where they would be polished and fully OpenType-featured; but it's possible for me to bring them closer to that state in a reasonable amount of time, and a Unicode remapping is a logical first step.

Are you considering adding variant tehtar to the PUA for placing them on different width signs (as in the Dan Smith encoding)? I think this would be a very poor solution. These placement choices are better hard-coded into the font by means of the OpenType Layout. For a long time, there was a severe lack of applications capable of displaying advanced OpenType Layout operations. Basically, you were restricted to Xe(La)TeX. This has changed in recent years, with major applications such as LibreOffice, Firefox or Google Chrome now having OpenType Layout capabilities that are at least as powerful as the ones of the Adobe Creative Suite (or Cloud).

This is what I've been considering and what I've implemented over the last two days, in that transitional perspective; it's not meant to be set in stone or standardized. I've used the end blocks of the PUA for that purpose to avoid hypothetical collisions in the medium term. I do agree with you, it's a poor solution, but it's a (relatively) quick hack and it's ready for use as a stopgap solution. I still hope that these steps taken with Glaemscribe will motivate other people by proving things can be done, and facilitate their efforts in working on new fonts :). (By the way, could you provide us with any news concerning the OpenType version of Telcontar? It would be awesome to have it in Glaemscribe, as it would offer the most accurate Tengwar rendering ever).

@Laicasaane : that was hard work, but new font adaptations and charsets are now available in the unicode_font_remapping branch. As Mach has said very clearly, please remember that these are transitional pieces of work: Elvish numerals have been copied to the Arabic numeral slots to fit your needs, so this is clearly a personalized hack (in the meantime, these fonts do not have any other Latin characters, so for the moment this does not overlap anything), as is the fact that they still use the old DS tehtar variants (remapped at the end of the PUA). Could you please beta test it to see if it fits your needs?

@machsna : Thanks a lot again, I feel like this is exactly the kind of discussion that is needed to make things evolve the right way. Please feel free to share other thoughts or participate again any time!

Cheers, Talagan.

machsna commented 6 years ago

(By the way, could you provide us with any news concerning the OpenType version of Telcontar? That would be awesome to have it in Glaemscribe, as it would offer the most accurate Tengwar rendering ever).

I forgot: Here is a proof of concept for porting the Tengwar Telcontar intelligence from SIL Graphite to the OpenType Layout: OpenType Layout test page. The SIL Graphite column will only display properly in Firefox, the only browser to support this technology. The right column displays perfectly in Firefox and Google Chrome, and almost perfectly in Safari (in the absence of any explicit feature setting, mkmk is off). If I remember correctly, the Microsoft browsers have issues similar to Safari's. They can be solved with a funny CSS hack:

font-feature-settings: 'dumb' 1;

Explanation: This instructs the browser to switch on the OpenType Layout feature dumb. The point is that no such feature exists, so a smart browser will ignore this instruction, while dumb browsers may use it to improve their font display. ☺

The only feature I could not get to work is the underlining. Anyway, underlining is perhaps a problem best solved on the application level, not on the font level.

The font is not yet ready to be released because it still lacks a few characters, especially several uppercase characters. But of course, if you must, you will find it in the SVN repositories (Arno already uses it on Tecendil).

BenTalagan commented 6 years ago

The font is not yet ready to be released because it still lacks a few characters, especially several uppercase characters.

That's where I had left it; I remember reading some exchanges on the FTF list last year (even participating, if I remember correctly) and had put a 'wait for release' sign in a corner of my mind.

But of course, if you must, you will find it in the SVN repositories

I'll take a look at this today, and test an integration. It will probably be a hundred times easier than for the other fonts 😃 Anyway, it had not occurred to me that the latest version of the font in the SVN was sufficiently advanced to be used; this is awesome news! Thanks!

laicasaane commented 6 years ago

@machsna I initially supported the idea of a Unicode standard layout, especially after learning about the FTF Project. I liked that idea so much that I decided to change the layout of Tengwar Annatar myself. But that was a short-lived idea; my lack of font design skills is a huge drawback I cannot overcome. After discovering Glaemscribe, I immersed myself in making a transcriber for the Quốc ngữ script as well as revising the mode for Vietnamese. When I was finally able to put together some long documents, problems arose one after another. Thus I have to consider some quick hacks to finish my work first. Also, thanks for your information regarding macros, I'll see if I can make it work. If software-level problems can be solved by software-level solutions, then we don't need any hacks in the fonts, and that would be more appropriate.

And here is the mode for Vietnamese, written in English: https://drive.google.com/open?id=0B4vpFvDhhjSmcHlwNlh2YXpfcVE

BenTalagan commented 5 years ago

Closed after 1.2.0 release.