adobe-type-tools / agl-aglfn

AGL & AGLFN
BSD 3-Clause "New" or "Revised" License

glyph name list #1

Closed chandrab699 closed 7 years ago

chandrab699 commented 7 years ago

I am looking for a full list of glyph names for a Kannada font.

I am confused about the difference between AGL and AGLFN.

Say, for the Kannada letter U+0C95: in AGL it is named uni0C95, and in AGLFN it is named knKa.

What is the advantage of, or need for, AGLFN naming?

Is it primarily because of a limit on the length of the name field?

miguelsousa commented 7 years ago

The glyph names uni0C95 and knKa are in neither of the lists. You must be talking about the GlyphOrderAndAliasDB file in the source-kannada-sans repository. The first column of that file lists the glyphs' final names, i.e. the glyph names that the OTF file will have. The second column lists the glyphs' friendly names, i.e. the glyph names used in the UFO source files.
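A minimal sketch of how the two columns described above could be read into a lookup table. It assumes whitespace-separated fields and that "#" starts a comment; parse_goadb is a hypothetical helper, and the knKha row is made up for illustration (only the uni0C95/knKa pair comes from this thread).

```python
from typing import Dict


def parse_goadb(text: str) -> Dict[str, str]:
    """Map friendly (source) glyph names to final (OTF) glyph names."""
    friendly_to_final: Dict[str, str] = {}
    for raw_line in text.splitlines():
        # Assumption: '#' starts a comment; blank lines are skipped.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # not a final/friendly pair
        # Column 1 is the final name, column 2 the friendly name.
        final_name, friendly_name = fields[0], fields[1]
        friendly_to_final[friendly_name] = final_name
    return friendly_to_final


# Hypothetical excerpt; the second row is invented for the example.
SAMPLE = """\
# final name    friendly name
uni0C95         knKa
uni0C96         knKha
"""

mapping = parse_goadb(SAMPLE)
print(mapping["knKa"])  # uni0C95
```

Keying the table by friendly name matches the direction of the build: sources use friendly names, and the final names are looked up when the OTF is produced.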

frankrolf commented 7 years ago

The advantage of using “friendly” glyph names is the level of abstraction: in feature code, it is easier to read sub x by y; than sub uni0078 by uni0079; The GlyphOrderAndAliasDB file just provides a handy means of translating from your custom, easy-to-read glyph names to the final, required names.

For a list of code points, you can consult the Kannada Unicode range: http://unicode.org/charts/PDF/U0C80.pdf

Hope this helps!

chandrab699 commented 7 years ago

I understand that, but is there any advantage in applications? Besides easing font development while coding, is there any particular reason to name a glyph in a particular way?

frankrolf commented 7 years ago

It is purely for font development. Most developers prefer names like knKa (Kannada letter Ka) to uni0C95, because the level of abstraction is not as high. In the final OTF, all friendly glyph names are gone.

bengulurumanjunatha commented 7 years ago

I am reading about this and also testing the font with abstract glyph names vs. AGLFN.

With abstract glyph names such as "knKa", I am not able to retrieve the Unicode text through copy/paste from a PDF. But with AGLFN, copy/paste retrieves the Unicode text successfully.

Is this the main reason it is recommended to use AGLFN rather than abstract glyph names?

Even with AGLFN, some text is retrieved incorrectly, such as text with a "reph": "ಧರ್ಮ", when copied and pasted into a text pad, is retrieved as "ಧಮರ್".

Please assist.

kenlunde commented 7 years ago

@bengulurumanjunatha: Have you read the AGL Specification?

bengulurumanjunatha commented 7 years ago

@kenlunde I have read the specification, although my understanding could be wrong.

Please check this font https://github.com/erinmclaughlin/Hubballi

This is built with the AGLFN naming scheme. Here are two examples, each showing the original text set in the Hubballi font and the text retrieved by copying from a PDF created with it:

  1. Original: ಬಗ್ಗೆ retrieved: ಬಗೆ್ಗ
  2. Original: ದರ್ಪಣ retrieved: ದಪರ್ಣ

The issue is described very well in this blog post: http://aravindavk.in/blog/glyph-names/

Please review.

kenlunde commented 7 years ago

@bengulurumanjunatha When you wrote "copying from PDF," please be sure to indicate which PDF client you are using, because the results can differ in terms of extracting "content" from a PDF.

Looking at the Hubballi font, the glyph names include lots of sequences, whose content should extract without issues, but the glyph names include several odd suffixes, and it seems that not all final names conform to the AGL Specification.

Also, when you provide examples, in addition to the actual text, it would be useful to know the final glyph names that correspond to the character sequence.

bengulurumanjunatha commented 7 years ago

@kenlunde Thank you for responding.

Client: Acrobat Reader DC 2017.012.20093 on Mac OS Sierra 10.12.6, Build 16G29

Example 1:
Original text: ಬಗ್ಗೆ
Unicode sequence: U+0C97 U+0CCD U+0C97 U+0CC6
Font glyph sequence: uni0C970CC6, uni0CCD0C97
Text copied from PDF: ಬಗೆ್ಗ
Unicode sequence: U+0C97 U+0CC6 U+0CCD U+0C97

Example 2:
Original text: ದರ್ಪಣ
Unicode sequence: U+0CA6 U+0CB0 U+0CCD U+0CAA U+0CA3
Font glyph sequence: uni0CA6 uni0CAA uni0CB00CCD uni0CA3
Text copied from PDF: ದಪರ್ಣ
Unicode sequence: U+0CA6 U+0CAA U+0CB0 U+0CCD U+0CA3

I think the problem is that Acrobat sees only the glyph names and not the features associated with them.

I don't know whether something is wrong in the glyph name that keeps Acrobat Reader from knowing the feature (say, uni0CB00CCD.reph), or whether the problem is in the AGLFN standard, which cannot express the reordering that shaping engines perform, as mentioned in the blog linked in my previous post.
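The reordering reported in Example 1 can be reproduced without any PDF client. The sketch below derives text from final glyph names alone, following the "uni" and "u" rules of the AGL Specification as I understand them: drop everything after the first period, split on underscores, then read groups of uppercase hex digits. glyph_name_to_text is a hypothetical helper. It is deliberately incomplete: it does not look up names listed in the AGL itself (such as "A" or "space") and does not reject surrogate values.

```python
import re

# "uni" + one or more groups of four uppercase hex digits (BMP values).
_UNI = re.compile(r"uni((?:[0-9A-F]{4})+)")
# "u" + four to six uppercase hex digits (one code point).
_U = re.compile(r"u([0-9A-F]{4,6})")


def glyph_name_to_text(name: str) -> str:
    """Derive a character string from one final glyph name."""
    base = name.split(".", 1)[0]  # a suffix such as ".reph" is ignored
    chars = []
    for component in base.split("_"):
        match = _UNI.fullmatch(component)
        if match:
            digits = match.group(1)
            chars.extend(
                chr(int(digits[i:i + 4], 16)) for i in range(0, len(digits), 4)
            )
            continue
        match = _U.fullmatch(component)
        if match:
            chars.append(chr(int(match.group(1), 16)))
        # Anything else (e.g. a friendly name like "knKa") maps to nothing here.
    return "".join(chars)


# Glyph sequence reported for Example 1, in visual (glyph) order.
glyphs = ["uni0C970CC6", "uni0CCD0C97"]
text = "".join(glyph_name_to_text(g) for g in glyphs)
print([f"U+{ord(c):04X}" for c in text])
# ['U+0C97', 'U+0CC6', 'U+0CCD', 'U+0C97']
```

The result matches the copied sequence rather than the original one. The names are read strictly in glyph order, so any reordering done by the shaping engine is baked into the extracted text. A friendly name such as knKa yields nothing in this sketch, which is consistent with the failed copy/paste reported earlier for abstract names.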

I am sorry if I have still not provided all the information you asked for. If anything necessary is missing, please let me know.

kenlunde commented 7 years ago

@bengulurumanjunatha Can you also supply a PDF that includes these sequences? Another factor in this is the PDF producer. If you didn't use Adobe InDesign, it would be useful to repeat this exercise using that app, because it produces a PDF with separate presentation and content layers, which does a much better job at preserving the original text sequence for the purpose of searching and copy&paste.

bengulurumanjunatha commented 7 years ago

PDF created from Pages with "Export To PDF" feature. Original Text ನಡುಗನ್ನಡ ಸಾಹಿತ್ಯದಲ್ಲಿ ಅನೇಕ ಹೊಸ ಸಾಹಿತ್ಯ ಪ್ರಕಾರಗಳು ಬೆಳಕಿಗೆ ಬಂದವು. ಇವುಗಳಲ್ಲಿ ಮುಖ್ಯವಾದವು ರಗಳೆ, ಸಾಂಗತ್ಯ ಮತ್ತು ದೇಸಿ. ಈ ಕಾಲದ ಸಾಹಿತ್ಯ ಜೈನ, ಹಿಂದೂ ಹಾಗೂ ಜಾತ್ಯತೀತ ಬೋಧನೆಗಳ ಮೇಲೆ ಆಧಾರಿತವಾಗಿದೆ. ಈ ಘಟ್ಟದ ಪ್ರಮುಖ ಲೇಖಕರಲ್ಲಿ ಇಬ್ಬರೆಂದರೆ ಹರಿಹರ ಮತ್ತು ರಾಘವಾಂಕ. ಇಬ್ಬರೂ ತಮ್ಮದೇ ಶೈಲಿಯಲ್ಲಿ ಕನ್ನಡ ಸಾಹಿತ್ಯದ ದಾರಿಯನ್ನು ಬೆಳಗಿದವರು. ಹರಿಹರ ರಗಳೆ ಸಾಹಿತ್ಯವನ್ನು ಬಳಕೆಗೆ ತಂದನು, ತನ್ನ ಶೈವ ಮತ್ತು ವೀರಶೈವ ಕೃತಿಗಳ ಮೂಲಕ. ರಾಘವಾಂಕ ತನ್ನ ಆರು ಕೃತಿಗಳ ಮೂಲಕ ಷಟ್ಪದಿ ಛಂದಸ್ಸನ್ನು ಜನಪ್ರಿಯಗೊಳಿಸಿದನು. ಅವನ ಮುಖ್ಯ ಕೃತಿ ಹರಿಶ್ಚಂದ್ರ ಕಾವ್ಯ, ಪೌರಾಣಿಕ ಪಾತ್ರವಾದ ಹರಿಶ್ಚಂದ್ರನ ಜೀವನವನ್ನು ಕುರಿತದ್ದು. ಈ ಕೃತಿ ಸಹ ತನ್ನ ತೀವ್ರವಾದ ಮಾನವತಾವಾದಕ್ಕೆ ಪ್ರಸಿದ್ಧವಾಗಿದೆ. ಇದೇ ಕಾಲದ ಇನ್ನೊಬ್ಬ ಪ್ರಸಿದ್ಧ ಜೈನ ಕವಿ ಜನ್ನ. ತನ್ನ ಕೃತಿಗಳಾದ ಯಶೋಧರ ಚರಿತೆ ಮತ್ತು ಅನಂಥನಾಥ ಪುರಾಣಗಳ ಮೂಲಕ ಜೈನ ಸಂಪ್ರದಾಯದ ಬಗ್ಗೆ ಬರೆದನು. ಇದೇ ಕಾಲದ ಕನ್ನಡ ವ್ಯಾಕರಣದ ಬಗೆಗಿನ ಮುಖ್ಯ ಕೃತಿ ಕೇಶಿರಾಜನ ಶಬ್ದಮಣಿ ದರ್ಪಣ.

‘ಜ಼ಾಕಿರ್' 'ಜಾ಼ಕಿರ್' 

AGLFN.pdf

kenlunde commented 7 years ago

@bengulurumanjunatha I recommend that you download and install the fully-functional thirty-day trial version of Adobe InDesign, and use it to repeat this test, being sure to specify the "Adobe World-Ready Composer" (Single-line or Paragraph) for the paragraphs. You use File→Export… to export a PDF file.

kenlunde commented 7 years ago

BTW, when I open your PDF using Adobe Acrobat Pro DC (2017 Release), select all of the text, copy, then paste into the TextEdit app on macOS (Version 10.12.6), I get the following:

copied-text

bengulurumanjunatha commented 7 years ago

@kenlunde You are displaying this text in "Kannada Sangam MN", which is the default Kannada font on the Mac. The problem with the last two words, in single quotes, is a font issue.

But I see that the last word in the line above, ದರ್ಪಣ, is wrong: the order of the reph (arkavattu) has changed. This is still an issue.

Please paste the text here so that we can compare it.

I will post the PDF from InDesign shortly.

kenlunde commented 7 years ago

Here is the text that I copied from the PDF you supplied:

ನಡುಗನ್ನಡ ಸಾಹಿತ್ಯದಲಿ್ಲ ಅನೇಕ ಹೊಸ ಸಾಹಿತ್ಯ ಪ್ರಕಾರಗಳು ಬೆಳಕಿಗೆ ಬಂದವು. ಇವುಗಳಲಿ್ಲ ಮುಖ್ಯವಾದವು ರಗಳೆ, ಸಾಂಗತ್ಯ ಮತು್ತ ದೇಸಿ. ಈ ಕಾಲದ ಸಾಹಿತ್ಯ ಜೈನ, ಹಿಂದೂ ಹಾಗೂ ಜಾತ್ಯತೀತ ಬೋಧನೆಗಳ ಮೇಲೆ ಆಧಾರಿತವಾಗಿದೆ. ಈ ಘಟ್ಟದ ಪ್ರಮುಖ ಲೇಖಕರಲಿ್ಲ ಇಬ್ಬರೆಂದರೆ ಹರಿಹರ ಮತು್ತ ರಾಘವಾಂಕ. ಇಬ್ಬರೂ ತಮ್ಮದೇ ಶೈಲಿಯಲಿ್ಲ ಕನ್ನಡ ಸಾಹಿತ್ಯದ ದಾರಿಯನು್ನ ಬೆಳಗಿದವರು. ಹರಿಹರ ರಗಳೆ ಸಾಹಿತ್ಯವನು್ನ ಬಳಕೆಗೆ ತಂದನು, ತನ್ನ ಶೈವ ಮತು್ತ ವೀರಶೈವ ಕೃತಿಗಳ ಮೂಲಕ. ರಾಘವಾಂಕ ತನ್ನ ಆರು ಕೃತಿಗಳ ಮೂಲಕ ಷಟ್ಪದಿ ಛಂದಸ್ಸನು್ನ ಜನಪಿ್ರಯಗೊಳಿಸಿದನು. ಅವನ ಮುಖ್ಯ ಕೃತಿ ಹರಿಶ್ಚಂದ್ರ ಕಾವ್ಯ, ಪೌರಾಣಿಕ ಪಾತ್ರವಾದ ಹರಿಶ್ಚಂದ್ರನ ಜೀವನವನು್ನ ಕುರಿತದು್ದ. ಈ ಕೃತಿ ಸಹ ತನ್ನ ತೀವ್ರವಾದ ಮಾನವತಾವಾದಕೆ್ಕ ಪ್ರಸಿದ್ಧವಾಗಿದೆ. ಇದೇ ಕಾಲದ ಇನೊ್ನಬ್ಬ ಪ್ರಸಿದ್ಧ ಜೈನ ಕವಿ ಜನ್ನ. ತನ್ನ ಕೃತಿಗಳಾದ ಯಶೋಧರ ಚರಿತೆ ಮತು್ತ ಅನಂಥನಾಥ ಪುರಾಣಗಳ ಮೂಲಕ ಜೈನ ಸಂಪ್ರದಾಯದ ಬಗೆ್ಗ ಬರೆದನು. ಇದೇ ಕಾಲದ ಕನ್ನಡ ವಾ್ಯಕರಣದ ಬಗೆಗಿನ ಮುಖ್ಯ ಕೃತಿ ಕೇಶಿರಾಜನ ಶಬ್ದಮಣಿ ದಪರ್ಣ. ‘ಜ಼ಾಕಿರ್' 'ಜಾ಼ಕಿರ್'

bengulurumanjunatha commented 7 years ago

@kenlunde I. In the text above that you copied from the PDF, all of the blwf and reph features failed to copy correctly.

II. I have attached the PDF created from InDesign: Hubballi experiment page1 .pdf

Original text: ‘ಜ಼ಾಕಿರ್' 'ಜಾ಼ಕಿರ್’ ಬಗ್ಗೆ ದರ್ಪಣ ದಟ್ಸ್, ಎಕ್ಸ್ , ಮಾರ್ಚ್, ಟೆಕ್ಸ್ಟ್, ಬುಕ್ಸ್, ಸಾಫ್ಟ್, ಜಸ್ಟ್, ಪೋಸ್ಟ್‌ಪೇಯ್ಡ್‌
ಎಕ್ಸ್‌ಪ್ರೆಸ್ ರ್ಪೊ ಆ್ಯಕ್ಷಿಸ್‌ Version: Adobe InDesign CS 6 OS: Windows 10

Issue 1: Most of the text is copied successfully from the PDF created with Adobe InDesign, except that the zero width non-joiner is not copied correctly. Affected words: "ಇನ್‌ಡಿಸೈನ್" "ಪೋಸ್ಟ್‌ಪೇಯ್ಡ್‌" "ಎಕ್ಸ್‌ಪ್ರೆಸ್". In the Hubballi font, the glyph for U+200C is named "uni200c". The zero width non-joiner (U+200C) is copied as a space.

Issue 2: Not related to AGLFN. The Adobe World-Ready Composer has trouble rendering the words ‘ಜ಼ಾಕಿರ್' 'ಜಾ಼ಕಿರ್’ 'ಆ್ಯಕ್ಷಿಸ್‌'. If you know how to report this issue to the right team, it would be a great help.

Kannada script uses U+200C and U+200D frequently. I will check some words with the zero width joiner, U+200D.

Also, what is the difference between a PDF generated in Adobe InDesign and one generated elsewhere?

Thanks for looking into this.

kenlunde commented 7 years ago

@bengulurumanjunatha I don't read the script that is being discussed, so you'll need to be a lot more explicit about things.

First, does the PDF that was exported from InDesign exhibit improved behavior? What you wrote suggests so, but please confirm.

About Issue 2, I will need excruciating detail, such as screenshots that show the expected behavior versus the actual behavior, and any suggestions as to how to address it. I can then pass along something meaningful (and hopefully actionable) to the InDesign development team.

bengulurumanjunatha commented 7 years ago

@kenlunde

Review of the PDF created from InDesign: yes, the situation has improved to near perfection.

Remaining issue with the PDF:
Version: Adobe InDesign CS 6
OS: Windows 10
Font: Hubballi

Words and their glyph names (without spaces):
"ಇನ್‌ಡಿಸೈನ್" glyph names: uni0c87 uni0ca80ccd uni200c uni0ca10cbf uni0cb80cc8 uni0ca80ccd
"ಪೋಸ್ಟ್‌ಪೇಯ್ಡ್" glyph names: uni0caa0ccb uni0cb80ccd uni0ccd0c9f uni200c uni0caa0cc7 uni0caf0ccd uni0ccd0ca1
"ಎಕ್ಸ್‌ಪ್ರೆಸ್" glyph names: uni0c8e uni0c950ccd uni0ccd0cb8 uni200c uni0caa0cc6 uni0ccd0cb0 uni0cb80ccd

All of the above glyphs were copied successfully except the zero width non-joiner, U+200C, glyph name "uni200c". This non-printing character was copied either as a space or as nothing (https://en.wikipedia.org/wiki/Zero-width_non-joiner). Another non-printing character to consider is U+200D, the zero width joiner (https://en.wikipedia.org/wiki/Zero-width_joiner).

Result: "ಇನ್ಡಿಸೈನ್" "ಪೋಸ್ಟ್ಪೇಯ್ಡ್" "ಎಕ್ಸ್ಪ್ರೆಸ್" capture

Resolution needed: U+200C (ZWNJ) and U+200D (ZWJ) need to be written to the PDF.
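Because U+200C and U+200D are invisible, comparing the original and pasted strings by eye is unreliable. A small check along these lines may help when testing other PDF producers. The helper names are hypothetical, and the "original" string is rebuilt from the glyph names listed above for the first word, on the assumption that they reflect its code points.

```python
ZWNJ = "\u200C"  # zero width non-joiner
ZWJ = "\u200D"   # zero width joiner


def joiner_counts(text: str) -> dict:
    """Count the invisible joiner characters in a string."""
    return {"ZWNJ": text.count(ZWNJ), "ZWJ": text.count(ZWJ)}


def code_points(text: str) -> list:
    """List a string as U+XXXX values so invisible characters show up."""
    return [f"U+{ord(ch):04X}" for ch in text]


# First word, written with escapes so the ZWNJ is explicit.
original = "\u0C87\u0CA8\u0CCD\u200C\u0CA1\u0CBF\u0CB8\u0CC8\u0CA8\u0CCD"
# Simulates the reported result, where the ZWNJ was dropped on copy.
pasted = original.replace(ZWNJ, "")

print(joiner_counts(original))  # {'ZWNJ': 1, 'ZWJ': 0}
print(joiner_counts(pasted))    # {'ZWNJ': 0, 'ZWJ': 0}
print(code_points(original)[3])  # U+200C
```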

Question: Should the glyph name of u+200c be uni200c or u200c and glyph name of u+200d be uni200d or u200d?

kenlunde commented 7 years ago

@bengulurumanjunatha: Thank you for confirming that the PDF that was exported from InDesign was nearly perfect, and for the additional details and suggestions that I will pass along to the InDesign team.

About your question, glyphs that correspond to characters in the BMP should use the "uni" prefix in their names, meaning that you should use uni200C and uni200D as the glyph names for U+200C and U+200D, respectively.
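As a sketch of that rule: the helper below builds a final glyph name from a single code point. final_glyph_name is a hypothetical function. The "u" prefix for code points beyond the BMP is taken from the AGL Specification as I understand it, not from this thread, and characters that have their own names in the AGL or AGLFN (such as "A") are not special-cased.

```python
def final_glyph_name(code_point: int) -> str:
    """Return a uniXXXX (BMP) or uXXXXX (beyond BMP) glyph name."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point!r}")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values cannot be named")
    if code_point <= 0xFFFF:
        # BMP: "uni" + exactly four uppercase hex digits.
        return f"uni{code_point:04X}"
    # Beyond the BMP: "u" + five or six uppercase hex digits.
    return f"u{code_point:X}"


print(final_glyph_name(0x200C))  # uni200C
print(final_glyph_name(0x200D))  # uni200D
print(final_glyph_name(0x0C95))  # uni0C95
```

Note the uppercase hex digits in the output, as in uni200C above; the font discussed here uses lowercase digits (uni200c) in its names.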

bengulurumanjunatha commented 7 years ago

@kenlunde Thank you. Follow-up questions:

  1. What does InDesign write to the PDF that other PDF writers do not?
  2. I suppose Adobe Acrobat Pro/DC also writes this information, or is it specific to the text editor rather than the PDF writer?
  3. If this depends on the PDF writer, we would like it to be available in all PDF writers. We want to start reporting these issues through the correct channel.

Also, I will write a detailed report about the issue in the "Adobe World-Ready Composer" with the Kannada language.

Thanks, Manjunatha

kenlunde commented 7 years ago

@bengulurumanjunatha: Answers to your questions:

  1. InDesign writes to the PDF file a "content" layer that is a more accurate representation of the text that corresponds to what is displayed in the PDF, which is the "presentation" layer.

  2. Adobe Acrobat Pro/DC is merely copying the text from the "content" layer. If a PDF file does not have a "content" layer, the glyph information in the "presentation" layer is used to derive content, and that is what is copied.

  3. The reason why InDesign-exported PDFs have better "content" is because its PDF Library has direct access to the original text that is used in the source InDesign document. For virtually all other PDF producers, what ends up getting converted to a PDF file is a PostScript file that is fed into a separate PDF-producing client, such as PDFWriter.
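The fallback described in these answers can be modelled in a few lines. This is only a toy model of the explanation above, not how Acrobat or any PDF library is implemented; TextRun, uni_name_to_text and extract are hypothetical names, and only "uni" glyph names are handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextRun:
    glyph_names: List[str]         # "presentation": glyphs in drawing order
    content: Optional[str] = None  # "content": original text, if stored


def uni_name_to_text(name: str) -> str:
    """Decode a uniXXXX[XXXX...] name; any ".suffix" is ignored."""
    digits = name.split(".", 1)[0][len("uni"):]
    return "".join(
        chr(int(digits[i:i + 4], 16)) for i in range(0, len(digits), 4)
    )


def extract(run: TextRun) -> str:
    """Prefer stored content; otherwise derive text from the glyph names."""
    if run.content is not None:
        return run.content
    return "".join(uni_name_to_text(name) for name in run.glyph_names)


# Example 2 from this thread: glyph order differs from the original text order.
glyphs = ["uni0CA6", "uni0CAA", "uni0CB00CCD", "uni0CA3"]
original = "\u0CA6\u0CB0\u0CCD\u0CAA\u0CA3"

with_content = TextRun(glyphs, content=original)
without_content = TextRun(glyphs)

print(extract(with_content) == original)     # True
print(extract(without_content) == original)  # False
```

With stored content the original order survives; without it, the reph lands after the following consonant, as in the text copied from the Pages-exported PDF.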

bengulurumanjunatha commented 7 years ago

@kenlunde Adobe World-Ready Composer issue is reported here. Please forward it to the right team.

https://indesign.uservoice.com/forums/601180-adobe-indesign-bugs/suggestions/31005514-bug-in-adobe-world-ready-composer-for-kannada-lang

kenlunde commented 7 years ago

I reported both sets of issues to the InDesign team.

bengulurumanjunatha commented 7 years ago

@kenlunde Thank you very much for your support.