
Automatically exported from code.google.com/p/noto

Noto Sans Javanese and Balinese claim to support Unicode ranges and code pages they don't actually support #298

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The Noto Sans Javanese and Noto Sans Balinese fonts have bits set in their OS/2
tables that claim support for Unicode ranges and code pages that they don't
actually support in any meaningful way, such as Latin, geometric shapes, or even
the Arabic code page 1256 for Noto Sans Javanese.

Realistically, the only bit of ulUnicodeRange* and ulCodePageRange* that makes 
sense to set is, for Noto Sans Balinese, bit 27 of ulUnicodeRange1 to indicate 
that Balinese is supported. For Noto Sans Javanese, there's no bit to set 
because no bit is defined for Javanese in ulUnicodeRange*.
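As a minimal sketch of what that means numerically: the four 32-bit ulUnicodeRange fields together form one 128-bit field, so bit 27 lives in ulUnicodeRange1. The helper below is hypothetical (it is not part of nototools or fontTools); it packs a set of bit numbers into the four field values.

```python
def unicode_range_fields(bits):
    """Pack OS/2 ulUnicodeRange bit numbers (0-127) into the four 32-bit fields.

    Returns (ulUnicodeRange1, ulUnicodeRange2, ulUnicodeRange3, ulUnicodeRange4).
    """
    fields = [0, 0, 0, 0]
    for bit in bits:
        if not 0 <= bit < 128:
            raise ValueError(f"ulUnicodeRange bit out of range: {bit}")
        # Bits 0-31 go in ulUnicodeRange1, 32-63 in ulUnicodeRange2, and so on.
        fields[bit // 32] |= 1 << (bit % 32)
    return tuple(fields)


BALINESE_BIT = 27  # OpenType spec: ulUnicodeRange bit 27 = Balinese (U+1B00-U+1B7F)

# Noto Sans Balinese: only the Balinese bit.
print(unicode_range_fields({BALINESE_BIT}))  # (134217728, 0, 0, 0)
# Noto Sans Javanese: no bit is defined for Javanese, so every field is zero.
print(unicode_range_fields(set()))  # (0, 0, 0, 0)
```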

The problem may well occur in other Noto fonts; these are the only two I have 
checked.

What steps will reproduce the problem?
– Inspect the OS/2 tables for Noto Sans Javanese and Balinese.

What is the expected output? What do you see instead?
– The ulUnicodeRange* and ulCodePageRange* fields of the OS/2 table should 
only have bits set that indicate Unicode ranges and code pages that the fonts 
actually support. Instead, they claim to support Unicode ranges such as Latin 
and geometric shapes, and code pages such as 1252 and (for Noto Sans Javanese) 
1256.
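A quick way to see the mismatch is to decode ulCodePageRange1 and compare it against what the font really covers. This is a sketch under assumptions: the bit-to-code-page table is a small excerpt of the OpenType spec's list, the function names are hypothetical, and the example value is constructed from the description above rather than read from the actual font.

```python
# Partial excerpt of the OpenType ulCodePageRange1 bit assignments.
CODE_PAGE_BITS = {
    0: "1252 Latin 1",
    1: "1250 Latin 2: Eastern Europe",
    2: "1251 Cyrillic",
    6: "1256 Arabic",
}


def set_bits(value, width=32):
    """Return the bit numbers that are set in an unsigned integer field."""
    return [b for b in range(width) if (value >> b) & 1]


def spurious_code_pages(ul_code_page_range1, supported_bits):
    """List code page bits that are set but not in the supported set."""
    return [
        CODE_PAGE_BITS.get(bit, f"bit {bit}")
        for bit in set_bits(ul_code_page_range1)
        if bit not in supported_bits
    ]


# Hypothetical value with bits 0 (1252) and 6 (1256) set, as described for
# Noto Sans Javanese; the font's real field value is not given in this report.
claimed = (1 << 0) | (1 << 6)
print(spurious_code_pages(claimed, supported_bits=set()))
# ['1252 Latin 1', '1256 Arabic']
```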

What version of the product are you using? On what operating system?
Noto Sans Javanese 1.01; Noto Sans Balinese 1.02.

Original issue reported on code.google.com by googled...@lindenbergsoftware.com on 16 Mar 2015 at 12:02

GoogleCodeExporter commented 9 years ago
Norbert, this whole set of bits is underdefined in the spec, so we have kinda
given up on setting them right. But we gave up mostly because we couldn't find
any modern software that cared about the bits. Do you know any such software?

Original comment by roozbeh@google.com on 16 Mar 2015 at 8:28


GoogleCodeExporter commented 9 years ago
I noticed the issue because Xcode, Apple's IDE, shows Noto Sans Javanese as a 
list of Arabic glyphs. You can see this, for example, if you simply drop the 
font file on the Xcode icon in the Finder to "open" the font. It's more 
noticeable when you have the font checked into a project and use Xcode's Commit 
command to review changes to files in the project.

This use alone certainly does not qualify the issue as high priority.

Original comment by googled...@lindenbergsoftware.com on 16 Mar 2015 at 9:45

GoogleCodeExporter commented 9 years ago
Interesting. Now at least we know there is software out there that uses the 
bits.

Doug, the best way forward is perhaps fixing the bits in autofix_for_release. 
notolint already has some code for checking some of it, but I don't remember if 
it's turned on or not. Parts of it (the table from the OpenType spec) may not 
be open sourced yet either.

Original comment by roozbeh@google.com on 16 Mar 2015 at 9:50

GoogleCodeExporter commented 9 years ago
So, it would help to know more about how Xcode uses these bits.

Other Noto fonts have more bits set than the existing logic would dictate. My
impression is that the fonts set a bit whenever the cmap has a character in any
of the ranges in the corresponding bucket. And sometimes bits are set when
there are no characters in the defined ranges (perhaps the range definitions
their software uses differ from the ones in the spec, which is what I
implemented).
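To illustrate the suspected "any character" behavior, here is a minimal sketch. The bit-to-range table is a hypothetical three-entry excerpt (the real table comes from the OpenType spec, and some bits cover several ranges), and the example cmap is assumed for illustration: a script font that also maps U+0020 SPACE and U+25CC DOTTED CIRCLE picks up the Basic Latin and Geometric Shapes bits from a single character each.

```python
# Hypothetical, heavily abbreviated bit -> ranges table (inclusive bounds).
RANGES_BY_BIT = {
    0: [(0x0000, 0x007F)],   # Basic Latin
    27: [(0x1B00, 0x1B7F)],  # Balinese
    45: [(0x25A0, 0x25FF)],  # Geometric Shapes
}


def bits_if_any_char(cmap_codepoints):
    """Set a bit whenever at least one mapped code point falls in any of its ranges."""
    return {
        bit
        for bit, ranges in RANGES_BY_BIT.items()
        if any(lo <= cp <= hi for cp in cmap_codepoints for lo, hi in ranges)
    }


# Assumed cmap: space, dotted circle, and one Balinese letter.
print(sorted(bits_if_any_char({0x0020, 0x25CC, 0x1B05})))  # [0, 27, 45]
```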

I implemented the OpenType ulUnicodeRange data from the spec since, as Roozbeh
mentioned, nototools didn't have the data the (disabled) lint code required. I
changed the heuristic so that a bit is set only when the font covers either 200
characters or 50% of any of the ranges for that bit. This is a bit more lenient
than the original heuristic, which, for example, would not mark CJK as
functional unless big chunks of CJK Unified Ideographs Extension A and B were
present. But when you get down to it, all these heuristics are suspect. It's not
even clear what 'functional' means, since it's not in the context of a language.
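A sketch of that threshold check follows. It reads the rule as "for at least one of the bit's ranges, the font covers at least 200 code points or at least 50% of that range"; whether the 200-character count is per range or summed across ranges is an assumption here, and the function name is hypothetical rather than the nototools code.

```python
def bit_is_functional(cmap_codepoints, ranges, min_chars=200, min_fraction=0.5):
    """Return True if any one range (inclusive bounds) is covered by at least
    min_chars code points or at least min_fraction of its code points."""
    for lo, hi in ranges:
        covered = sum(1 for cp in cmap_codepoints if lo <= cp <= hi)
        size = hi - lo + 1
        if covered >= min_chars or covered / size >= min_fraction:
            return True
    return False


# A lone dotted circle is 1 of 96 Geometric Shapes code points: not functional.
print(bit_is_functional({0x25CC}, [(0x25A0, 0x25FF)]))  # False
# 76 of the 128 code points in the Balinese block: functional.
print(bit_is_functional(set(range(0x1B00, 0x1B4C)), [(0x1B00, 0x1B7F)]))  # True
```

Using a fraction of the block as well as an absolute count keeps small blocks reachable while letting very large blocks such as CJK qualify without near-complete coverage.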

On the other hand, it seems safe to clear bits we don't have some reasonable 
coverage for.  So autofixing this doesn't hurt.
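The "clear, never set" idea can be sketched as masking each existing field with the bits that have reasonable coverage. This is an illustration, not the actual autofix_for_release code; the function name is hypothetical, and `supported_bits` is assumed to come from some coverage heuristic.

```python
def autofix_unicode_range(claimed_fields, supported_bits):
    """Clear ulUnicodeRange bits without reasonable coverage; never add new bits.

    claimed_fields: the four ulUnicodeRange values as found in the font.
    supported_bits: bit numbers (0-127) judged to have reasonable coverage.
    """
    fixed = []
    for index, field in enumerate(claimed_fields):
        # Build a 32-bit mask of the supported bits that belong to this field.
        mask = 0
        for bit in supported_bits:
            if bit // 32 == index:
                mask |= 1 << (bit % 32)
        # AND-ing can only clear bits, so nothing new is claimed.
        fixed.append(field & mask)
    return tuple(fixed)


# Assumed claim: Basic Latin (bit 0), Balinese (bit 27), Geometric Shapes (bit 45).
claimed = (1 | (1 << 27), 1 << 13, 0, 0)
print(autofix_unicode_range(claimed, {27}))  # (134217728, 0, 0, 0)
```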

Original comment by dougf...@google.com on 18 Mar 2015 at 1:13