MPEGGroup / OpenFontFormat

Official MPEG repository to discuss issues on Open Font Format (ISO/IEC 14496-22)

Proposal: Replace section 5.7.6 (meta) #17

Closed: simoncozens closed this issue 3 years ago

simoncozens commented 3 years ago

I propose to replace the existing section 5.7.6 with the following text. This is a non-technical change, with a view to increasing clarity through brevity and removing information repeated from elsewhere (both elsewhere in the standard and elsewhere in referenced standards).

Note that this introduces a normative reference to BCP47.


meta table - Metadata Table

This table allows for arbitrary textual or binary metadata to be associated with a font file.

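metaTable

| Type | Name | Value |
|---|---|---|
| uint32 | version | == 1 |
| uint32 | flags | == 0 |
| uint32 | unused | == 0 |
| uint32 | dataMapsCount | |
| DataMap[dataMapsCount] | dataMaps | |

DataMap

| Type | Name |
|---|---|
| Tag | tag |
| Offset32 | dataOffset |
| uint32 | dataLength |

dataOffset
Offset in bytes to the metadata entry, measured from the start of the meta table.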
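
As a rough illustration of the layout above, the following Python sketch reads the header and the DataMap records. It assumes big-endian fields, as elsewhere in OFF. The function name `parse_meta_table` and the choice to return a dict are illustrative only and not part of the proposal.

```python
import struct

HEADER = struct.Struct(">4I")      # version, flags, unused, dataMapsCount
DATA_MAP = struct.Struct(">4sII")  # tag, dataOffset, dataLength


def parse_meta_table(table: bytes) -> dict:
    """Return {tag: raw metadata bytes} for a 'meta' table (hypothetical helper)."""
    version, _flags, _unused, count = HEADER.unpack_from(table, 0)
    if version != 1:
        raise ValueError(f"unsupported meta table version: {version}")

    entries = {}
    for i in range(count):
        record_pos = HEADER.size + i * DATA_MAP.size
        tag, offset, length = DATA_MAP.unpack_from(table, record_pos)
        # dataOffset is measured from the start of the meta table.
        if offset + length > len(table):
            raise ValueError(f"data for {tag!r} runs past the end of the table")
        entries[tag.decode("ascii")] = table[offset:offset + length]
    return entries
```

The sketch only checks `version`; whether a reader should also reject non-zero `flags` is left open here.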

The following metadata tags have been registered:

The language/script combinations in dlng/slng metadata blocks are adapted from [[!BCP47]]. While the definitions and data are identical to BCP47, within the dlng/slng data blocks, the script subtag is mandatory and the language tag is optional.

Example
A font may be designed for Japanese and include sufficient kanji to adequately display Chinese, even though some radical forms will differ in appearance from those expected by Chinese users. Its `dlng` would therefore be `Jpan`, but its `slng` would be `Jpan,Hans,Hant`.

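
A minimal sketch of how a consumer might split and sanity-check such a value. The pattern below is a deliberate simplification, not the BCP 47 grammar: it assumes a language subtag of two or three letters and a four-letter script subtag, and it does not validate anything after the script. Treating only U+0020 as ignorable whitespace is likewise an assumption, and `parse_script_lang_list` is a hypothetical name.

```python
import re

# Simplified: optional 2-3 letter language, mandatory 4-letter script,
# then any further hyphen-separated subtags (left unvalidated here).
ELEMENT = re.compile(r"(?:[A-Za-z]{2,3}-)?[A-Za-z]{4}(?:-[A-Za-z0-9]+)*")


def parse_script_lang_list(value: str) -> list:
    """Split a dlng/slng string on commas and check each element's shape."""
    elements = [item.strip(" ") for item in value.split(",")]
    for element in elements:
        if not ELEMENT.fullmatch(element):
            raise ValueError(f"not a valid language/script combination: {element!r}")
    return elements
```

With this sketch, `Jpan, Hans, Hant` and `Jpan,Hans,Hant` parse to the same list, while an element with no script subtag, such as `en`, is rejected.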
dscorbett commented 3 years ago

The OpenType recension of 'meta' (I didn’t bother checking OFF too) allows spaces after the commas. This proposal doesn’t.

“the script subtag is mandatory and the language tag is optional” could be interpreted to mean that (for example) “-Latn” is a valid OFF language/script combination, which is obviously not the intent, since in BCP 47 a script always follows a hyphen. I think it would be clearer to explicitly give the new definition of langtag.

davelab6 commented 3 years ago

> adapted from [[!BCP47]].

> Note that this introduces a normative reference to BCP47.

What should the link be? https://tools.ietf.org/html/bcp47 ?

vlevantovsky commented 3 years ago

BCP47 is already included as a normative reference in OFF.

simoncozens commented 3 years ago

All the more reason to remove the ScriptTag verbiage, then!

PeterConstable commented 3 years ago

It's important to be very clear that slng/dlng values are not BCP 47 language tags. And as David Corbett observes, simply referencing BCP 47 and mentioning subtags that are mandatory or optional doesn't make clear what are the valid values. The "ScriptLangTag" label and rewrite rules in the current spec were intended to make all that completely clear.

> and the language tag is optional

That must be reworded:

> and the language subtag is optional

PeterConstable commented 3 years ago

Also, the longer explanation in the current spec about the meaning of slng and dlng was included because, at the time, these concepts were very new, unfamiliar and different from how meta-info of that kind had been handled previously: OS/2 charset and unicode range bits. It's fair to consider it semantic commentary. Perhaps it's no longer needed in the spec. But given that the 'meta' table is something many font developers are still discovering, I'm not entirely convinced there is no need to provide extra explanation.

PeterConstable commented 3 years ago

> While the definitions and data are identical to BCP47

That's not entirely true. Or, at least, the intended referents of "definitions" and "data" are so vague as to make the statement impossible to validate.

The current spec makes clear that the semantics and valid values of each element in the ScriptLangTag expansion are the same as that element within BCP 47.

But you can't say that the definition of "Hant-HK" is identical to BCP 47 because BCP 47 doesn't assign any definition to that tag: it simply doesn't permit it at all.

I welcome the desire to make things more succinct, which can sometimes add clarity. But with this proposal, I think a lot of clarity is removed.

PeterConstable commented 3 years ago

> This is a non-technical change

But it is a technical change because it under-specifies in comparison to the current spec.

simoncozens commented 3 years ago

Oh, I didn't realise things were currently quite so broken. :-/ Yes, we will need a technical change to fix this mess.

simoncozens commented 3 years ago

> The OpenType recension of 'meta' (I didn’t bother checking OFF too) allows spaces after the commas. This proposal doesn’t.

Thanks, I've fixed that.

davelab6 commented 3 years ago

> > This is a non-technical change
>
> But it is a technical change because it under-specifies in comparison to the current spec.

I don't think this is what I understood recent delineations of editorial vs technical changes to mean.

To me, this is a non technical change because it doesn't require existing implementations to do anything.

davelab6 commented 3 years ago

> Design languages. A comma-separated list of language/script combinations for which the font was designed. Whitespace between elements is not significant. slng: Supported languages.

Since slng is "supported", past tense, and the definition of dlng is past tense (emphasis mine as quoted), I suggest that

> Design languages

become

> Designed languages

(This would be much easier if this was a pull request...)

vlevantovsky commented 3 years ago

> (This would be much easier if this was a pull request...)

Let us not go there again :) (Yes, I agree)

davelab6 commented 3 years ago

> Let us not go there again :) (Yes, I agree)

I'm making a slightly different point to before, though :)

In the past we've discussed getting the entire OFF text into this repo, and I for one have concluded it is unlikely to happen until ISO changes its core policies on document redistribution; and we've discussed getting "formal" change proposals reviewed here instead of the mpeg-otspec mailing list, and for now the formal submission must happen on the mailing list.

So, what I am proposing is that pre-submission drafts of change proposals can be collaboratively authored in this repo via Pull Request.

Instead of Simon posting a precis and then a rule and then a markdown document of the proposal itself, he could make a PR to the repo with the latter markdown content, named e.g. /2020/Proposals/Section_5_7_6_(meta).md, and then we could have this discussion in /MPEGGroup/OpenFontFormat/pull/17 instead of /MPEGGroup/OpenFontFormat/issues/17, with the additional discussion and collaboration tools that GitHub provides for PRs over Issues.

simoncozens commented 3 years ago

Yeah, I realise this would have been a neater way to do things. We don't even need to eventually merge it into the repo, we can just use the PR branch as a space to discuss and collaboratively edit.

We are all learning new ways to operate...

dscorbett commented 3 years ago

> A comma-separated list of language/script combinations for which the font was designed. Whitespace between elements is not significant.

What counts as whitespace? For that matter, what counts as a comma? This proposal nowhere says that the values are limited to ASCII.
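
To make the ambiguity concrete, here is a small Python illustration (not taken from any implementation) of how two plausible readings of "whitespace" and "comma" diverge once non-ASCII characters appear. The sample strings are invented for the example.

```python
value = "Jpan\u3000,\u00a0Hans"  # ideographic space, then no-break space
fullwidth = "Jpan\uff0cHans"     # U+FF0C FULLWIDTH COMMA as the separator

# Reading 1: any Unicode whitespace is insignificant (str.strip() default).
unicode_aware = [item.strip() for item in value.split(",")]

# Reading 2: only U+0020 SPACE is insignificant.
ascii_space_only = [item.strip(" ") for item in value.split(",")]

# Splitting on "," (U+002C) does not split on the fullwidth comma.
fullwidth_split = fullwidth.split(",")

print(unicode_aware)
print(ascii_space_only)
print(fullwidth_split)
```

The first reading yields two clean tags, the second leaves the non-ASCII spaces attached to them, and neither treats the fullwidth comma as a separator.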

simoncozens commented 3 years ago

There’s always a temptation to nitpick and overspecify, but many such file format concerns can be dealt with under the general implementation philosophy of being liberal in what you accept and conservative in what you produce.

simoncozens commented 3 years ago

OK, I'm going to convert this to a PR as mentioned above so that others can more easily make positive contributions. The BCP47 situation is unfortunate; we may need to go back to some of the original wording on that for now, until there is consensus on how to move forward.

PeterConstable commented 3 years ago

A fundamental problem is that you haven't really identified the problem in the standard you're trying to fix. This is a critical part of the ISO balloting process: a comment identifies a problem, why it's a problem, and then (preferably) proposes a fix. You've provided a "fix", but you haven't said what the problem is.

simoncozens commented 3 years ago

> This is a critical part of the ISO balloting process: a comment identifies a problem, why it's a problem, and then (preferably) proposes a fix.

Thanks, Peter, we are gradually reverse-engineering the change process through successive rounds of Zendo. :-)

@davelab6, something to add to #12.

davelab6 commented 3 years ago

Better for #5 (or a follow on) I think.

As the window for proposals is closing, will you be able to post a PR within the next say 48 hours?