FamilySearch / GEDCOM

Apache License 2.0

Possible mistake in definition of mt-char #249

Closed: seivadnomis closed this issue 1 year ago

seivadnomis commented 1 year ago

In section 2.10 (Media Type), mt-char is defined to include the space character (%x20). However, the corresponding production in RFC 2045, section 5.1, says:

token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>

It seems unlikely that a space is meant to be allowed in the name of a media type, subtype, or attribute, as GEDCOM 7.0 appears to permit.
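To make the conflict concrete, here is a minimal Python sketch of the RFC 2045 token rule quoted above; under it, a name containing %x20 is rejected. The helper name is_rfc2045_token is hypothetical, and non-ASCII input is simply treated as invalid.

```python
# RFC 2045, section 5.1: the 15 tspecials that may not appear in a token.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_rfc2045_token(s: str) -> bool:
    """True if s matches: token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>."""
    if not s:
        return False  # "1*" requires at least one character
    for ch in s:
        code = ord(ch)
        if code > 127:                  # not US-ASCII
            return False
        if code <= 31 or code == 127:   # CTLs
            return False
        if ch == " ":                   # SPACE (%x20)
            return False
        if ch in TSPECIALS:
            return False
    return True


print(is_rfc2045_token("plain"))       # True
print(is_rfc2045_token("plain text"))  # False: contains %x20
```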

funwithbots commented 1 year ago

It's probably worth noting that in RFC 822 (https://www.rfc-editor.org/rfc/rfc822, p. 10), quoted-pair is composed of "\" CHAR, with CHAR defined as

                                             ; (  Octal, Decimal.)
 CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)

i.e. 0x00-7F, whereas mt-qpair restricts the escaped character to 0x09-7E. If this is intentional, a note in the docs seems appropriate.
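A small Python sketch that lists exactly which escaped characters differ between the two rules. It takes the mt-qpair range (%x09-7E) from the comment above rather than re-deriving it from the GEDCOM spec, and the function name is hypothetical.

```python
# RFC 822 quoted-pair: "\" CHAR, where CHAR is any ASCII character (octal 0-177 = 0x00-0x7F).
RFC822_CHAR = range(0x00, 0x80)
# mt-qpair as described in this thread: "\" followed by %x09-7E.
MT_QPAIR_CHAR = range(0x09, 0x7F)


def qpair_differences() -> list:
    """Code points allowed after the backslash by RFC 822 but not by mt-qpair."""
    return [c for c in RFC822_CHAR if c not in MT_QPAIR_CHAR]


diff = qpair_differences()
print([hex(c) for c in diff])
# ['0x0', '0x1', '0x2', '0x3', '0x4', '0x5', '0x6', '0x7', '0x8', '0x7f']
```

So the difference is the control characters 0x00-0x08 plus DEL (0x7F).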

dthaler commented 1 year ago

I believe the GEDCOM spec is wrong and should be fixed. Furthermore, since 2013 the correct reference for media types has been RFC 6838, aka BCP 13 (the GEDCOM spec at least points to the IANA registry, which points to that RFC), and that RFC does include ABNF. Compared with it, the "summarized" ABNF in the GEDCOM spec is wrong (e.g., it allows a media type to start with "#"). In contrast, the sections defining g7:FORM and g7:MIME correctly point to BCP 13; only section 2.10 contradicts the later sections of the GEDCOM spec.
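For comparison, a minimal Python sketch of the restricted-name rule for type and subtype names, as I read RFC 6838 section 4.2: the first character is ALPHA or DIGIT, followed by up to 126 of ALPHA / DIGIT / "!" / "#" / "$" / "&" / "-" / "^" / "_" / "." / "+". The function names are hypothetical, parameters are not handled, and the RFC itself should be checked before relying on this.

```python
import re

# RFC 6838, section 4.2 (as read here):
#   restricted-name = restricted-name-first *126restricted-name-chars
RESTRICTED_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}")


def is_restricted_name(name: str) -> bool:
    """True if name is a valid type-name or subtype-name."""
    return RESTRICTED_NAME.fullmatch(name) is not None


def is_media_type(value: str) -> bool:
    """Check a bare type "/" subtype string (no parameters)."""
    type_, sep, subtype = value.partition("/")
    return sep == "/" and is_restricted_name(type_) and is_restricted_name(subtype)


print(is_media_type("text/html"))   # True
print(is_media_type("#text/html"))  # False: "#" cannot be the first character
```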

dthaler commented 1 year ago

In my view, there is no reason to repeat ABNF from RFCs; we should instead just refer to it, as we do for Language. That said, there are multiple definitions of how to compose a media type and its parameters into a single string (often called a Content-Type, after the HTTP and mail header of that name), which appear to vary by protocol, so we should specify which one applies. HTTP uses a more liberal definition that permits whitespace around the semicolon delimiters, and I would argue we should match it, since HTTP use is prevalent and HTTP libraries may construct content type strings that way. See https://github.com/FamilySearch/GEDCOM/pull/251.
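A rough Python sketch of the more liberal HTTP form, based on my reading of RFC 9110 (media-type = type "/" subtype parameters, with parameters = *( OWS ";" OWS [ parameter ] ) and its token and quoted-string rules). It shows optional whitespace around ";" being accepted. parse_content_type is a hypothetical helper; it is not taken from the GEDCOM spec or from the linked pull request.

```python
import re

OWS = r"[ \t]*"                                   # optional whitespace: SP / HTAB
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"          # 1*tchar
# quoted-string: DQUOTE *( qdtext / quoted-pair ) DQUOTE
QUOTED = r'"(?:[\t !\x23-\x5b\x5d-\x7e\x80-\xff]|\\[\t \x21-\x7e\x80-\xff])*"'

MEDIA_TYPE = re.compile(rf"({TOKEN})/({TOKEN})")
PARAM = re.compile(rf"{OWS};{OWS}(?:({TOKEN})=({TOKEN}|{QUOTED}))?")


def parse_content_type(value: str):
    """Return (type, subtype, [(name, value), ...]) or None if value does not match."""
    m = MEDIA_TYPE.match(value)
    if not m:
        return None
    params = []
    pos = m.end()
    while pos < len(value):
        p = PARAM.match(value, pos)               # each match consumes at least ";"
        if not p:
            return None
        if p.group(1) is not None:                # empty parameters (";;") are allowed
            raw = p.group(2)
            if raw.startswith('"'):               # strip quotes, undo quoted-pairs
                raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
            params.append((p.group(1).lower(), raw))
        pos = p.end()
    # type, subtype, and parameter names are case-insensitive; values are kept as-is
    return m.group(1).lower(), m.group(2).lower(), params


print(parse_content_type("text/plain ; charset=utf-8"))
# ('text', 'plain', [('charset', 'utf-8')])
```

A stricter RFC 2045-style parser would reject the spaces around ";" in that example, which is the interoperability gap the comment above is pointing at.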