Open tychonievich opened 2 years ago
Comments:
I like the word coverage instead of compliance. Transparency implies a review of the software and how it communicates with the user. Likewise, I don't know how or what program would check Lossless Exports.
Discussed by steering committee. There was interest in this topic, but not yet consensus. Lossless imports seemed difficult to properly define. We noted that it would also be nice to include some kind of pre-purchase transparency, such that users could determine what specific structure coverage an application has without first using the application.
Below is a longer draft that I think addresses the above comments.
Non-violation: Does not violate the spec. A program that does nothing is trivially non-violative.
Import non-violation: Imposes no restrictions on files it imports beyond those in the spec itself. Can import any valid file, though it might not understand all parts of the file (see import coverage below).
Export non-violation: Every file exported by the application with a HEAD.GEDC.VERS is a valid file as defined by the identified version of the spec.
Meaning non-violation: Neither import nor export adds new information not previously present (other than metadata about the import or export itself).
The most common cause of meaning violations is an inexact match between the spec and an application's internal data model. For example, consider an application that has a "religious rite" event type but does not have specific subtypes like baptisms. That application could import a BAPM as a religious rite, and could export a religious rite as an EVEN: those transformations lose some information, but do not add any spurious new information. That application could not export a religious rite as a BAPM, however: that would be adding a spurious assertion as to the type of rite it was.
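To make the religious rite example concrete, here is a minimal sketch in Python. The internal model, the mapping table, and the function names are all hypothetical, and labelling the exported EVEN with a TYPE substructure is an assumption about how such an application might name the rite.

```python
# Hypothetical internal model: the application knows only a generic
# "religious rite" event, with no subtypes such as baptism.
IMPORT_MAP = {
    "BAPM": "religious rite",  # loses the subtype, adds nothing new
    "CONF": "religious rite",
}


def import_event(tag: str) -> str:
    """Map a GEDCOM event tag to the hypothetical internal event type."""
    return IMPORT_MAP.get(tag, "generic event")


def export_event(internal_type: str) -> list[str]:
    """Export an internal event as a generic EVEN structure.

    Exporting "religious rite" as BAPM would assert a subtype the
    application never recorded, which would be a meaning violation.
    """
    return ["1 EVEN", f"2 TYPE {internal_type}"]


# A BAPM survives an import/export cycle only as a generic EVEN.
print(export_event(import_event("BAPM")))
```

The round trip is lossy, but nothing in the exported lines claims more than the application actually knows.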
Coverage: Uses all applicable parts of the spec. A program with a limited feature set has coverage if it implements the parts of the spec that relate to those features.
Import coverage: Every standard structure in an imported file is used to populate the appropriate internal state of the application except for structures representing data that the application lacks the ability to represent.
Export coverage: All of the application's internal state is represented in the exported file except for (a) state that the spec lacks standard structures to store and (b) state that the user has specifically requested the application not to export.
Extension coverage can be defined analogously on a per-extension basis, but cannot be defined for the unbounded set of "all extensions".
In cases of inexact matches between the spec and the application's data model such that import or export of some state would result in loss of data specificity, an application can claim coverage if it ignores that data or if it converts it in a meaning non-violative way.
Transparent: Informs the user of any potential data loss.
Import transparency: If an imported file has structures (standard or extension) that are not being fully converted into the internal data of the application, the user is alerted to that fact and can access the list of such structures.
Export transparency: If an application has internal information that it will not fully represent in the exported file, the user is alerted to that fact and can access what information was not exported.
Feature transparency: Potential users can access a list of which standard structures defined in the spec the application supports and which it does not, and can do so without first purchasing, creating an account, or otherwise using the application. Including extensions the application supports in this list is recommended, but not required.
Inexact matches between the spec and the application's data model result in something less than full conversion, and must be reported to qualify for transparency. Examples include importing a BAPM as a generic event or exporting some of the application's internal event types as generic EVEN structures.
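As a rough illustration of what import transparency might report, the sketch below scans the lines of a file and lists every tag the application does not fully convert. The FULLY_SUPPORTED set and the function name are hypothetical. Matching on the bare tag is a simplification, since the spec defines structures by context as well as by tag.

```python
# Hypothetical set of tags this application converts fully.
FULLY_SUPPORTED = {"HEAD", "GEDC", "VERS", "INDI", "NAME", "BIRT", "DATE", "TRLR"}


def unconverted_tags(lines):
    """Return the sorted tags in a file that are not fully converted."""
    found = set()
    for line in lines:
        parts = line.strip().split(" ", 3)
        if len(parts) < 2:
            continue  # blank or malformed line: nothing to report here
        # A line is: level, optional @XREF@ identifier, tag, optional payload.
        if parts[1].startswith("@") and len(parts) > 2:
            tag = parts[2]
        else:
            tag = parts[1]
        if tag not in FULLY_SUPPORTED:
            found.add(tag)
    return sorted(found)


sample = [
    "0 HEAD",
    "1 GEDC",
    "2 VERS 7.0",
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
    "1 BAPM",
    "2 DATE 1 JAN 1900",
    "1 _CUSTOM some value",
    "0 TRLR",
]
print(unconverted_tags(sample))  # ['BAPM', '_CUSTOM']
```

An application that imports BAPM as a generic event would still list it here, because that conversion is not a full one.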
Lossless: Import/export cycles do not remove information.
Lossless exports: The following sequence of steps does not lose data:
It is expected that many applications will need to use extension structures to achieve lossless exports.
A specific file is losslessly imported if all of its data (in both standard and extension structures) is fully imported into the application's internal state; that is, if import transparency would have nothing to report. Because the set of extension types is potentially limitless, and because preserving unknown structures as-is does not qualify as importing, applications cannot claim that all of their imports will be lossless.
Preserving unknown structures as-is does not qualify as importing those structures because doing that can result in inconsistent data. Many structures require some kind of consistency with other structures: some structurally (like CHIL/FAMC pairs) and others semantically (like BIRT.DATE of all the spouses of a FAM being earlier than the FAM.MARR.DATE). Thus preserving some unknown structures as-is while editing other structures can result in inconsistent data and is not recommended.
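The structural consistency mentioned above can be checked mechanically. The following sketch, with hypothetical function names and a deliberately simplified line parser, collects FAM.CHIL and INDI.FAMC pointers and reports any link that appears in only one direction, which is what could happen if one side were edited while the other was preserved as-is.

```python
def parse_links(lines):
    """Collect (family, child) pairs from FAM.CHIL and INDI.FAMC structures."""
    chil, famc = set(), set()
    current_id, current_type = None, None
    for line in lines:
        parts = line.strip().split(" ")
        if parts[0] == "0":
            # Record line: "0 @XREF@ TAG", or "0 TAG" for HEAD and TRLR.
            if len(parts) >= 3 and parts[1].startswith("@"):
                current_id, current_type = parts[1], parts[2]
            else:
                current_id, current_type = None, None
        elif parts[0] == "1" and len(parts) >= 3 and current_id:
            if current_type == "FAM" and parts[1] == "CHIL":
                chil.add((current_id, parts[2]))
            elif current_type == "INDI" and parts[1] == "FAMC":
                famc.add((parts[2], current_id))
    return chil, famc


def unpaired_links(lines):
    """Return (family, child) links present on only one side of the pair."""
    chil, famc = parse_links(lines)
    return sorted(chil ^ famc)


consistent = ["0 @F1@ FAM", "1 CHIL @I1@", "0 @I1@ INDI", "1 FAMC @F1@"]
broken = ["0 @F1@ FAM", "1 CHIL @I1@", "0 @I1@ INDI"]
print(unpaired_links(consistent))  # []
print(unpaired_links(broken))      # [('@F1@', '@I1@')]
```

Semantic constraints, such as the ordering of birth and marriage dates, would need date parsing and are not covered by this sketch.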
The current compatibility guide suggests compatibility with the specification is tied to supporting a wide set of features. I'd rather define it in terms of the alignment between whatever features an application supports and the files they read and write.
As a discussion proposal, perhaps we could define something like the following:
The FamilySearch GEDCOM 7 specification contains more than 150 standard structure types appearing in more than 1000 contexts, and many family history applications use only a subset of them. Additionally, many applications implement features that are not (yet) part of the specification. Because of this, compatibility with the specification is dependent on the features implemented by a given application.
The following compatibility categories are defined.