FamilySearch / GEDCOM


RESN values unclear #280

Closed jl5000 closed 1 year ago

jl5000 commented 1 year ago

The way that RESN is used is not clear. In particular what is the practical difference between PRIVACY and CONFIDENTIAL? If something is marked PRIVACY, is it also CONFIDENTIAL (or vice-versa)? Or are they completely independent like LOCKED?

I'm assuming programs exporting files should give the option whether to write records with PRIVACY and/or CONFIDENTIAL restrictions?

fisharebest commented 1 year ago

> In particular what is the practical difference between PRIVACY and CONFIDENTIAL?

In my (web-based) application:

> If something is marked PRIVACY, is it also CONFIDENTIAL (or vice-versa)?

I treat CONFIDENTIAL as "more private" than PRIVACY. So CONFIDENTIAL implies PRIVACY, but not vice-versa. A record can have only one of these two restrictions.

LOCKED is independent from the others.

> I'm assuming programs exporting files should give the option whether to write records with PRIVACY and/or CONFIDENTIAL restrictions?

Mine does. The export can apply these privacy filters when creating a GEDCOM file. So you can export the full file, or a public subset of it.
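
For illustration, the "CONFIDENTIAL implies PRIVACY" reading can be modelled as an ordered level with an export ceiling. This is only a minimal sketch of that interpretation, not the code of any real application; the names `Sensitivity` and `visible_in_export` and the sample cross-references are made up for the example.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Ordered levels under the 'CONFIDENTIAL implies PRIVACY' reading (an assumption)."""
    NONE = 0
    PRIVACY = 1
    CONFIDENTIAL = 2


def visible_in_export(record_level: Sensitivity, ceiling: Sensitivity) -> bool:
    """A record is written only if its level does not exceed the export ceiling."""
    return record_level <= ceiling


# Hypothetical records: each has at most one of the two restrictions.
records = {
    "@I1@": Sensitivity.NONE,
    "@I2@": Sensitivity.PRIVACY,
    "@I3@": Sensitivity.CONFIDENTIAL,
}

# Public subset: nothing marked PRIVACY or CONFIDENTIAL is written.
public = [x for x, lvl in records.items() if visible_in_export(lvl, Sensitivity.NONE)]
# Intermediate export: PRIVACY records are included, CONFIDENTIAL ones are not.
semi = [x for x, lvl in records.items() if visible_in_export(lvl, Sensitivity.PRIVACY)]
# Full export: everything is written.
full = [x for x, lvl in records.items() if visible_in_export(lvl, Sensitivity.CONFIDENTIAL)]
```

LOCKED is deliberately left out of the ordering because it is treated as independent of the other two values.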

tychonievich commented 1 year ago

When we polled applications when developing 7.0 we found some that interpreted them like @fisharebest explains and others that interpreted them as independent, allowing users to decide if they want to export data that is marked PRIVACY and separately decide if they want to export data that is CONFIDENTIAL. Because of that we didn't know how to clarify their meaning in the specification.
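
The "independent" interpretation can be sketched as two separate export switches rather than one ordered level. This is an assumed model for illustration only; `should_export` and its parameters are hypothetical names, not taken from any polled application.

```python
from typing import AbstractSet


def should_export(
    restrictions: AbstractSet[str],
    include_privacy: bool,
    include_confidential: bool,
) -> bool:
    """Decide whether a record is written when the two values are independent flags."""
    if "PRIVACY" in restrictions and not include_privacy:
        return False
    if "CONFIDENTIAL" in restrictions and not include_confidential:
        return False
    return True


# A user may choose to export CONFIDENTIAL data while withholding PRIVACY data,
# which the ordered ("CONFIDENTIAL implies PRIVACY") reading cannot express.
confidential_only = should_export({"CONFIDENTIAL"}, include_privacy=False, include_confidential=True)
privacy_only = should_export({"PRIVACY"}, include_privacy=False, include_confidential=True)
```

The same file therefore produces different exports depending on which reading the exporting tool follows, which is the ambiguity described above.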

5.5.1 had these longer descriptions:

| value | meaning |
| --- | --- |
| confidential | This data was marked as confidential by the user. In some systems data marked as confidential will be treated differently, for example, there might be an option that would stop confidential data from appearing on printed reports or would prevent that information from being exported. |
| locked | Some records in Ancestral File have been satisfactorily proven by evidence, but because of source conflicts or incorrect traditions, there are repeated attempts to change this record. By arrangement, the Ancestral File Custodian can lock a record so that it cannot be changed without an agreement from the person assigned as the steward of such a record. The assigned steward is either the submitter listed for the record or Family History Support when no submitter is listed. |
| privacy | Indicate that information concerning this record is not present due to rights of or an approved request for privacy. For example, data from requested downloads of the Ancestral File may have individuals marked with ‘privacy’ if they are assumed living, that is they were born within the last 110 years and there isn’t a death date. In certain cases family records may also be marked with the RESN tag of privacy if either individual acting in the role of HUSB or WIFE is assumed living. |

It wasn't clear to me that the longer text added any clarity: it defines privacy by example of how one application assigns it, and confidential by example of two tasks that might be changed based on its presence. I'm not confident there is consensus on how the two are related.
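
The "assumed living" rule quoted from 5.5.1 (born within the last 110 years and no death date) can be written down as a short check. This is a sketch of one reading of that sentence: the function name, the treatment of an unknown birth date, and the exact boundary at 110 years are assumptions the spec text does not settle.

```python
from datetime import date
from typing import Optional

ASSUMED_LIVING_YEARS = 110  # threshold taken from the 5.5.1 description of 'privacy'


def is_assumed_living(birth: Optional[date], death: Optional[date], today: date) -> bool:
    """Return True if the person has no death date and was born less than 110 years ago."""
    if death is not None:
        return False
    if birth is None:
        # Assumption: with no birth date the rule cannot be applied, so return False.
        # A cautious application might prefer to return True here instead.
        return False
    # Whole years elapsed since birth, adjusted if the birthday has not yet occurred.
    age = today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))
    return age < ASSUMED_LIVING_YEARS
```

Passing `today` explicitly keeps the check deterministic, which matters if the same file is re-exported later and a person crosses the 110-year threshold.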

jl5000 commented 1 year ago

One of the main things I glean from that extra information is that PRIVACY indicates that some information has already been redacted.

Data privacy is a topic which has surged in importance in recent years. Whilst there aren't any international standards to fall back on that I know of, it does suggest that this area probably needs a bit more attention.

jl5000 commented 1 year ago

Noting your comments on the pull request, I think it's a good thing being able to explicitly mark records as sensitive, and would be sad to see RESN deprecated.

dthaler commented 1 year ago

"RESN privacy" and "RESN confidential" should be deprecated since there are already multiple incompatible meanings deployed. Agreed that there is a need to mark things as sensitive. We could do so either by defining a new tag, or simply by defining new values with the existing RESN tag.

tychonievich commented 1 year ago

I think the easiest would be to deprecate PRIVACY and recommend using CONFIDENTIAL instead. Although the difference between the two varies, I'm not aware of differences in how people use CONFIDENTIAL itself; the variation seems to be in PRIVACY.

Norwegian-Sardines commented 1 year ago

> I think the easiest would be to deprecate PRIVACY and recommend using CONFIDENTIAL instead. Although the difference between the two varies, I'm not aware of differences in how people use CONFIDENTIAL itself; the variation seems to be in PRIVACY.

I would disagree that there is no difference between PRIVACY and CONFIDENTIAL.

CONFIDENTIAL by definition is based on the user or data entry person: they may not want others in a shared environment to see a particular fact/event or an individual due to some personal or family controversy or situation. A birth name, a person born out of wedlock, or an untested person could be an example.

PRIVACY by definition has to do with not allowing an individual to be named or seen. GEDCOM says “if they are assumed living, that is they were born within the last 110 years and there isn’t a death date.” Some countries don’t allow information about a person to be seen in public for a set number of years even after they are deceased. In a shared environment this is important.

tychonievich commented 1 year ago

> I would disagree that there is no difference between PRIVACY and CONFIDENTIAL.

I wasn't trying to say they are not different. CONFIDENTIAL has a consistent usage, but PRIVACY does not. We know of the following meanings for PRIVACY, all arguably in keeping with the 5.5.1 and 7.0 defining text:

We could clarify to any one of these, but whichever we pick would be re-defining some current tool's usage and thus changing the meaning of existing data. I think the best we can do for now is note that PRIVACY has various meanings and add new unambiguous tags in the future. For example, we might add something like

I'm not saying those are the right tags or right definitions, but I am saying they are different and that lumping them all under PRIVACY is problematic.

Norwegian-Sardines commented 1 year ago

Ok, my bad I misunderstood your statement.

I would definitely agree that: 1) we need at minimum 2 states of display control, 2) the current values are not self-defining, and 3) we need some state that controls “update ability”.

I like the idea that data can be locked against updates. In a shared environment two data entry people could be at odds about a piece of data or a relationship. By group decision the data/relationship can be locked, to prevent changes or to indicate to the receiver that some decision has been made.

Data should also have a set of states where it can be published or seen by selected individuals, or by the public. These states would be defined and used both in shared environments and when data is transmitted via GEDCOM. This could be, as indicated, either hidden from all users (HIDE) or hidden from the public (POLICY). The FILTERED confuses me! On export it would indicate the data was filtered out, but if you embed it in a GEDCOM for, let’s say, the birth info, would the GEDCOM look like this:

1 BIRT
2 RESN FILTERED

For adoption:

1 ADOP
2 RESN FILTERED

This could give the receiver of the GEDCOM an indication the individual was adopted just without the data. This could be a breach of the intent of filtering out the data! I’d rather “hide” the data and not export it.

tychonievich commented 1 year ago

> The FILTERED confuses me!

I don't use anything like FILTERED myself, but could imagine it being useful to assist in collaboration. If I send a file to my cousin, then get it back where some details about one of our ancestors are missing but there's a RESN FILTERED in that ancestor's INDI record, I would be more likely to assume the omission was based on privacy, not based on my cousin disagreeing with the data and removing it as inaccurate.

That said, the reason I included it at all was not because I think it is particularly useful but because it is a reasonable reading of what the 5.5.1 spec intended for PRIVACY, and hence may be something that PRIVACY means in the wild.

Norwegian-Sardines commented 1 year ago

Yes I can see the value of the definition you prescribe for RESN FILTERED at the INDI level.

0 @X2@ INDI
1 RESN FILTERED

It works as a data placeholder in a transmitted GEDCOM when you want to keep that individual private but still present in the shared GEDCOM, so that an XREF renumbering does not eliminate them when the GEDCOM is returned.

My point was that RESN can also appear on facts, and that RESN FILTERED has issues there.

The other states of RESN allow for the transmittal of data, but instruct the receiver to treat the data with care when they transmit it themselves or print a report (no guarantee they will).

However, if I were transmitting to another person I’d think 3 or 4 times before transmitting RESN HIDE or RESN POLICY data, for fear that the recipient would not honor their meaning. I’d probably RESN FILTERED the individual or remove the facts, but that would be my privacy bias.

chronoplexsoftware commented 1 year ago

The two use cases we have for the restriction structure are

  1. to protect against accidental edits of data (we call this read-only)
  2. to protect against accidental export of what the user defines as private data (we call this private)

When writing GEDCOM for the first case, we have used RESN LOCKED. For the second case, we had been using RESN PRIVACY but have now switched to RESN CONFIDENTIAL following the discussions here. We assume that there is no restriction on the data unless a RESN tag is present. Of course, with GEDCOM 7 we can export both restrictions at the same time, something we could not previously do using standard GEDCOM. Our GEDCOM 5.5.1 writer uses custom tags to work around that limitation.
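
In GEDCOM 7 the RESN payload is a comma-separated list of enumeration values, which is what allows a line such as `1 RESN CONFIDENTIAL, LOCKED`. A minimal sketch of reading and writing that payload follows; `parse_resn` and `format_resn` are illustrative names, and upper-casing the items (to accept the lower-case 5.5.1 spellings) is an assumption of this example, not a requirement of the specification.

```python
from typing import Iterable, Set

# The three standard RESN values in GEDCOM 7.
STANDARD_RESN = {"CONFIDENTIAL", "LOCKED", "PRIVACY"}


def parse_resn(payload: str) -> Set[str]:
    """Split a RESN payload into a set of values; an empty payload means no restriction."""
    items = (item.strip().upper() for item in payload.split(","))
    return {item for item in items if item}


def format_resn(values: Iterable[str]) -> str:
    """Join restriction values into a single payload, sorted for stable output."""
    return ", ".join(sorted(values))


both = parse_resn("CONFIDENTIAL, LOCKED")          # read-only and private together
unknown = parse_resn("locked") - STANDARD_RESN     # empty: 'locked' normalises to LOCKED
payload = format_resn({"LOCKED", "CONFIDENTIAL"})
```

Treating the payload as a set matches the rule stated above that no RESN means no restriction: an absent tag and an empty set are handled the same way.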

We consider three kinds of export:

1. Exports where the user wishes to remove data based on a filter. Owing to the large numbers of persons in users' trees, it is not reasonable to expect users to explicitly set a restriction on all persons. Instead, we provide the option to use an algorithm to filter data of persons known or likely to be living, retaining their cross references to other individuals.

2. Exports where the user wishes to take action based on a restriction. This scenario requires the specification to clearly define what the restriction value means. It may also need to define the action that must be taken upon encountering that restriction during this type of export.

For example, wherever data would have a RESN CONFIDENTIAL or RESN PRIVACY, we currently offer the user one option: to completely remove this data from the export, leaving no trace. For GEDCOM exports, we do not use @VOID@ pointers for this type of data removal. We consider that a leak of information; a FAM record containing a CHIL @VOID@ could indicate that a person had a child, something that may be confidential. If the previously linked INDI record is marked RESN CONFIDENTIAL, we would not export any CHIL structure for it in the associated FAM.

This is one possible and valid action to take. Others would refer to the sections in the specification which discuss data removal and take a different view. Nonetheless, there needs to be clear consensus on what actions should be taken for each restriction value.

3. Exports where the user simply wishes to transfer their data to another app or service, preserving restrictions but not taking any action based on them. Here we would not make any changes to the underlying data; we essentially port the data to a different format, retaining the restriction values where that format supports the concept of restrictions.
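
The "leave no trace" removal described for the second kind of export can be sketched on a simplified in-memory model. The dictionary layout, the function name `export_no_trace`, and the sample records below are all hypothetical; a real exporter would also have to handle HUSB, WIFE, and other pointers, which this sketch omits.

```python
SENSITIVE = {"CONFIDENTIAL", "PRIVACY"}


def export_no_trace(indis, fams):
    """Return copies of the records with restricted individuals removed entirely.

    CHIL pointers to removed individuals are dropped rather than replaced
    with @VOID@, so the export does not reveal that a child existed.
    """
    hidden = {xref for xref, rec in indis.items() if SENSITIVE & rec["resn"]}
    kept_indis = {xref: rec for xref, rec in indis.items() if xref not in hidden}
    kept_fams = {
        xref: {**fam, "chil": [c for c in fam["chil"] if c not in hidden]}
        for xref, fam in fams.items()
    }
    return kept_indis, kept_fams


# Hypothetical data: one family with two children, one of them confidential.
indis = {
    "@I1@": {"resn": set()},
    "@I2@": {"resn": set()},
    "@I3@": {"resn": {"CONFIDENTIAL"}},
}
fams = {"@F1@": {"husb": "@I1@", "chil": ["@I2@", "@I3@"]}}

out_indis, out_fams = export_no_trace(indis, fams)
```

Because the function builds new dictionaries, the user's own data keeps its restrictions and links; only the exported copy is reduced.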

Two other considerations in future designs

How should inheritance be handled? There have been suggestions to add 'RESN' to several other structures including names and citations. This quickly leads to questions about inheritance. For instance this will mean that INDI can have BIRT, NAME, and all can have citations, each with separate restriction(s). At the moment, we lock, or treat as confidential, any data item that has a 'parent' item marked locked, or confidential respectively. This has the advantage of allowing a user to set one flag, most often on a person, and have their names, facts, and citations treated the same way without the user needing to manually tag each 'child' item. However, I expect that others will have different ideas about how that should work.
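
The parent-to-child inheritance described here can be sketched as a tree walk that carries the accumulated restrictions downward. This is an illustration of that one policy under assumed names (`Node`, `effective_restrictions`); other applications may well resolve inheritance differently.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterator, List, Tuple


@dataclass
class Node:
    """A simplified GEDCOM structure: a tag, its own RESN values, and substructures."""
    tag: str
    resn: FrozenSet[str] = frozenset()
    children: List["Node"] = field(default_factory=list)


def effective_restrictions(
    node: Node, inherited: FrozenSet[str] = frozenset()
) -> Iterator[Tuple[str, FrozenSet[str]]]:
    """Yield (tag, restrictions) pairs where each item also carries its ancestors' values."""
    effective = inherited | node.resn
    yield node.tag, effective
    for child in node.children:
        yield from effective_restrictions(child, effective)


# One flag on the person; the birth event adds its own LOCKED restriction.
person = Node(
    "INDI",
    frozenset({"CONFIDENTIAL"}),
    [
        Node("NAME"),
        Node("BIRT", frozenset({"LOCKED"}), [Node("SOUR")]),
    ],
)
result = dict(effective_restrictions(person))
```

In this example NAME is treated as confidential without being tagged, and the citation under BIRT ends up both confidential and locked.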

What about conflicting restrictions? Whilst the LOCKED and CONFIDENTIAL case works well as a list, if new values such as FILTERED and HIDE enter the mix, we start to get problems where the payload contains mutually incompatible values.

fisharebest commented 10 months ago

I can't speak for other applications, but in my own, LOCKED means that only a user with elevated privileges (e.g. a site administrator) can edit this record.

This behaviour is based on the original definition for this tag.

> Some records in Ancestral File have been satisfactorily proven by evidence, but because of source conflicts or incorrect traditions, there are repeated attempts to change this record. By arrangement, the Ancestral File Custodian can lock a record so that it cannot be changed without an agreement from the person assigned as the steward of such a record. The assigned steward is either the submitter listed for the record or Family History Support when no submitter is listed.
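
As a rough sketch of that behaviour, the edit check reduces to a single rule. The function name and the `is_admin` flag are assumptions for illustration, standing in for whatever privilege model an application actually has.

```python
from typing import AbstractSet


def can_edit(restrictions: AbstractSet[str], is_admin: bool) -> bool:
    """A LOCKED record is editable only by a user with elevated privileges."""
    return is_admin or "LOCKED" not in restrictions


editor_on_locked = can_edit({"LOCKED"}, is_admin=False)
admin_on_locked = can_edit({"LOCKED"}, is_admin=True)
editor_on_confidential = can_edit({"CONFIDENTIAL"}, is_admin=False)
```

Note that CONFIDENTIAL and PRIVACY play no part in this check, consistent with LOCKED being independent of the visibility restrictions.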