FamilySearch / gedcomx

An open data model and an open serialization format for exchanging genealogical data.
http://www.gedcomx.org
Apache License 2.0

Permission Rights of GEDCOMX files #151

Closed EssyGreen closed 12 years ago

EssyGreen commented 12 years ago

How are the rights/permissions intended to be managed for GEDCOMX files, given the needs for (a) publishers to restrict editing to preserve their data and copyright vs (b) researchers to control/edit/document their own source media?

For example, if I have a file with an image copy of a source which I want to add my own transcription/interpretations to then will I be physically able to do this or will I be forced to create a derivative source which references the original? Similarly, if I receive a file from, say, Ancestry which has an image copy and a bad transcription, will I be physically able to correct that within the file, or will I be forced to create a derivative? And if so, how do I separate the image copy, which I want, from the bad transcription, which I want to throw away?

jralls commented 12 years ago

> For example, if I have a file with an image copy of a source which I want to add my own transcription/interpretations to then will I be physically able to do this or will I be forced to create a derivative source which references the original?

I don't know if "forced" is the right word, but you should create a derivative source if you're going to publish it -- which I think would include transmitting it in a GedcomX. Same for the Ancestry case.

Depending on the image, it might be a violation of Ancestry's copyright to share it. It is absolutely a violation of their copyright to share their transcription without their express written permission. Most archives assert copyright over their holdings as well, so while they might permit you to photograph or photocopy documents during your visit, those copies are for your personal use and shouldn't be published without permission unless you're up for testing their assertion of copyright in court. (I'm no lawyer and that isn't legal advice.)

More generally, I think that it will be useful for GedcomX to provide for including images, recordings, and transcripts. It is each user's responsibility to respect the copyrights of others, just as it is with any other electronic file format. Yes, this is a severe problem in the, um, less professional strata of the genealogical community. It's not one we'll be able to fix with a file format specification.

EssyGreen commented 12 years ago

> I don't know if "forced" is the right word, but you should create a derivative source if you're going to publish it

Yes I agree - was trying to simplify but let's assume in my example that I have no intention of publishing. It's just for my own research. If you like, my own Notes, but in a structured form.

My point being that a researcher should always be able to annotate, transcribe, interpret, etc. to clarify and help their research and to analyse sources in the way they need. If the GEDCOMX file might disable editing and hence prevent that, then there is little point in the application building in the capability to edit GEDCOMX files, in the same way that I wouldn't build in the capability to edit PDF or JPG files from within a genealogy application - the best I would do is read their meta-data.

I'm not arguing against this, I'm just trying to establish what is intended so I don't build my expectations on the researcher being able to annotate the Record and then find out further down the line that in 90% of cases it is read-only.

EssyGreen commented 12 years ago

I guess what I'm struggling with here is the different types of application/use and users I tried (and failed) to identify in #142 ... It seems to me, from what Ryan said about the need for FamilySearch and BrightSolid to proceed quickly with the Record Model (see #138), that it will probably go ahead more or less as is (maybe with a few tweaks), so there is little point in me banging on about what I would like to see in it 'cos it ain't gonna happen. The Record Model is primarily there for the web publishers to push to their users. To this end there will be some sort of free reader out there which enables people to view GEDCOMX files. So from my personal perspective I don't need to bother with deserialising these files (in this situation) - I can just treat them like any other media file.

If I want to enable users to publish/push to these same web-publishers then I need to serialise in whatever way they are specifying - which I presume will be the Conclusion Model.

That's all fine and dandy but what I personally am more interested in is the interoperability between different research applications where I need to both serialise (export) and de-serialise (import). If the import file (or a file referenced by/embedded in it) is protected then I shouldn't be de-serialising it at all (since I could then re-construct it and plagiarise the original) but if it is the researcher's own work then I should de-serialise it in order to extract the most detailed information.

... My thoughts on this aren't clear ... I'm trying to get my head round what I need to serialise/de-serialise in order to support application interchange whilst maintaining the rights of the original publisher/owner. And as part of that how to identify how GEDCOMX will separate the researcher's own work from that of others within the file.

jralls commented 12 years ago

Serialization is skew to this issue: Serialized binary looks like a very long string of random characters. See Base64 for the gory details. For MIME multipart or XML CDATA you must serialize binary content because both are text formats. If Ryan adopts #140, then the binaries just get zipped up into the archive as-is, no serialization required. Serialization of program objects to XML is still necessary, of course.
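
To illustrate the Base64 point, here is a minimal Python sketch of carrying binary content inside a text format. The `media` element name and the sample bytes are assumptions made up for the example; they are not taken from the GEDCOM X specification.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical stand-in for image bytes; a real scan would be read from disk.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF, 0x10, 0x80])

# Binary -> ASCII text, so the content can sit inside a text format such as XML.
encoded = base64.b64encode(image_bytes).decode("ascii")

# "media" is an illustrative element name, not GEDCOM X vocabulary.
media = ET.Element("media", {"encoding": "base64"})
media.text = encoded
xml_text = ET.tostring(media, encoding="unicode")

# Round trip: parse the XML again and decode the text back to the same bytes.
decoded = base64.b64decode(ET.fromstring(xml_text).text)
```

With a ZIP-style archive the image would instead be stored as its own entry, and no such encoding step would be needed.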

EssyGreen commented 12 years ago

> Serialized binary looks like a very long string of random characters.

Yes, I know, but that wasn't really my point.

> Serialization of program objects to XML is still necessary, of course.

Indeed and it is this I was really talking about. Apologies for not being clear. My questions (with the exception of how to extract an image copy) were about the XML data ie the Record Model objects held within the GEDCOMX file.

jralls commented 12 years ago

> My questions (with the exception of how to extract an image copy) were about the XML data ie the Record Model objects held within the GEDCOMX file.

That will be up to the application. XML is totally mutable and XML parsers create mutable objects.
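
As a rough sketch of what "mutable" means in practice, the Python snippet below parses a small XML record, corrects a transcription, and adds a note, all with the standard library. The element names (`record`, `image`, `transcription`, `note`) are invented for illustration and are not the Record Model's actual vocabulary.

```python
import xml.etree.ElementTree as ET

# Illustrative record; element names are made up, not GEDCOM X vocabulary.
source = (
    "<record>"
    "<image href='scan.jpg'/>"
    "<transcription>Jhon Smiht</transcription>"
    "</record>"
)

record = ET.fromstring(source)

# Nothing in the parsed tree is read-only: overwrite the bad transcription.
record.find("transcription").text = "John Smith"

# The recipient can also attach their own annotation.
note = ET.SubElement(record, "note")
note.text = "Corrected by recipient"

# The image reference is left untouched; only the transcription changed.
result = ET.tostring(record, encoding="unicode")
```

Whether an application should allow such edits is a policy decision; the XML itself does nothing to prevent them.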

EssyGreen commented 12 years ago

> That will be up to the application. XML is totally mutable and XML parsers create mutable objects.

I understand that. I'm obviously not doing well at explaining here. I'll try again ...

  1. Is it the intention (a) for the GEDCOMX to be in some way protected so that the Record objects cannot (or should not) be changed by the recipient, or (b) that the Record objects are for the use/responsibility of the researcher and hence can be overwritten as they see fit?
  2. If 1(a), then how will this be achieved if the ZIP-type approach is used? And if the MIME approach is used with, say, an embedded image, then how (if at all) will the researcher be able to separate the image copy from the interpreted data to either correct bad data or add their own interpretation(s)? (This latter is partly related to my earlier post re how derivatives will be catered for - see #136)
  3. Furthermore, if I (as a researcher) am scanning and packaging in GEDCOMX format say a birth certificate (Crown copyright) how do the rights of the Crown get maintained? Or should this not be done?

It's not so much the technicalities of the format that I'm trying to get to grips with here; it's what is intended by GEDCOMX, what their objectives are. Are web-publishers going to be giving us black-boxes or editable media? Are users going to perceive these downloadable GEDCOMX files as something akin to a JPG/PDF or as something more like a spreadsheet/database? Regardless of technical abilities, will these files come with a "Do not copy - for your use only" message or with a "Here it is, scribble on it as you like" message?

jralls commented 12 years ago

Obviously I can't answer the intent question. I can only say that the file format as presently defined doesn't lend itself to any sort of control.

stoicflame commented 12 years ago

@EssyGreen does the provision in the file format specification for digital signatures address the issues you'd like to see addressed here?
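
For context, a signature does not stop a recipient from editing a file; it only lets a later reader check whether the content still matches what was signed. The Python sketch below shows that idea under a loud assumption: a keyed HMAC digest stands in for a real public-key signature (which the Python standard library does not provide), and the key and record content are hypothetical. It does not reproduce whatever mechanism the file format specification defines.

```python
import hashlib
import hmac


def sign(content: bytes, key: bytes) -> str:
    """Return a keyed SHA-256 digest of the content (stand-in for a signature)."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify(content: bytes, key: bytes, signature: str) -> bool:
    """Check that the content still matches the digest that was issued."""
    return hmac.compare_digest(sign(content, key), signature)


# Hypothetical publisher key and record content, for illustration only.
key = b"publisher-secret"
original = b"<record><transcription>John Smith</transcription></record>"
signature = sign(original, key)

# A recipient can still edit the bytes; the edit just no longer verifies.
edited = original.replace(b"John Smith", b"Jon Smythe")

original_ok = verify(original, key, signature)
edited_ok = verify(edited, key, signature)
```

Here `original_ok` is true and `edited_ok` is false: the publisher's version stays identifiable, while the researcher's edited copy is recognisably a different work.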

EssyGreen commented 12 years ago

Yes I think so - many thx :)