clingen-data-model / clingen-interpretation

Allele (variant) interpretation model and API for ClinGen

Contribution ID / Agent ID #66

Closed by cbizon 6 years ago

cbizon commented 7 years ago

Is the @id tag in our contribution element the ID of the contribution or of the agent? I assume the former. So I guess that instead of something like this:

"contribution": [
                    {
                      "cg:id": "Contrib488",
                      "cg:type": "Contribution",
                      "agent": "Harry Stevenson",
                      "onDate": "2016-06-01T14:15:00+00:00",
                      "role": "CG-contributory-role:curator"
                    }
                  ]

we should do this:

"contribution": [
                    {
                      "cg:id": "Contrib488",
                      "cg:type": "Contribution",
                      "agent": {
                                 "name": "Harry Stevenson",
                                 "@id": "http://vci.clinicalgenome.org/users/140751820-12830283"
                      },
                      "onDate": "2016-06-01T14:15:00+00:00",
                      "role": "CG-contributory-role:curator"
                    }
                  ]

bpow commented 7 years ago

My understanding is that the id here is the id of the contribution. @larrybabb had advocated for (and went ahead and made corresponding changes to the sheets document) changing the "Agent" type from a formal referable type to a plain-text string. I'm with you that we should encourage something more formal (maybe use something like an ORCID, or at the least a codeable concept that refers to some mapping to person/organization).

cbizon commented 7 years ago

There are a couple of reasons that I think we need an ID

1) VCI has an ID, and we can piggyback on that

2) There can be more than one person with the same name. Karen told me that the VCI has, in fact, already run into this situation!
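
To make the name-collision point concrete, here is a minimal sketch. It assumes agents are represented as objects with `@id` and `name` keys, as in the proposal at the top of the thread; the second user ID and the `index_by` helper are made up for illustration.

```python
# Two hypothetical contributors who share a display name.
# The second "@id" is invented for this example.
agents = [
    {"@id": "http://vci.clinicalgenome.org/users/140751820-12830283",
     "name": "Harry Stevenson"},
    {"@id": "http://vci.clinicalgenome.org/users/example-second-user",
     "name": "Harry Stevenson"},
]


def index_by(records, key):
    """Group records under the value they hold for `key`."""
    index = {}
    for record in records:
        index.setdefault(record[key], []).append(record)
    return index


by_name = index_by(agents, "name")
by_id = index_by(agents, "@id")

# Keys that map to more than one agent cannot identify a contributor.
ambiguous_names = [k for k, group in by_name.items() if len(group) > 1]
ambiguous_ids = [k for k, group in by_id.items() if len(group) > 1]
```

Looking agents up by name leaves one ambiguous key, while looking them up by `@id` leaves none.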

cbizon commented 7 years ago

BTW, I apologize if I agreed to this change in the past and am now flip-flopping. (but I am)

rrfreimuth commented 7 years ago

(using IDs for contributors)++

IMO, IDs would increase specificity and computability, without adding too much of a burden on submitters.


bpow commented 7 years ago

Embracing linked data, we could use foaf:Agent (which has subclasses foaf:Person, foaf:Organization and foaf:Group) for our agent type, and could use an OpenID "identity URL" as the identifier.
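
A rough sketch of what such an agent could look like as a JSON-LD node, built here in Python. The `make_agent` helper and the example identity URL are assumptions for illustration; only the FOAF namespace and class names come from the FOAF vocabulary itself.

```python
import json

FOAF = "http://xmlns.com/foaf/0.1/"  # FOAF namespace IRI

# The agent class and the three subclasses named in the comment above.
FOAF_AGENT_KINDS = {"Agent", "Person", "Organization", "Group"}


def make_agent(agent_id, name, kind="Agent"):
    """Build a JSON-LD node for an agent (hypothetical helper)."""
    if kind not in FOAF_AGENT_KINDS:
        raise ValueError(f"unknown agent kind: {kind}")
    return {
        "@context": {"foaf": FOAF},
        "@id": agent_id,
        "@type": f"foaf:{kind}",
        "foaf:name": name,
    }


# The identity URL below is a placeholder, not a real OpenID endpoint.
person = make_agent("https://openid.example.org/harry.stevenson",
                    "Harry Stevenson", kind="Person")
print(json.dumps(person, indent=2))
```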

larrybabb commented 7 years ago

Please point me to some OpenIDs or VCI ids for reference and I will add them back to the Agent concept. I'm fine with having ids on Agents. Is this a required attribute?

cbizon commented 7 years ago

I think it should be required. I don't know that we need to specify the form of the ID, but using foaf:Agent as the class makes sense to me.

VCI ids are of the form: {BASE_URL}/users/{uuid}

I don't know what the BASE_URL is.
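
As a sketch of the `{BASE_URL}/users/{uuid}` pattern: since the real BASE_URL is not known here, the example uses a placeholder host, and both helper names are hypothetical.

```python
def vci_agent_id(base_url, user_uuid):
    """Join a base URL and a user UUID into {BASE_URL}/users/{uuid}."""
    return f"{base_url.rstrip('/')}/users/{user_uuid}"


def split_vci_agent_id(agent_id):
    """Recover (base_url, user_uuid) from an ID of that form."""
    base_url, sep, user_uuid = agent_id.rpartition("/users/")
    if not sep or not base_url or not user_uuid:
        raise ValueError(f"not of the form BASE_URL/users/uuid: {agent_id!r}")
    return base_url, user_uuid


# "https://vci.example.org" is a placeholder, not the actual VCI base URL.
agent_id = vci_agent_id("https://vci.example.org/", "140751820-12830283")
parts = split_vci_agent_id(agent_id)
```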

cbizon commented 7 years ago

After looking a bit more, I'm on board with using foaf:Agent

cbizon commented 7 years ago

Or schema.org:Person, which is maybe more common now?

larrybabb commented 7 years ago

just a gut feel, but Person seems more flexible.

bpow commented 7 years ago

I actually think schema:Person is less flexible in at least one respect: foaf:Agent has subclasses for foaf:Person, foaf:Organization and foaf:Group.

Assuming that we would want to have a Contribution signed off by a group or organization, it might be easier to just say the domain is foaf:Agent and let downstream use the appropriate subclass. I know that Romney would tell us that "Corporations are people", but from a modeling standpoint, I don't think it makes sense to say that, for instance, the Cardiomyopathy working group would be a schema:Person.

So if we go with schema.org, it seems like we would need to say that the agent of a contribution must be one of schema:Person or schema:Organization (there is no unifying superclass of these below schema:Thing). Any ideas how we would handle this in validation?

bpow commented 7 years ago

Or we could just not specify this directly and handle this in the linked-data layer...

cbizon commented 7 years ago

I'm not 100% sure about how this works, but the problem is that there is a mismatch between the conceptual side and the JSON-LD side.

On the conceptual side, modeling as foaf:Agent makes perfect sense because of subclassing. However, as far as I can see, that subclassing doesn't extend into the JSON-LD world. That is, when you go to write a JSON Schema, you actually have to write that your agent is oneOf { "person", "organization", "group" }. Then the validator can check it off; otherwise it cannot.

So in terms of the validation, we have to write the same construct for either case. In terms of the conceptual model, things are simpler with foaf.

One interesting set of comments is here: http://baskauf.blogspot.com/2016/02/rdf-for-talking-about-people.html in the section "FOAF vs. Schema.org". The writer speculates that schema.org is the one that is going to be more used going forward (though he may be wrong), and also points out the equivalence of foaf:Person and schema:Person.
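
The closed-set check described above can be sketched without any schema library. This is only an illustration of the idea, written as a plain membership test on `@type`; the type strings and the function name are assumptions, not the project's schema.

```python
# Closed list of accepted types. A plain validator has no class hierarchy
# to consult, so every acceptable type has to be spelled out.
ALLOWED_AGENT_TYPES = ("foaf:Person", "foaf:Organization", "foaf:Group")


def validate_agent_closed(agent):
    """Return a list of error strings; an empty list means valid."""
    if not isinstance(agent, dict):
        return ["agent must be an object"]
    errors = []
    if not isinstance(agent.get("@id"), str):
        errors.append("agent needs a string '@id'")
    if agent.get("@type") not in ALLOWED_AGENT_TYPES:
        errors.append("agent '@type' must be one of "
                      + ", ".join(ALLOWED_AGENT_TYPES))
    return errors


person = {"@id": "http://vci.clinicalgenome.org/users/140751820-12830283",
          "@type": "foaf:Person"}
generic = {"@id": "http://vci.clinicalgenome.org/users/140751820-12830283",
           "@type": "foaf:Agent"}

person_errors = validate_agent_closed(person)
generic_errors = validate_agent_closed(generic)
```

Note that the superclass `foaf:Agent` itself is rejected unless it is added to the list, which is the mismatch between the conceptual model and the validation layer.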

tnavatar commented 7 years ago

In the linked data world, you can’t assume that you know the set of all possible subclasses of a given class, so it’s technically incorrect to limit this to merely a ‘person’, ‘organization’ or ‘group’. I think it makes more sense to limit this to a named node (an IRI or an object with an ID that is an IRI), and assume the person using the data model understands enough to use this in a semantically correct way.
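
A minimal sketch of the "named node" rule, assuming it means either an IRI string or an object whose `@id` is an IRI. The IRI test below is a rough approximation based on `urllib.parse.urlparse` (it only requires a scheme), not a full RFC 3987 check, and both function names are hypothetical.

```python
from urllib.parse import urlparse


def is_iri(value):
    """Rough check: a string with a scheme and something after it."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)


def is_named_node(value):
    """True for an IRI string or an object whose '@id' is an IRI."""
    if isinstance(value, dict):
        return is_iri(value.get("@id"))
    return is_iri(value)


vci_id = "http://vci.clinicalgenome.org/users/140751820-12830283"
checks = {
    "iri string": is_named_node(vci_id),
    "object with @id": is_named_node({"@id": vci_id, "name": "Harry Stevenson"}),
    "bare name": is_named_node("Harry Stevenson"),
    "object without @id": is_named_node({"name": "Harry Stevenson"}),
}
```

The check says nothing about the agent's class, which is left to whoever produces the data.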


cbizon commented 7 years ago

I'm a little unclear what you are saying @tnavatar . Are you saying that it's bad form to write a schema that limits the value to one of those types, or are you saying that a schema written in terms of a superclass won't work, or that validation should not cover this situation? Sorry for the confusion...

tnavatar commented 7 years ago

Kind of, to break up your question and answer to the best of my knowledge:

> Are you saying that it's bad form to write a schema that limits the value to one of those types

A bit. Even if it’s valid to say that this field can only be filled by one of a set of classes (or the subclasses), it’s not correct to assume that you know what all the possible subclasses are (RDF is based on an open-world assumption).

> are you saying that a schema written in terms of a superclass won't work

JSON Schema has no knowledge or access to a class structure, so this won’t work using JSON Schema alone (unless we impose a closed set of classes). OWL makes this statement a little easier to structure, but would impose the same closed set of classes (which isn’t good form, at least in semantic web land).

> that validation should not cover this situation?

Maybe this has been said before, and I missed the boat, but I think we might have two schemas: the first would be a contract for what we are going to produce from the VCI; the second, more permissive, would define the minimum valid data required to use the message format. For the former, using validation seems good; for the latter, it doesn’t seem correct to me.
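
One way to picture the two tiers is as a pair of checks, where the strict one builds on the permissive one. The specific rules below (which fields are required, what the `@id` must look like) are invented for illustration and are not the project's actual contract.

```python
def is_nonempty_str(value):
    return isinstance(value, str) and value.strip() != ""


def check_minimal(contribution):
    """Permissive tier: the agent only has to carry some identifier."""
    agent = contribution.get("agent")
    if is_nonempty_str(agent):
        return []
    if isinstance(agent, dict) and is_nonempty_str(agent.get("@id")):
        return []
    return ["agent must be an identifier string or an object with '@id'"]


def check_vci_contract(contribution):
    """Strict tier: what a producer such as the VCI promises to emit."""
    errors = check_minimal(contribution)
    agent = contribution.get("agent")
    if not isinstance(agent, dict):
        errors.append("agent must be an object")
    else:
        if "/users/" not in str(agent.get("@id", "")):
            errors.append("agent '@id' must look like BASE_URL/users/uuid")
        if not is_nonempty_str(agent.get("name")):
            errors.append("agent must carry a name")
    for field in ("onDate", "role"):
        if not is_nonempty_str(contribution.get(field)):
            errors.append(f"missing {field}")
    return errors


loose = {"agent": "http://example.org/people/1"}
full = {
    "agent": {
        "@id": "http://vci.clinicalgenome.org/users/140751820-12830283",
        "name": "Harry Stevenson",
    },
    "onDate": "2016-06-01T14:15:00+00:00",
    "role": "CG-contributory-role:curator",
}
```

Here `loose` passes the permissive check but fails the contract, while `full` passes both.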


bpow commented 7 years ago

I want to bring this issue back to the forefront, since we haven't discussed it for a while, and I notice that @tnavatar has Contribution.agent marked as isReference == True, but all of the examples just have a person's (fictitious) name in this field. Clearly a name is not enough, since a person's name does not uniquely identify a contributor. Tagging @larrybabb as well so he will notice.

larrybabb commented 7 years ago

OK, I added an Agent & _AgentAttribute sheet to provide the Agent structure a name attribute and I moved the string within it so that the contribution.agent now refers to the Agent object.

Please take a look and see whether that creates a structure that provides the ability to put an @id on it.

Question: where/how do we put the id values in the spreadsheet for these agents? Is this some type of scripting magic or should I create a new attribute off of agent that is a String that is marked as "isReference=True"?

cbizon commented 6 years ago

Settled on PROV:Agent
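
For reference, a sketch of how the contribution from the top of the thread might look with the agent typed as `prov:Agent`. The thread does not show the final serialization, so the shape below and the partial `@context` are assumptions; only the PROV namespace IRI is taken from the W3C PROV vocabulary.

```python
import json

PROV = "http://www.w3.org/ns/prov#"  # W3C PROV namespace IRI

# Partial context for illustration; the project's real context is not
# shown in this thread.
contribution = {
    "@context": {"prov": PROV},
    "cg:id": "Contrib488",
    "cg:type": "Contribution",
    "agent": {
        "@id": "http://vci.clinicalgenome.org/users/140751820-12830283",
        "@type": "prov:Agent",
        "name": "Harry Stevenson",
    },
    "onDate": "2016-06-01T14:15:00+00:00",
    "role": "CG-contributory-role:curator",
}

serialized = json.dumps(contribution, indent=2)
round_tripped = json.loads(serialized)
```

PROV also defines `prov:Person` and `prov:Organization` as subclasses of `prov:Agent`, so a producer could use a more specific type where it is known.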