variant representation

cbizon commented 7 years ago

As part of handling ids in the transformation, I've implemented looking up canonical ids in the allele registry. It would be simple to hit the allele registry to get a JSON message back about every variant in a message, so we therefore have access to any of the information that we might want.

We have to decide how much allele information we want to represent in the interpretation. At the moment, I'm just putting in the canonical allele id and type. We can go from there all the way up to the full JSON you'd get back for a variant, e.g. from

http://reg.genome.network/allele/CA000301

I'm thinking maybe just one or more of the HGVS expressions, but I'm not stuck on that.

tnavatar commented 7 years ago

Are you thinking about requiring more than the bare IRI for an interpretation to be valid?

cbizon commented 7 years ago

Not for validity - I think that just an IRI is sufficient. But for practicality/readability I think we want the first time a variant appears to have something more.

tnavatar commented 7 years ago

So this would be more of an implementation decision, and less a part of the schema/spec?

cbizon commented 7 years ago

I would like for the schema to say that the value for the variant key must be either a bare IRI or a structure of our devising, preferably one that conforms to the allele model, and then leave it up to implementers to decide which of those they would like to use in which part of the JSON.
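A minimal sketch of that rule in Python, under stated assumptions: the structured form is taken to be a dict carrying at least an `id` key that holds an IRI, and the IRI check is deliberately rough. The function name `is_valid_variant_value` and the `id` key are illustrative only, not part of any agreed schema.

```python
from urllib.parse import urlparse


def is_valid_variant_value(value):
    """Return True if value is a bare IRI string or a structured variant.

    Assumption: a structured variant is a dict with an "id" key holding an IRI.
    """
    def looks_like_iri(text):
        # Rough check only: require a scheme and something after it.
        parsed = urlparse(text)
        return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)

    if isinstance(value, str):
        return looks_like_iri(value)
    if isinstance(value, dict):
        ident = value.get("id")
        return isinstance(ident, str) and looks_like_iri(ident)
    return False
```

Either form could then appear anywhere a variant is expected, which leaves implementers free to give the fuller structure on first mention and the bare IRI afterwards.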

tnavatar commented 7 years ago

I generally agree. JSON Schema includes a restriction that a string must be a URL. The only thing I don't think we can do with JSON Schema alone is support true IRIs, or shortened IRIs in CURIE form (e.g. "cgallele:CA12345"). Maybe we can specify this restriction in OWL instead.
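If shortened ids were allowed, a consumer could expand them before validating. A rough sketch, assuming a prefix map is agreed somewhere outside the schema; the `cgallele` namespace IRI below is a guess based on the allele registry URL quoted earlier in this thread, and `expand_curie` is a hypothetical helper.

```python
# Hypothetical prefix map; the "cgallele" namespace IRI is an assumption
# based on the allele registry URL quoted earlier in this thread.
PREFIXES = {"cgallele": "http://reg.genome.network/allele/"}


def expand_curie(value, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full IRI; return other strings unchanged."""
    prefix, sep, local = value.partition(":")
    # Leave full IRIs such as "http://..." alone.
    if sep and prefix in prefixes and not local.startswith("//"):
        return prefixes[prefix] + local
    return value
```

With this in place, the URL restriction in JSON Schema could be applied to the expanded value rather than to the shortened one.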

(Sent by email on Apr 20, 2017, in reply to cbizon's comment above: https://github.com/clingen-data-model/clingen-interpretation/issues/70#issuecomment-295719736)

larrybabb commented 7 years ago

I also agree with @cbizon. Is this one of those places where we'd use the blank node or blank IRI merely to provide a message-unique id that is basically transient?

see Identifying Blank Nodes

cbizon commented 7 years ago

@larrybabb: I don't think so? My current hypothesis is that if something has an externally referenceable id, we should use it. A variant has the allele registry id, so let's use it.

The alternate case is something like our MendelianCondition or a bit of SegregationData. Those don't map cleanly to something that has a managed external ID, so we would use a blank node type IRI to refer to them.
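Roughly, the rule could look like the sketch below. It assumes each entity is a dict and that a managed id, when one exists, sits under a hypothetical `external_id` key; the `_:b0` style labels follow the usual blank-node convention and are only unique within one message.

```python
import itertools


def make_id_assigner():
    """Return a function that picks an id for each entity in one message."""
    counter = itertools.count()

    def assign_id(entity):
        # Prefer a managed external id when the entity carries one.
        external = entity.get("external_id")  # hypothetical key
        if external:
            return external
        # Otherwise mint a blank-node label, unique only within this message.
        return "_:b{}".format(next(counter))

    return assign_id
```

A fresh assigner would be created per message, so the blank-node labels stay transient and never leak between messages.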

larrybabb commented 7 years ago

@cbizon my misunderstanding. I missed the "or a structure of our devising" bit in your proposal above. I do think we should focus on a few examples/use cases and work through how we are going to do this in a pragmatic way so that others don't have too high a bar to either produce or consume this message.

cbizon commented 7 years ago

I think that the simplest thing is probably to reproduce a subset of the data from the allele registry. It's based on the allele model, and it already exists. So it would be a question of deciding how much of it we want to include when we have our "full" variant representation.
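As a sketch of what "a subset" might mean in practice, assuming the registry response has been parsed into a dict: the key names in `keep` are placeholders for whatever fields we settle on, not a claim about the registry's actual response layout.

```python
def subset_allele(record, keep=("@id", "type", "hgvs")):
    """Copy only the chosen top-level keys from a registry-style record.

    The default key names are illustrative assumptions, not the
    allele registry's documented field names.
    """
    return {key: record[key] for key in keep if key in record}
```

Deciding the "full" representation then comes down to agreeing on the contents of `keep`.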

clingen-data-model / clingen-interpretation

variant representation #70