clingen-data-model / clingen-interpretation

Allele (variant) interpretation model and API for ClinGen

Can we just make the `id` what the `@id` is for DomainEntities, where available? #135

Closed bpow closed 3 years ago

bpow commented 6 years ago

Maybe not much of a sensible title for the issue, but...

@larrybabb recently put in @id columns, some as blank nodes, but some represent reasonable (shortened) IRIs. For instance, MCTID018 has @id SO:0001824. Can't we just use that for the id (which our documentation already turns into a JSON-LD @id anyway...)?

bpow commented 6 years ago

I had to replace the references to the following external ids with the internal ids below in _ContributionAttribute for now so that references within the examples work, but if we replaced our internal ids with the external ids, then we could switch back...

| internal id | external id |
| --- | --- |
| CRID193 | SEPIO:0000154 |
| CRID194 | SEPIO:0000156 |
| CRID195 | SEPIO:0000155 |
| CRID196 | SEPIO:0000331 |

cbizon commented 6 years ago

The problem, as I understand it, has to do with the translation to the relational nature of the spreadsheets from the RDF style of our JSON.

In particular, we started with a domain entity, many of which will have an external ID defined, but now we want to let somebody add their own label to that thing for THAT message. The problem is that our sheets didn't have a concept of instance 2 of ID whatever, and that the name should only be applied in that case.

So Larry added this ID which tags an instance (of something that's already an instance (my head hurts) ).

Again, this is largely because we are pushing something that's RDF-style into a spreadsheet and we're starting to stick our fingers in the gears. Thinking a little more about it, I think that maybe what we (ha) should do is use those IDs to differentiate different 'versions' of the object in the spreadsheet, but then know that when they get written to the examples, the better ID gets serialized. That is, this ID is just an extra value for the spreadsheets; it need not propagate?

That's not what Larry and I discussed, but it might be simpler...

larrybabb commented 6 years ago

I agree with @cbizon's characterization and suggestion above.

I meant to remove the "identifier" property from canonicalAllele. In essence, "identifier" equals "@id". Since we are using a JSON-LD message-style model here (not a persistent UML class diagram style), these are referenced differently (IMO).

As Chris mentions above, our ID (from the spreadsheets) represents a persistent-type ID in that it captures a specific instance of a record that represents the use of an @id (IRI). Since a single @id value can be used multiple times, we cannot use the @id as our ID; otherwise we could not differentiate between one record and another that uses the same IRI.
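
A minimal sketch of the collision described above, not the project's actual data or tooling. MCTID018 and SO:0001824 come from this thread; the second row (MCTID999) and both labels are hypothetical, made up only to show two records sharing one @id.

```python
# Two spreadsheet records that reuse the same @id (IRI) but carry
# different message-specific labels. MCTID999 and the labels are
# hypothetical; they are not taken from the real spreadsheets.
rows = [
    {"id": "MCTID018", "@id": "SO:0001824", "label": "label used in message A"},
    {"id": "MCTID999", "@id": "SO:0001824", "label": "label used in message B"},
]

# Keying by @id: the second record silently overwrites the first.
by_iri = {row["@id"]: row for row in rows}

# Keying by the spreadsheet ID: both records stay addressable.
by_internal_id = {row["id"]: row for row in rows}

print(len(by_iri))          # 1
print(len(by_internal_id))  # 2
```

Keyed by @id, only one of the two labels survives, which is why the spreadsheet ID has to stay distinct from the @id.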

bpow commented 6 years ago

So what should happen when creating the json for our examples? (asking for a friend...)

Should I use the @id if it is present, and leave it as blank or create a 'blank' identifier if absent? That would make our internal ids be actually internal. It looks like none of the other data point to the @id values, is that right?

larrybabb commented 6 years ago

I think we should use it if present. I also think we can have the spreadsheet control when to create a blank one and when to just do nothing. However, I will defer to your judgement on whether it is best to have the script control when and when not to include the "internal" ids.
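
The rule discussed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the script that generates the examples: the function names, the `make_blank` flag, the `_:` blank-node prefix, and the choice to emit the result under an `id` key are all hypothetical. CRID193 and SEPIO:0000154 come from the table earlier in the thread; CRID500 and the labels are made up.

```python
def pick_identifier(row, make_blank=True):
    """Choose the identifier to serialize for one spreadsheet row.

    Prefer the external @id when present. Otherwise either mint a blank-node
    style identifier from the internal id, or return None to emit nothing.
    """
    external = (row.get("@id") or "").strip()
    if external:
        return external
    if make_blank:
        return "_:" + row["id"]  # assumed blank-node convention
    return None


def to_example(row, make_blank=True):
    """Build the example JSON object; the internal id never leaks out as-is."""
    node = {
        key: value
        for key, value in row.items()
        if key not in ("id", "@id") and value not in (None, "")
    }
    identifier = pick_identifier(row, make_blank)
    if identifier is not None:
        node["id"] = identifier
    return node


with_iri = to_example({"id": "CRID193", "@id": "SEPIO:0000154", "label": "a label"})
blank = to_example({"id": "CRID500", "@id": "", "label": "another label"})
omitted = to_example({"id": "CRID500", "@id": "", "label": "another label"}, make_blank=False)
```

Here `with_iri` carries `SEPIO:0000154`, `blank` carries `_:CRID500`, and `omitted` has no identifier at all. Whether the spreadsheet or the script decides `make_blank` is exactly the open question in this comment.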

larrybabb commented 6 years ago

@bpow can you close this if it is done, otherwise restate where we stand on this?