ga4gh / vrs

Extensible specification for representing and uniquely identifying biological sequence variation
https://vrs.ga4gh.org
Apache License 2.0
80 stars 34 forks source link

Should SequenceReference be a Ga4ghIdentifiableObject? #479

Closed Mrinal-Thomas-Epic closed 3 months ago

Mrinal-Thomas-Epic commented 7 months ago

I saw that SequenceReference has an alternate digest convention, but I think it doesn't actually define a digest property or inherit from a type that defines one.

ahwagner commented 7 months ago

It should have a keys parameter for use with Ga4ghIdentifiableObject, but it does not create its own computed identifier. Convention should be updated to reflect #465.

larrybabb commented 4 months ago

@ahwagner I need some clarification on this. The keys parameter is used to define the attributes of a value object that are then ingested into the algorithm to generate a new computed key (with json structure intact of course). So I don't think we want SequenceReference to have a new computed digest, but instead have some specialized logic that passes through the digest of the Sequence. If we are saying that we DO want SequenceReference to have a different digest then I would definitely have the type attribute be included so it works like all other ValueObjects. If not, and we don't want to have id collision by using the same digest for both Sequence and SequenceReference (maybe better named SequencyProxy?) then we should maybe just re-prefix it so that we have ga4gh:SQ.<seq digest> for Sequence and ga4gh:SR.<seq digest> for SequenceReference.

larrybabb commented 4 months ago

@ahwagner ... @korikuzma and I decided to revert the April commit where you removed the assigned value and replaced it with keys. This change would cause SequenceReference to get a new computed id and that would propogate up to SequenceLocation and above. We do NOT want any changes to identifiers as a result of whatever option we land on here. So for now, we are reverting until we can sort this out.

korikuzma commented 4 months ago

@larrybabb See #514 . I think this helped resolve my confusion.

korikuzma commented 4 months ago

Okay, I think we're good but it might be nice to meet about this once @ahwagner gets back to make sure we're all on the same page.

ahwagner commented 4 months ago

This was discussed at length in #465 and there was clear consensus we did not want special cases. Reverting that work goes against the consensus decision.

I am on mobile so might be misunderstanding. Please read through above discussion and confirm that you still want to revert to prior model; if so I will revisit when I return.

korikuzma commented 4 months ago

@ahwagner I reverted the initial revert https://github.com/ga4gh/vrs/compare/2018884...d23c4af

From your work in #485 , I added type to ga4ghDigest.keys since in #465 you stated :

the digest keys for SequenceReference are limited to type and refgetAccession

korikuzma commented 3 months ago

Can we close this?

Mrinal-Thomas-Epic commented 3 months ago

I think so. To confirm my understanding, SequenceReference should not have a digest property, just a refGetAccession. During SequenceLocation ID calculation, all properties except for those in keys will be removed from the SequenceReference (but it will be kept as a whole JSON object). Is that correct?

korikuzma commented 3 months ago

@Mrinal-Thomas-Epic Correct, SequenceReference will not have a digest property since it is a ValueObject and not a Ga4ghIdentifiableObject.

Yes! I like looking at this example from the validation models.

Mrinal-Thomas-Epic commented 3 months ago

Makes sense to me!