ga4gh / refget

GA4GH Refget specifications docs
https://ga4gh.github.io/refget

Should we prefix the digests that we return from seqcol? #53

Open nsheff opened 1 year ago

nsheff commented 1 year ago

In issue #37 we raised the point of "to prefix or not to prefix", but there were really 2 issues being discussed there:

  1. Should seqcol prefix digests that become part of the seqcol representation that is digested?
  2. Should seqcol prefix the seqcol digests it generates? And should it require these for queries?

Issue #37 discusses the first issue, which we decided and posted an ADR for. The upshot is that we don't add anything ourselves, but if an external protocol like refget specifies that a given prefix is actually part of the identifier, then we simply take that at face value.

This issue is meant to track the 2nd point: Should seqcol prefix the seqcol digests it generates? And should it require these for queries?

What do we want to accept in the API: digests with or without prefixes? And what does the server serve in the output provided to the user? Do we have to say that these strings must be prefixed with something? When we return things, do we include these prefixes? Or do we make it user-controlled through query parameters or something similar?

My current thinking is that the answer should be No.

I think we should never add or expect prefixes. They are for entities that surround the spec, not for the spec itself.

waterflow80 commented 1 year ago

I was just wondering: are we talking about the SQ. prefix or about the ga4gh: prefix for the sequence digests (which I think was resolved here in PR #42)?

And I was also wondering whether the SQ. prefix is part of the ga4gh checksum algorithm, as mentioned in the refget spec, or whether it is a refget-required prefix for retrieving ga4gh-digested sequences?

nsheff commented 1 year ago

> are we talking about the SQ. prefix or about the ga4gh:

Both.

> And I was also wondering whether the SQ. prefix is part of the ga4gh checksum algorithm, as mentioned in the refget spec, or whether it is a refget-required prefix for retrieving ga4gh-digested sequences?

I believe it is 1) not part of the checksum algorithm, but 2) a refget-required prefix for retrieving ga4gh-digested sequences.
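
To make that distinction concrete, here is a minimal sketch, assuming the GA4GH sha512t24u digest (SHA-512, truncated to the first 24 bytes, then base64url-encoded). The function names are hypothetical and not taken from any refget library, and sequence normalization is omitted. The point is only that the checksum function knows nothing about SQ., and the prefix is added in a separate step.

```python
import base64
import hashlib


def sha512t24u(data: bytes) -> str:
    """Checksum step only: SHA-512, keep the first 24 bytes, base64url-encode."""
    digest = hashlib.sha512(data).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def refget_sequence_id(sequence: bytes) -> str:
    """Hypothetical helper: the SQ. prefix is applied after, not inside, the checksum."""
    return "SQ." + sha512t24u(sequence)


bare = sha512t24u(b"ACGT")              # 32-character digest, no prefix
prefixed = refget_sequence_id(b"ACGT")  # same digest with "SQ." in front
print(bare)
print(prefixed)
```

Stripping the first three characters of the prefixed form gives back exactly the bare digest, which is why the prefix can be treated as a retrieval convention rather than part of the hash.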

nsheff commented 5 days ago

A year later, here is where we've landed on this:

I believe we've come to agreement that we should neither prefix the responses we return nor expect prefixes on queries sent to the server. These kinds of prefixes are external to this spec, and should therefore be handled by the external service, environment, or context in which this spec is used (e.g., the greater ga4gh ecosystem as a whole).
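
As a rough illustration of that division of labor, the sketch below shows a layer outside the spec adding and removing a namespace, so that the seqcol server itself only ever sees bare digests. The helper names and the idea of a wrapping service are assumptions made for illustration. Only the ga4gh: namespace string comes from the discussion above, and the digest value is a placeholder rather than a real seqcol digest.

```python
def add_namespace(digest: str, namespace: str = "ga4gh") -> str:
    """Hypothetical outer-layer helper: decorate a bare digest for external use."""
    return f"{namespace}:{digest}"


def strip_namespace(identifier: str, namespace: str = "ga4gh") -> str:
    """Hypothetical outer-layer helper: remove the namespace before querying seqcol.

    base64url digests never contain ":", so splitting on the first ":" is safe.
    Identifiers without the expected namespace are returned unchanged.
    """
    head, sep, tail = identifier.partition(":")
    if sep and head == namespace:
        return tail
    return identifier


# Placeholder value (the sha512t24u of empty input), not a real seqcol digest.
bare_digest = "z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc"

external_form = add_namespace(bare_digest)     # what a surrounding service might show
query_value = strip_namespace(external_form)   # what the seqcol server would receive
print(external_form)
print(query_value)
```

Under this arrangement the spec stays prefix-free in both directions, and any prefix policy can change in the surrounding service without touching seqcol responses or queries.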