Closed nsheff closed 4 months ago
Here's some proposed text to add to the spec:
sorted_sequences attribute (RECOMMENDED)
The sorted_sequences attribute is a non-inherent attribute of a sequence collection, with a formal definition. We RECOMMEND all implementations provide this attribute.
When digested, this attribute provides a digest representing an order-invariant set of unnamed sequences.
It provides a way to compare two sequence collections to see whether their sequence content is identical and differs only in order.
Such a comparison can, of course, be made by the comparison function, so why do we recommend this attribute be included as well?
Simply that for some large-scale use cases, comparing the sequence content without considering order is something that needs to be done for
In these cases, using the comparison function could be computationally prohibitive. This digest allows the order-invariant representation to be pre-computed, so that collections can be compared by simple digest equality.
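As a rough illustration of the large-scale case, the sketch below assumes every collection already has a pre-computed sorted_sequences digest stored alongside it. The function name `group_by_sorted_sequences`, the collection identifiers, and the digest strings are all hypothetical placeholders. Grouping by digest takes a single pass, whereas running the comparison function on every pair grows quadratically with the number of collections.

```python
from collections import defaultdict


def group_by_sorted_sequences(digests):
    """Group collection identifiers by their pre-computed sorted_sequences digest.

    `digests` maps a collection identifier to its digest string.
    """
    groups = defaultdict(list)
    for collection_id, digest in digests.items():
        groups[digest].append(collection_id)
    # Keep only digests shared by two or more collections
    return {d: ids for d, ids in groups.items() if len(ids) > 1}


# Hypothetical identifiers and placeholder digest strings
precomputed = {
    "collection_a": "digest_1",
    "collection_b": "digest_2",
    "collection_c": "digest_1",
}
shared = group_by_sorted_sequences(precomputed)
```

Here `shared` holds the groups of collections whose sequence content is the same regardless of order.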
Algorithm:
Sort the sequences attribute and canonicalize the JSON (using RFC-8785).
Add the result as the sorted_sequences attribute, non-inherent and non-collated.

What was the decision on this? Add to the spec?
Our decision on this was to make this an OPTIONAL attribute and, for now, include it in the spec.
In the future, if the number of proposed ancillary attributes grows, it could move to a separate document together with other ideas for ancillary attributes.
ADR added, added to spec.
Some feedback from the PRC was that we could think about another RECOMMENDED non-inherent attribute to live alongside sorted_name_length_pairs that would be a digest for the sequences that does not respect order. So, something like: sorted_sequences.
This digest would allow you to easily assess order-invariant equivalence of sequences without having to use the comparison function, which would be useful for some use cases.
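A minimal sketch of how such a digest might be computed, loosely following the Algorithm in the proposed text. Several things here are assumptions rather than part of that text: the function names, the choice of the GA4GH sha512t24u digest (SHA-512 truncated to 24 bytes, base64url-encoded) as the digest function, and the use of compact `json.dumps` output as a stand-in for RFC-8785 canonicalization. The stand-in should agree with RFC-8785 for an array of plain ASCII digest strings, but it is not a general canonicalizer.

```python
import base64
import hashlib
import json


def sha512t24u(data: bytes) -> str:
    """GA4GH-style digest: SHA-512, truncated to 24 bytes, base64url-encoded."""
    truncated = hashlib.sha512(data).digest()[:24]
    return base64.urlsafe_b64encode(truncated).decode("ascii")


def sorted_sequences_digest(sequences):
    """Digest a list of sequence digests so that their order does not matter."""
    # Sort a copy so the caller's list keeps its original order
    ordered = sorted(sequences)
    # Assumption: compact JSON stands in for RFC-8785 canonical form (ASCII strings only)
    canonical = json.dumps(ordered, separators=(",", ":"), ensure_ascii=False)
    return sha512t24u(canonical.encode("utf-8"))


# Placeholder sequence digests, not real values
a = ["SQ.bbb", "SQ.aaa", "SQ.ccc"]
b = ["SQ.ccc", "SQ.bbb", "SQ.aaa"]
same_content = sorted_sequences_digest(a) == sorted_sequences_digest(b)
```

Two collections holding the same sequences in a different order yield the same value, so a plain string comparison can replace a call to the comparison function.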