Closed garethsb closed 1 year ago
FWIW, Kubernetes annotations have a limit of 63 characters for the name (seems broadly equivalent to the part after the urn:x-nmos:tag:user: prefix for our user tag names). According to this post there isn't a per-value limit, only a limit per resource of 256 kB. The latter certainly isn't appropriate for resource-constrained devices!
An absolute minimum requirement would be for a Controller or Tool to be able to implement its own annotation scheme; i.e. set the label of a resource to some unique value, imposing a minimum number of characters like 64 for the label, and have a flag indicating whether a resource has been annotated. From there a Controller / Tool can have its own database for annotations based on the unique label and flag.
I'm not a fan of putting too many requirements on the device. It seems that allowing the Controller to do the annotation job itself would be a better approach. Those Tools and Controllers could even persist the annotations using some global system registry to get cross-Tool/Controller interop.
Um, that sounds like a completely different approach!
Obviously, resources already have unique identifiers that Controllers can associate with whatever external "annotations" they like.
The group previously discussed and rejected implementing an open API for this on the Registry to PATCH "overlays" into what's returned from the Query API. It has benefits - primarily, no waiting for support from device vendors - and some challenges to be worked out - e.g. version and lifecycle/persistence management.
That discussion may be worth reopening, but what's wanted here are minimum specs for devices if we go ahead with the proposed Annotation API spec.
Sorry to be off-topic, I was not aware that this was discussed.
Here is my take on the bullet point list from the original post:
Thanks, Tim.
- a minimum number of values per user tag that MUST be supported 10
I'm intrigued by this one, what is the use case for so many values for each tag?
Sorry to be off-topic, I was not aware that this was discussed.
@alabou, personally I'm still interested in exploring this approach and seeing whether there are simple but effective lifecycle and update semantics.
Have we defined reset behaviour?
From Behaviour - Resetting Values:
- For labels and descriptions, the implementation MUST either restore an initial or configured default value, or set the value to the empty string.
- For a named tag, the implementation MUST either restore an initial or configured default array of values, or remove the named tag from the resource.
Thanks @garethsb, I was sure we had discussed resetting, but it was late last night. I did mull over these minimum values. I felt that 24 chars for values might be too restrictive in some circumstances, hence the 32 everywhere for consistency. For tag values, I started with a quantity of 3, but I thought such a low number would come back and bite us. Happy to discuss further. Am I aiming too high, given the restricted resources of much of the equipment involved?
32 characters for description, label, and 1+10 (name+values) x 4 tags = 32 x 46 characters per annotatable resource, minimum. If characters are stored in UTF-8, they take between 1 and 4 bytes. That makes the worst case minimum we'd be asking for 32 x 46 x 4 bytes per annotatable resource or approx 5.9kB. (I don't really want to put limits on supported codepoints, even though if restricted to first 128 codepoints (basically ASCII) which can be UTF-8 encoded in one byte, this would mean 1.5kB... I guess we could at least point that out to clients though?)
Restricting to 3 values per tag, takes the calculation from 5.9kB to 32 x 18 x 4B = 2.3kB. Just playing with numbers, if we went for minimum of 10 tags with minimum of 1 value each, that's 32 x 22 x 4B or 2.8kB. 3 tags with 1 value, that's 32 x 8 x 4B = 1kB.
Of course, these figures are all per resource. We haven't discussed a minimum limit on total annotations per device.
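The per-resource figures above can be reproduced with a short sketch. The function name and parameters are made up for illustration; it assumes one label, one description, and for each tag one name plus its values, all with the same character limit and a worst case of 4 bytes per character in UTF-8.

```python
def min_storage_bytes(tags, values_per_tag, chars=32, bytes_per_char=4):
    """Worst-case bytes per annotatable resource (hypothetical helper).

    Fields counted: label + description + (name + values) for each tag.
    """
    fields = 2 + tags * (1 + values_per_tag)
    return chars * fields * bytes_per_char


# (tags, values per tag) combinations discussed in the thread
for tags, values in [(4, 10), (4, 3), (10, 1), (3, 1)]:
    print(tags, values, min_storage_bytes(tags, values))

# ASCII-only case for 4 tags x 10 values: one byte per character
print(min_storage_bytes(4, 10, bytes_per_char=1))
```

The results are 5888, 2304, 2816 and 1024 bytes for the four combinations, and 1472 bytes for the ASCII-only case, matching the approximate kB figures quoted.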
Need some low-memory device manufacturers to weigh in...
08/06/2023 The general conclusion from the call was:
Suggestion for the limits as follows (not sure that we were in total agreement on the SHOULD quantities of tags)
Nodes and Devices MUST support Labels of up to 32 characters
Nodes and Devices SHOULD support Description of up to 32 characters
Nodes and Devices MUST support 1 user tag, each with up to 32 chars for key and 32 chars for value
Nodes and Devices SHOULD support 5 user tags, each with up to 32 chars for key and 32 chars for value
Senders and Receivers MUST support Labels of up to 32 characters
Senders and Receivers SHOULD support Description of up to 32 characters
Senders and Receivers MUST support 1 user tag, with up to 32 chars for key and 32 chars for value
Senders and Receivers SHOULD support 5 user tags, each with up to 32 chars for key and 32 chars for value
We're going to have to be clear how a definition of string length in bytes is applied.
E.g.
The JSON strings "a\\b" and "a\u005Cb" are both three characters long according to the ABNF for JSON and three bytes long in UTF-8 (a\b).
The JSON "😃" (U+1F603) is 1 character long and 4 bytes in UTF-8 (0xF0 0x9F 0x98 0x83). The JSON "\uD83D\uDE03" is two characters according to the ABNF but they are a surrogate pair that encode 😃 and surrogate pairs are illegal in UTF-8, the correct encoding is the same 4 bytes in UTF-8 (0xF0 0x9F 0x98 0x83)...
(The two JSON strings "😃" and "\uD83D\uDE03" are equal.)
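A quick way to check these cases is to parse each JSON text and measure the decoded string. This is only a sketch using Python's standard json module; the variable names are illustrative, and Python's len() counts code points, which is one possible definition of "character".

```python
import json

# Two JSON spellings of the same three-character string: a, backslash, b
escaped_backslash = json.loads(r'"a\\b"')
unicode_escape = json.loads(r'"a\u005Cb"')

# U+1F603 written literally, and as an escaped surrogate pair
literal_emoji = json.loads('"\U0001F603"')
surrogate_pair = json.loads(r'"\uD83D\uDE03"')

for s in (escaped_backslash, unicode_escape, literal_emoji, surrogate_pair):
    utf8 = s.encode("utf-8")
    print(len(s), len(utf8), utf8.hex(" "))
```

Both spellings in each pair decode to equal strings, so a limit defined on the decoded UTF-8 byte length does not depend on how the client escaped the JSON.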
We could say in the spec that anything that is not in Latin-1 Supplement block (U+0000 to U+00FF) is assumed to be 4 bytes wide ... The limit of 32 generic chars should be expressed in bytes as 32*4 bytes or 32*4 Latin-1 Supplement characters or 32 non Latin-1 Supplement characters.
Update 2023-06-15: Group ok with the minimum to be specified as 64 bytes so up to 64 Latin-1 Supplement block characters (one byte), up to 32 Basic Multilingual Plane characters (two bytes), up to 21 Supplementary Plane characters (three bytes), up to 16 four-byte characters.
TODO: check the correct terminology for four-byte characters such as emoji and ancient Egyptian hieroglyphs.
Why the increase to 64 Bytes? To get 21 CJK code points?
The later code points in the Basic Multilingual Plane need up to 3 Bytes per code point. The rest of the planes (U+10000 to U+10FFFF) need the fourth Byte.
See https://en.m.wikipedia.org/wiki/UTF-8#Encoding
The first 128 code points (ASCII) need one byte. The next 1,920 code points need two bytes to encode, which covers the remainder of almost all Latin-script alphabets, and also IPA extensions, Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac, Thaana and N'Ko alphabets, as well as Combining Diacritical Marks. Three bytes are needed for the remaining 61,440 code points of the Basic Multilingual Plane (BMP), including most Chinese, Japanese and Korean characters. Four bytes are needed for the 1,048,576 code points in the other planes of Unicode, which include emoji (pictographic symbols), less common CJK characters, various historic scripts, and mathematical symbols.
Why the increase to 64 Bytes?
There was no increase to 64 bytes but a decrease to 64 bytes ...
The limit was 32 characters ... which implied 128 bytes ... With UTF-8 ASCII (1 byte) 128 characters seems too much while 64 seems better than 32 ... So using a max of 64 bytes allows 64 ASCII characters and a minimum of 16 complex characters.
This seems to be a good compromise ... getting more ASCII characters, not wasting memory and have a reasonable footprint.
We get
U+0000-U+007F => 1 byte => 64 characters ASCII
U+0080-U+07FF => 2 bytes => 32 characters Complete Latin and other
U+0800-U+FFFF => 3 bytes => 21 characters Japanese, Chinese, Korean
U+10000-U+10FFFF => 4 bytes => 16 characters Egyptian Hieroglyphs
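As a rough illustration of how a 64-byte minimum translates into character counts, and how a device might test a value against it, here is a sketch. The names LIMIT_BYTES, max_chars and fits are hypothetical and not taken from the spec; the check assumes the limit applies to the UTF-8 encoding of the decoded string.

```python
LIMIT_BYTES = 64  # assumed minimum supported size, per the discussion


def max_chars(bytes_per_char, limit=LIMIT_BYTES):
    """Whole characters of a given UTF-8 width that fit in the limit."""
    return limit // bytes_per_char


def fits(value, limit=LIMIT_BYTES):
    """True if the UTF-8 encoding of value is within the byte limit."""
    return len(value.encode("utf-8")) <= limit


for width in (1, 2, 3, 4):
    print(width, max_chars(width))

print(fits("a" * 64), fits("a" * 65))            # 1-byte characters
print(fits("\u4e2d" * 21), fits("\u4e2d" * 22))  # 3-byte CJK character
```

Note that 21 three-byte characters use only 63 of the 64 bytes, so mixed strings need the byte-length check rather than a per-character count.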
OK, I see. My recollection of the previous call was 32 ASCII chars and thus fewer of the 'bigger' chars.
If min of 64B x 3 storage per annotatable resource is OK for constrained devices, great. I'll update #26 accordingly and we can merge.
Do we need 64B for the user tag name (after the "urn:x-nmos:tag:user:" namespace prefix)? The simplicity of the same limit is nice but will it get used? Although the JSON Schema type of the tag name is just string, and is not required by IS-04 to be a URN, in this case we are discussing a URN so the character set defined for Namespace Specific Strings by RFC 8141 applies and that's an ASCII subset.
My understanding is that the limit of 64B corresponds to the complete tag name, not only what comes after "urn:x-nmos:tag:user:" so for ASCII there remain 44 characters after "urn:x-nmos:tag:user:".
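The 44-character figure follows directly from the prefix length, assuming the 64-byte limit covers the complete tag name and the name is ASCII. A minimal check (variable names are illustrative):

```python
PREFIX = "urn:x-nmos:tag:user:"
LIMIT_BYTES = 64  # assumed limit on the complete tag name

prefix_bytes = len(PREFIX.encode("ascii"))
remaining = LIMIT_BYTES - prefix_bytes
print(prefix_bytes, remaining)
```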
https://github.com/AMWA-TV/is-13/blob/374c934932232e6cdf96a7dae7cb3558b45beaca/docs/Behaviour.md#additional-limitations identifies possible ways an implementation might be limited. Also discussed in https://github.com/AMWA-TV/is-13/issues/23#issuecomment-1572589318.
In Slack, @rbgodwin-nt gave a strawman for some minimum limitations:
@alabou responded:
Can we reach consensus on appropriate minimum limitations, e.g.
urn:x-nmos:tag:user:
namespace) tags per resource (and limits on the names of those tags) that MUST be supported