When validating char encoding and namespace URI in DCTimedTextDescriptor, Photon should ignore the '0' terminator

Netflix / photon

Photon is a Java implementation of the Interoperable Master Format (IMF) standard. IMF is a SMPTE standard whose core constraints are defined in the specification st2067-2:2013.

Apache License 2.0


When validating char encoding and namespace URI in DCTimedTextDescriptor, Photon should ignore the '0' terminator #196

Closed: go4shoe closed this issue 7 years ago

go4shoe commented 7 years ago

When UCSEncoding in DCTimedTextDescriptor is set to 'UTF-8\0', Photon rejects the subtitle file because it expects 'UTF-8' (without a terminator). The same is true for the Namespace URI.

Although MXF doesn't require a terminator, a zero terminator (and everything after it) should be ignored.

So 'UTF-8\0' should be treated equal to 'UTF-8'.

SMPTE ST 377-1:2011 (page 20): "The number of bytes allocated to this string is given by the KLV encoding. There is no requirement to terminate each string with a zero or other special value. However, if the length of the String information is less than the space allocated, the string shall be terminated with a zero value."
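As a minimal sketch of the writing rule quoted above, the Java below stores a value in a fixed-size String field. It assumes MXF String values are stored as UTF-16 big-endian (two bytes per character) and pads the unused space with zeros; the spec only requires the zero terminator, so padding the rest with zeros is one choice among several. `MxfStringWriter` and `encodeIntoAllocatedSpace` are hypothetical names, not Photon APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Sketch only (hypothetical class, not part of Photon): writes a string into a
 * fixed-size MXF String field, assuming UTF-16BE encoding.
 */
public final class MxfStringWriter {

    /** Encodes value into exactly allocatedBytes bytes, zero-terminating if shorter. */
    public static byte[] encodeIntoAllocatedSpace(String value, int allocatedBytes) {
        if (allocatedBytes % 2 != 0) {
            throw new IllegalArgumentException("allocated space must hold whole UTF-16 code units");
        }
        byte[] encoded = value.getBytes(StandardCharsets.UTF_16BE);
        if (encoded.length > allocatedBytes) {
            throw new IllegalArgumentException("value does not fit in the allocated space");
        }
        // Arrays.copyOf pads with 0x00 bytes, so when the string is shorter than the
        // allocated space, the next two bytes form the required 0x0000 terminator
        // (and the remaining reserved space is zero-filled as well).
        return Arrays.copyOf(encoded, allocatedBytes);
    }

    public static void main(String[] args) {
        byte[] exact = encodeIntoAllocatedSpace("UTF-8", 10);    // fits exactly: no terminator
        byte[] reserved = encodeIntoAllocatedSpace("UTF-8", 14); // terminator plus padding
        System.out.println(exact.length + " " + reserved.length); // prints "10 14"
    }
}
```

With 'UTF-8' (5 code units, 10 bytes), a 10-byte field needs no terminator, while a 14-byte field gets 0x0000 at zero-based bytes 10 and 11.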

svenkatrav commented 7 years ago

I agree that 'UTF-8' need not be null-terminated, assuming the KLV length is 10. But in this case it looks like the length is set to 14, and Photon reads the entire string as 'UTF-8\0', which is not a valid value for UCSEncoding. Also, the following line clearly indicates that null termination is required if the KLV length is greater than the length of the string: "However, if the length of the String information is less than the space allocated, the string shall be terminated with a zero value"

palemieux commented 7 years ago

Is the following sequence of bytes written in the UCSEncoding field?

005500540046002D0038005C0030
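To see what that byte sequence holds, it can be decoded as UTF-16BE (an assumption about the MXF String encoding that the two-bytes-per-character layout is consistent with). A short sketch, with a hypothetical class name: the 14 bytes decode to 7 characters, and the last two are a backslash (0x005C) and the digit zero (0x0030), not a NUL (0x0000).

```java
import java.nio.charset.StandardCharsets;

/** Sketch (hypothetical class): decodes the quoted hex dump as UTF-16BE. */
public final class HexDumpDecoder {

    /** Converts a string of hex digit pairs into bytes. */
    public static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = hexToBytes("005500540046002D0038005C0030");
        String decoded = new String(bytes, StandardCharsets.UTF_16BE);
        // 14 bytes -> 7 UTF-16 code units: U, T, F, -, 8, backslash, digit zero
        System.out.println(bytes.length + " bytes: " + decoded); // prints "14 bytes: UTF-8\0"
        // The last code unit is the character '0' (0x0030), not a NUL terminator
        System.out.println(decoded.charAt(6) == '0'); // prints "true"
    }
}
```

If those are the bytes in the file, the field contains a literal backslash followed by '0', which a reader that stops at a 0x0000 terminator would still keep.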

go4shoe commented 7 years ago

svenkatrav,

the way I read SMPTE ST 377-1:2011 (page 20),

'UTF-8', 'UTF-8\0' and 'UTF-8\0_more_space_can_be_reserved_here'

should all be equal strings.

SMPTE ST 377-1:2011 (page 20) made this provision to allow reserving space. This covers cases such as: I don't know the encoding yet, so I reserve space equal to the maximum string length of all possible candidates. Then I write my samples, and afterwards fill in the metadata, overwriting the reserved space and zero-terminating if required. I have changed our code to remove the terminator, but I am still convinced that a string comparison with a persisted MXF string should stop either at the length given by the KLV or at the '0' terminator, whichever comes first. It doesn't make any sense to include the '0' terminator, or any potentially uninitialized reserved space, in the string comparison.
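The comparison rule proposed above can be sketched in Java as a decoder that stops at the KLV length or at the first 0x0000 code unit, whichever comes first. This assumes UTF-16BE String values; `MxfStringReader` and `decode` are hypothetical names, and this is not Photon's actual validation code. Note that a literal backslash followed by the digit 0 (0x005C 0x0030), as in the byte sequence quoted in this thread, is not a terminator and is kept.

```java
import java.nio.charset.StandardCharsets;

/**
 * Sketch (hypothetical class, not Photon code): reads an MXF String value,
 * assumed UTF-16BE, ignoring a 0x0000 terminator and anything after it.
 */
public final class MxfStringReader {

    /** Decodes up to klvLength bytes, stopping early at the first 0x0000 code unit. */
    public static String decode(byte[] value, int klvLength) {
        // Only consider whole UTF-16 code units within the KLV length.
        int limit = Math.min(klvLength, value.length) & ~1;
        int end = 0;
        while (end < limit) {
            if (value[end] == 0 && value[end + 1] == 0) {
                break; // zero terminator: ignore it and any reserved space after it
            }
            end += 2;
        }
        return new String(value, 0, end, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] plain = "UTF-8".getBytes(StandardCharsets.UTF_16BE);
        byte[] padded = new byte[14];
        System.arraycopy(plain, 0, padded, 0, plain.length); // remaining bytes stay 0x00
        System.out.println(decode(plain, plain.length).equals("UTF-8"));   // prints "true"
        System.out.println(decode(padded, padded.length).equals("UTF-8")); // prints "true"
    }
}
```

Under this rule, 'UTF-8' stored in exactly 10 bytes and 'UTF-8' followed by 0x0000 and reserved space both compare equal to 'UTF-8'.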

svenkatrav commented 7 years ago

(Just repeating Pierre's question here.) Is the following sequence of bytes written in the UCSEncoding field?

005500540046002D0038005C0030