MPEGGroup / FileFormatConformance

MPEG File Format Conformance Framework
https://mpeggroup.github.io/FileFormatConformance/
BSD 3-Clause Clear License

RegEx to validate 4CC too strict/possible characters missing #90

Closed: stschr closed this issue 10 months ago

stschr commented 10 months ago

Issue

Identified while reviewing the checks for https://github.com/MPEGGroup/FileFormatConformance/pull/83: the "test_validate_files" check fails because a 4CC contains a "-", which is not an allowed character per the RegEx ^file(?>\\.[0-9a-zA-Z]{4})*$ (see https://github.com/MPEGGroup/FileFormatConformance/actions/runs/6117640706/job/16604576692, "Validate Output", line 33).
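A minimal Python sketch reproducing the failure, assuming the pattern is applied to the whole path string and that the "\\." in the issue is an escaped "\." (e.g. from a JSON string). The path "file.moov.trak.ab-c" and its 4CC "ab-c" are hypothetical stand-ins; the issue does not name the offending 4CC. The atomic group "(?>...)" is swapped for a non-capturing group "(?:...)" because Python's re module only supports atomic groups from 3.11 on; since each repetition matches exactly five characters, this does not change which strings match.

```python
import re

# Pattern from the failing check, with the atomic group "(?>...)" rewritten as
# a non-capturing group "(?:...)" so it also runs on Python < 3.11.
# fullmatch() anchors both ends, like ^...$ in the original pattern.
OLD_PATH_PATTERN = re.compile(r"file(?:\.[0-9a-zA-Z]{4})*")


def is_valid_path_old(path: str) -> bool:
    """Return True only if every 4CC segment uses characters from [0-9a-zA-Z]."""
    return OLD_PATH_PATTERN.fullmatch(path) is not None


print(is_valid_path_old("file.moov.trak"))       # True
print(is_valid_path_old("file.moov.trak.ab-c"))  # False: "-" is not in [0-9a-zA-Z]
```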

Proposed solution

A quick scroll through mp4ra.org shows that the following characters are used in registered 4CCs; they should be added to the RegEx:

DenizUgur commented 10 months ago

@podborski, can you verify this as well?

podborski commented 10 months ago

From ISOBMFF:

To permit ease of identification, the 32-bit compact type can be expressed as four characters from the range 0020 to 007E, inclusive, of ISO/IEC 10646 (technically identical to the Unicode standard [28]) or ISO/IEC 8859-1 [34]. Each character is hence expressible in a single byte. The four individual byte values of the field are placed in order in the file. Other fields may also use this 32-bit representation, referred to as a ‘four-character code’ (4CC). The maintenance of four-character codes used in the format is defined in Annex D.
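To illustrate the quoted definition, a minimal Python sketch (the helper names are hypothetical, not part of the conformance framework) that converts between the 32-bit field and its four-character form, writing the four bytes in file order (big-endian), and checks that each character lies in 0x20 to 0x7E.

```python
import struct

FOURCC_MIN, FOURCC_MAX = 0x20, 0x7E  # inclusive range quoted from ISOBMFF


def fourcc_to_uint32(code: str) -> int:
    """Pack a four-character code into the 32-bit value stored in the file."""
    raw = code.encode("latin-1")  # one byte per character (ISO/IEC 8859-1)
    if len(raw) != 4:
        raise ValueError(f"4CC must be exactly 4 characters: {code!r}")
    # The four byte values are placed in order, i.e. big-endian.
    return struct.unpack(">I", raw)[0]


def uint32_to_fourcc(value: int) -> str:
    """Unpack a 32-bit type field back into its four characters."""
    return struct.pack(">I", value).decode("latin-1")


def is_printable_fourcc(code: str) -> bool:
    """True if code has exactly 4 characters, each in the range 0x20..0x7E."""
    return len(code) == 4 and all(FOURCC_MIN <= ord(c) <= FOURCC_MAX for c in code)


print(hex(fourcc_to_uint32("ftyp")))       # 0x66747970
print(repr(uint32_to_fourcc(0x75726C20)))  # 'url ' (trailing space is part of the 4CC)
print(is_printable_fourcc("ab-c"))         # True
print(is_printable_fourcc("ab\tc"))        # False: tab (0x09) is below 0x20
```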

The RegEx should check for exactly four characters per 4CC, each in the range [0x0020, 0x007E].
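A minimal Python sketch of a relaxed path pattern along these lines, accepting exactly four characters from 0x20 to 0x7E per segment. It assumes the same "file" prefix and "."-separated layout as the current pattern; the function name is hypothetical, and the non-capturing group "(?:...)" stands in for the original atomic group "(?>...)", which Python's re module only supports from 3.11 on. The path to the 'url ' box, whose 4CC ends in a space, serves as a real example; "ab-c" is a hypothetical 4CC.

```python
import re

# Each segment is "." followed by exactly four characters in 0x20..0x7E.
# fullmatch() anchors both ends, like ^...$ in the original pattern.
NEW_PATH_PATTERN = re.compile(r"file(?:\.[\x20-\x7E]{4})*")


def is_valid_path_new(path: str) -> bool:
    """Return True if every 4CC segment has 4 characters in 0x20..0x7E."""
    return NEW_PATH_PATTERN.fullmatch(path) is not None


print(is_valid_path_new("file.moov.trak.ab-c"))                      # True
print(is_valid_path_new("file.moov.trak.mdia.minf.dinf.dref.url "))  # True
print(is_valid_path_new("file.moov.tra"))                            # False: only 3 characters
print(is_valid_path_new("file.moov.tr\nk"))                          # False: 0x0A is out of range
```

Because each segment is exactly five characters long, a 4CC that itself contains "." or a space still splits unambiguously.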

stschr commented 10 months ago

Thank you, @DenizUgur and @podborski!