MediaArea / MediaConch_SourceCode

Media conformance checker
https://MediaArea.net/MediaConch
BSD 2-Clause "Simplified" License

"Input is not proper UTF-8" message #630

Open Lawrence58 opened 6 years ago

Lawrence58 commented 6 years ago

Hello, I analyzed a video file against the Memoriav Video Files Recommendations policy. According to MediaInfo, the file I tested is YUV, 10 bit, 4:2:2. As I understand it, this should have passed the "Format is uncompressed 4:2:2 10 bit (and)" sub-policy, but in the results I get a big red X with the following explanation.

Validation generated an internal error: Entity: line 121: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xA9 0x54 0x49 0x4D ame">FIEL</d></b><d o="7786" n="Data">(18 bytes)</d></b><b o="7804" n="Text" i="

What does this mean? Complete Media Info output follows.
Thank you.

Full Media Info report

Complete name : V:\Video\Pres Masters and Proxies\1271_PRM.mov
Format : MPEG-4
Format profile : QuickTime
Codec ID : qt 2005.03 (qt )
File size : 1.23 GiB
Duration : 46 s 880 ms
Overall bit rate : 225 Mb/s
Encoded date : UTC 2017-04-22 20:19:55
Tagged date : UTC 2017-04-22 20:19:57
Writing library : Apple QuickTime
©TIM : 00;00;00;00
©TSC : 30000
©TSZ : 1001

Video
ID : 1
Format : YUV
Codec ID : v210
Codec ID/Hint : AJA Video Systems Xena
Duration : 46 s 880 ms
Bit rate mode : Constant
Bit rate : 224 Mb/s
Width : 720 pixels
Height : 486 pixels
Display aspect ratio : 4:3
Frame rate mode : Constant
Frame rate : 29.970 (30000/1001) FPS
Standard : NTSC
Color space : YUV
Chroma subsampling : 4:2:2
Bit depth : 10 bits
Scan type : Interlaced
Scan type, store method : Interleaved fields
Scan order : Bottom Field First
Compression mode : Lossless
Bits/(Pixel*Frame) : 21.333
Stream size : 1.22 GiB (99%)
Language : English
Encoded date : UTC 2017-04-22 20:19:55
Tagged date : UTC 2017-04-22 20:19:55
Color primaries : BT.601 NTSC
Transfer characteristics : BT.709
Matrix coefficients : BT.601

Audio
ID : 2
Format : PCM
Format settings, Endianness : Little
Format settings, Sign : Signed
Codec ID : sowt
Duration : 46 s 880 ms
Bit rate mode : Constant
Bit rate : 1 536 kb/s
Channel(s) : 2 channels
Channel positions : Front: L R
Sampling rate : 48.0 kHz
Bit depth : 16 bits
Stream size : 8.58 MiB (1%)
Language : English
Encoded date : UTC 2017-04-22 20:19:55
Tagged date : UTC 2017-04-22 20:19:55

Other
ID : 3
Type : Time code
Format : QuickTime TC
Duration : 46 s 880 ms
Time code of first frame : 00:00:00;00
Time code, striped : Yes
Language : English
Encoded date : UTC 2017-04-22 20:19:55
Tagged date : UTC 2017-04-22 20:19:55

JeromeMartinez commented 6 years ago

The message shows that the error is not caused by your file but by a problem in MediaConch while parsing the file. We had a similar issue with some files, but in old versions of MediaConch. Which version of MediaConch do you use? (Menu "Help", then "About"; the current version is 17.11.) Once you have version 17.11, please also click "force analyze" in the "status" column to clear the cache, in case the cache contains an old parsing result.

Lawrence58 commented 6 years ago

Sorry. Yes. I am running 17.11 GUI (on a PC). Just installed this morning. The only Status column I see is in the Results but I do not see a force analyze option associated with it. I have tried closing and re-launching the application and reanalyzing. I got the same results.

Lawrence58 commented 6 years ago

Okay. I see the force analyze option. I used it and this time I get a failure X for both implementation (not valid) and policy. In case it matters, the file being analyzed is on a network share.

Lawrence58 commented 6 years ago

I uninstalled MediaConch using unist.exe, then reinstalled it. When I launched the application, the GUI had already loaded the file that I analyzed in the previous installation, with the same failed results. Where is the cache that made that possible?

JeromeMartinez commented 6 years ago

The cache is in C:\Users\%USERNAME%\AppData\Local\MediaConch. I'll try to reproduce the issue with files I have; otherwise I'll ask for your file.
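
For anyone who wants to locate that cache folder from a script, here is a minimal Python sketch. It is not part of MediaConch: the helper name `mediaconch_cache_dir` is hypothetical, and it assumes the default Windows profile layout, where `%LOCALAPPDATA%` points to the `AppData\Local` folder of the current user.

```python
import os
from pathlib import PureWindowsPath


def mediaconch_cache_dir(env=os.environ):
    """Return the cache folder named in this thread (hypothetical helper)."""
    # Assumption: LOCALAPPDATA is C:\Users\<name>\AppData\Local (Windows default).
    local = env.get("LOCALAPPDATA")
    if local is None:
        # Fall back to the literal layout quoted in the comment above.
        local = PureWindowsPath("C:/Users") / env.get("USERNAME", "") / "AppData" / "Local"
    return PureWindowsPath(local) / "MediaConch"


if __name__ == "__main__":
    print(mediaconch_cache_dir())
```

The function only builds the path; it does not check that the folder exists, and it makes no assumption about the files stored inside it.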

Lawrence58 commented 6 years ago

The cache isn't the issue nor do I think the file I am testing is. I have done the following 5 times with the same results, so either it is the application, the implementation, or me.
MediaConch v17.11
Windows 7 Pro v6.1

JeromeMartinez commented 6 years ago

> The cache isn't the issue nor do I think the file I am testing is.

This is what I said yesterday. Please hold on; as I said, I'll try to reproduce the issue with files I have, otherwise I'll ask for your file.

Lawrence58 commented 6 years ago

Thank you, Jerome. I'm just trying to provide you with as much info as possible.

JeromeMartinez commented 6 years ago

It would actually be faster for me (no need to find other files) if you could provide the file. Is that possible? (Privately, if the file cannot be publicly shared.)

Lawrence58 commented 6 years ago

I can share the file privately. Do you have a way I can send it to you? It is 1.2 GB.

JeromeMartinez commented 6 years ago

Please drop an email to info@mediaarea.net.

dericed commented 6 years ago

MediaInfo is placing a copyright symbol directly into an attribute as a raw byte, which seems to break parsing in some (maybe not all) environments. The copyright symbol could be escaped, for example as info1="&#xA9;TIM".
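
To illustrate the point, here is a minimal Python sketch using only the standard library; it is not MediaConch code. The element name `d` is hypothetical and the attribute name `info1` is taken from the example above. It shows a bare 0xA9 byte being rejected by an XML parser that expects UTF-8, while the escaped form and the properly UTF-8 encoded form both parse.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element documents; only the attribute value differs.
raw_byte = b'<d info1="\xa9TIM"/>'                    # bare 0xA9, as in the error message
escaped = b'<d info1="&#xA9;TIM"/>'                   # numeric character reference
utf8 = '<d info1="\u00a9TIM"/>'.encode("utf-8")       # encoded as 0xC2 0xA9


def try_parse(doc: bytes):
    """Return the info1 attribute value, or None if the document is not well-formed."""
    try:
        return ET.fromstring(doc).get("info1")
    except ET.ParseError:
        return None


if __name__ == "__main__":
    for label, doc in [("raw", raw_byte), ("escaped", escaped), ("utf-8", utf8)]:
        print(label, try_parse(doc))
```

Without an encoding declaration the parser assumes UTF-8, so the raw variant fails in the same way as the report above, while the other two yield the same attribute value.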

Lawrence58 commented 6 years ago

Jerome - This is to confirm that I sent an email to info@mediaarea.net.

JeromeMartinez commented 6 years ago

@Lawrence58 I replied with credentials when I got the email; did you receive my answer? Sorry for the delay: some personal issues have prevented me from checking the issue, but it is still on my priority list.

JeromeMartinez commented 6 years ago

I apologize for the delay. I confirm I can reproduce the issue and that it is a MediaConch bug. Working on it.

JeromeMartinez commented 6 years ago

This was not seen before because we map most such metadata to MediaInfo fields. That is not the case for some of them (the time code ones in metadata; supporting them is on my to-do list), and we were storing those with the wrong character encoding. Fixed: Windows Snapshot, Mac Snapshot.

You need to reprocess files ("force analyze" button).
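
As a rough illustration of what "wrong character encoding" means here, the following Python sketch re-encodes a metadata name as UTF-8 before it would be written to XML. It is not the actual MediaConch fix: the helper name `to_utf8` is hypothetical, and treating the source bytes as Latin-1 is an assumption made only for this example.

```python
def to_utf8(raw: bytes, assumed_encoding: str = "latin-1") -> bytes:
    """Return raw unchanged if it is valid UTF-8, else transcode it to UTF-8."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: the bytes are single-byte Latin-1, where 0xA9 is the copyright sign.
        return raw.decode(assumed_encoding).encode("utf-8")


if __name__ == "__main__":
    name = bytes([0xA9, 0x54, 0x49, 0x4D])  # bytes quoted in the first error message
    fixed = to_utf8(name)
    print(fixed.hex(" "), fixed.decode("utf-8"))
```

Once transcoded, the copyright sign is stored as the two bytes 0xC2 0xA9, which a UTF-8 XML parser accepts.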

bturkus commented 6 years ago

Thanks Jerome! You're the best!

dericed commented 5 years ago

I suggest reopening this issue, as the error persists:

ffmpeg -f lavfi -i testsrc -vframes 1 -y test.mov
mediaconch -mt --Force test.mov | xml fo

provides

-:414.45: Input is not proper UTF-8, indicate encoding !
Bytes: 0xA9 0x73 0x77 0x72: Bytes: 0xA9 0x73 0x77 0x72

            <data offset="3492" name="Name">?swr</data>
                                            ^

JeromeMartinez commented 5 years ago

@g-maxime I tested the GUI and CLI, GUI looks fine but CLI looks to have an issue.