BAM-PFA / pymm

A set of scripts for audiovisual digital preservation tasks
BSD 2-Clause "Simplified" License

mediainfo pbcore2 output on dpx folder is wonky #20

Closed mcampos-quinn closed 6 years ago

mcampos-quinn commented 6 years ago

So when you run mediainfo --Output=PBCore2 /path/2/dpx/dir you get a neat output that is basically a summary of what's in the folder.

BUT you don't get valid XML back. You get two half-valid XML fragments. The first is general summary stuff and the second is more detailed information taken from a (randomly selected?) file within the directory. This means that my parsing of the PBCore XML output from mediainfo only recognizes the first half.
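A minimal sketch of how a script could notice this kind of output before trusting it, assuming the problem shows up as more than one root element in a single response. The sample strings are hand-written stand-ins, not real mediainfo output, and `parse_or_none` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def parse_or_none(xml_text):
    """Return the root element if xml_text is a single well-formed
    XML document, otherwise None."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for, e.g., a second root element after the first one.
        return None


# Hand-written stand-ins (assumptions), not actual mediainfo output.
good = "<doc><item>summary</item></doc>"
two_docs = "<doc><item>summary</item></doc><doc><item>detail</item></doc>"

print(parse_or_none(good) is not None)
print(parse_or_none(two_docs) is None)
```

A strict parser rejects the whole response rather than silently keeping the first half, which at least makes the failure visible.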

Since the stuff we care about is mostly in the second bit (details like pixel dimensions and some other stuff), I made a mod to look for the first file in the directory and use that as the basis to represent the entire sequence folder.
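Roughly, the "first file" workaround could look like the sketch below. This is not the code from the actual commit; `first_dpx_file`, `pbcore_command` and `run_pbcore` are hypothetical names, and "first" is assumed to mean first by filename. The explicit `sorted()` matters because `os.listdir` returns entries in arbitrary order.

```python
import os
import subprocess


def first_dpx_file(folder):
    """Return the path of the first .dpx file in folder, sorted by name.

    Hidden files (names starting with a dot) are skipped.
    """
    names = sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith(".dpx") and not name.startswith(".")
    )
    if not names:
        raise FileNotFoundError("no .dpx files in %s" % folder)
    return os.path.join(folder, names[0])


def pbcore_command(path):
    """Build the mediainfo call quoted in this issue for a single path."""
    return ["mediainfo", "--Output=PBCore2", path]


def run_pbcore(folder):
    """Run mediainfo on the first frame only (requires mediainfo on PATH)."""
    cmd = pbcore_command(first_dpx_file(folder))
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

The trade-off is that one frame stands in for the whole sequence, so anything that only exists at the folder level (like start and end frame numbers) is lost.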

I want to revisit this later and look more closely at what this means.

mcampos-quinn commented 6 years ago

EDIT:

In the middle of writing the issue (on my Mac) based on the problem discovered on the server (Ubuntu) I looked at other files I had made on this computer and realized that they were fine (which is what I had thought since I did my initial tests here).

So this looks like a problem with mediainfo v. 18.05, which we have on our server, versus 18.03, which I have on my Mac. The output is different, so maybe there's some missing documentation or a bug that was introduced in the newer version. Will follow up there.

(see a066336)

mcampos-quinn commented 6 years ago

Running the same on another Mac with 18.05 and correct output makes me suspect my Ubuntu build/apt repository/something else.

Filed a bug report with MediaArea last week anyway.

kieranjol commented 6 years ago

PBCore2 output is indeed different between 18.03 and 18.05, to the point where it breaks some of our scripts. I parse the PBCore XML and generate an importable CSV from that, but I should have done this with the -f --Language=raw --Output=OLDXML output, as it's a bit more stable. I haven't been using it with DPX, but there could have been some unexpected changes after the update.
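For context, a PBCore-to-CSV step of this kind might look like the sketch below. It is not the actual script mentioned above. The element names come from the PBCore 2 schema, the choice of columns is an assumption, and the sample XML is hand-written, so it says nothing about what a given mediainfo version really emits.

```python
import csv
import io
import xml.etree.ElementTree as ET

PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"
NS = {"pb": PBCORE_NS}
# Illustrative column choice (assumption), one CSV row per essence track.
FIELDS = ["essenceTrackType", "essenceTrackEncoding", "essenceTrackFrameSize"]


def essence_tracks_to_csv(xml_text):
    """Flatten each instantiationEssenceTrack into one CSV row."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for track in root.iter("{%s}instantiationEssenceTrack" % PBCORE_NS):
        # A missing element becomes an empty cell instead of an error.
        writer.writerow({
            field: track.findtext("pb:" + field, default="", namespaces=NS)
            for field in FIELDS
        })
    return out.getvalue()


# Hand-written sample (assumption), not real mediainfo output.
SAMPLE = """<pbcoreInstantiationDocument xmlns="http://www.pbcore.org/PBCore/PBCoreNamespace.html">
  <instantiationEssenceTrack>
    <essenceTrackType>Video</essenceTrackType>
    <essenceTrackEncoding>DPX</essenceTrackEncoding>
    <essenceTrackFrameSize>2048x1556</essenceTrackFrameSize>
  </instantiationEssenceTrack>
</pbcoreInstantiationDocument>"""

print(essence_tracks_to_csv(SAMPLE))
```

A script like this depends on which tag each value lands in, which is why a change in tag placement between versions can break it.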

mcampos-quinn commented 6 years ago

:/ I am taking the instantiation/essence track chunk out of the mediainfo output and adding each instantiation to an XML record for the work (or the manifestation in FRBR/FIAF talk) as a whole. So for the most part, changes (like changing which tag a bit of information is contained in) are not a huge deal, but for whatever reason on Ubuntu the XML output is not actually valid XML, which breaks my script. Or actually it just leaves out everything past the invalid elements. I found a workaround but I feel less secure with it.
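As a rough illustration of that step (not the pymm code itself), the sketch below copies the children of a standalone instantiation document into a new pbcoreInstantiation inside a work-level pbcoreDescriptionDocument. `add_instantiation` is a hypothetical name and the sample XML is hand-written.

```python
import copy
import xml.etree.ElementTree as ET

PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"


def qname(tag):
    """Return tag qualified with the PBCore namespace."""
    return "{%s}%s" % (PBCORE_NS, tag)


def add_instantiation(work_root, instantiation_xml):
    """Append one pbcoreInstantiation to work_root, filled with copies of
    the children of the given pbcoreInstantiationDocument."""
    inst_doc = ET.fromstring(instantiation_xml)  # raises ParseError if invalid
    instantiation = ET.SubElement(work_root, qname("pbcoreInstantiation"))
    for child in inst_doc:
        instantiation.append(copy.deepcopy(child))
    return instantiation


# Hand-written sample (assumption), not real mediainfo output.
SAMPLE_INSTANTIATION = """<pbcoreInstantiationDocument xmlns="http://www.pbcore.org/PBCore/PBCoreNamespace.html">
  <instantiationIdentifier source="File Name">sequence_folder</instantiationIdentifier>
  <instantiationEssenceTrack>
    <essenceTrackType>Video</essenceTrackType>
  </instantiationEssenceTrack>
</pbcoreInstantiationDocument>"""

work = ET.Element(qname("pbcoreDescriptionDocument"))
add_instantiation(work, SAMPLE_INSTANTIATION)
print(len(work.findall(qname("pbcoreInstantiation"))))
```

Because this copies whole elements rather than reading specific tags, it does not care which tag a value lives in, but it does need the input to parse as a single XML document.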

Run on an image sequence folder the mediainfo output is kind of cool! It peeks inside and gives you a snapshot of the sequence including start and end frame numbers and all the embedded metadata, color information, etc.

Hopefully Jerome and co. will have some kind of answer?

JeromeMartinez commented 6 years ago

I think there are 2 different issues there:

mcampos-quinn commented 6 years ago

@JeromeMartinez thank you so much for that information! Yes, I found that the sorting on Ubuntu for DPX sequences (in another context of this project) was unreliable. I think that even though it isn't 'supposed' to work, PBCore2 output on a DPX directory produces a result that is actually handy. I can close this issue, but we now know that if it's something we need to pursue, we will try to seek paid support.

JeromeMartinez commented 6 years ago

Misunderstanding: the first point (sorting) is on the way; check the snapshots tomorrow. The second point may come up again if you have, e.g., a WAV file in the directory: it will not be grouped with the DPX files as a single package, so you get 2 PBCore2 outputs in 1 shot instead of 1 PBCore output with video and audio together. I think it could be relevant to sponsor that kind of development in order to have a PBCore2 output that matches what the user expects to see, i.e. video and audio in 1 package.