Ymagis / ClairMeta

Clairmeta is a python package for Digital Cinema Package (DCP) probing and checking.
BSD 3-Clause "New" or "Revised" License
84 stars 22 forks source link

Keep Asset Map Path addressing scheme around for validation? #188

Open matmat opened 3 years ago

matmat commented 3 years ago

https://github.com/Ymagis/ClairMeta/blob/024855709dd132d98797914819cfa26b142a94bb/clairmeta/dcp_utils.py#L32

I think this is wrong. At least the first replace.

I have a couple of old DCP:s which have this in their Asset Map:

<am:Path>file:///7d2f48ae-e342-4dc7-aa28-cfb177588a94_pcm.mxf</am:Path>

Using pythons urllib to parse this:

>>> import urllib.parse
>>> urllib.parse.urlparse("file:///6e2c89ad-033f-4298-9a39-1b87ea0730e5_pkl.xml")
ParseResult(scheme='file', netloc='', path='/6e2c89ad-033f-4298-9a39-1b87ea0730e5_pkl.xml', params='', query='', fragment='')

I.e. paths beginning with file:/// are actually absolute paths that should probably fail validation.

I tried to come up with a way to keep the original path around so that it, including the addressing scheme, could be checked in dcp_check_am.py. The problem is that it seems the paths are used a lot already before they are validated..

Could we solve this somehow?

remia commented 3 years ago

Would keeping only the last replace be enough to solve this ?

path = path.replace('file://', '')

That way, absolute path are preserved and should trigger an error as per SMPTE 429-9-2014 A.2:

Each Path element value shall be a relative-path reference as specified in RFC 3986. No query or fragment component shall be present.

Note 2: A relative-path reference cannot start with a "/" character.

Assets path are used before check mainly to perform probing, my guess is that this change would not cause issues but haven't tested that claim. I think check_assets_am_path must be extended to include search for out of DCP root directory assets.

Would that address your concerns ?

remia commented 3 years ago

Also the situation might be more complex than that, looking at https://en.wikipedia.org/wiki/File_URI_scheme :

A valid file URI must therefore begin with either file:/path, file:///path or file://hostname/path. file://path (i.e. two slashes, without a hostname) is never correct, but is often used.

Would that mean file:///myfile is actually a relative path and file:////media/myfile is an absolute path ?

Not sure if you should be enforcing proper URI usage or not here.

matmat commented 3 years ago

The file:// path in my example Asset map is indeed absolute and incorrect. In my opinion it should trigger an error.

Because asset maps, as clairmeta works now, are crucial for finding all other files, maybe the asset map checks should be performed before probing?

As for searching for assets outside of the DCP root. I don't think we should do this. It is not allowed and could be potentially unsafe.

In the bigger picture and in the long run I hope to have an option to probe/check DCP:s while optionally ignoring (and be able to reconstruct) any or all of Assetmap, Volindex and PKL. All these three file types are in my view "only" provided for convenience of transport and can be exactly reconstructed by reading the CPL(s) involved and scanning a filesystem path for assets.

To be able to construct AMs/Volindex/PKLs would be really useful for several reasons:

To be able to ignore and/or reconstruct AM/Volindex/PKLs is feature I would really much want! :)

remia commented 3 years ago

That would be an interesting feature indeed. Is that what Dolby call "consolidate" ? I don't remember exactly all the terms...

My proposal would be to extract this last comment into a new feature request issue, to be discussed further and try to collect test scenarios. I think this would not require a huge amount of work, at least to cover basic use case... There was talks before with @jamiegau about providing some utility script / functionalities and this one might well be in this category.

As for this URI issue, should we just go with path = path.replace('file://', '') then ?

jamiegau commented 3 years ago

@matmat, yes partial DCP and recreation of the files that build a DCP ASSETMAP etc has been an interesting issue. I have been utilising this type of feature for some time as when transfering DCPs around, most of the time I only send the files needed for a specific CPL and not everything (3D essance for example) But yes, not all DCI-Player will agree to ingest such animals.

I have wanted to create a tool on top of this feature to rebuild the other files to make it compatible with all DCI-Players, but its always been a "would be nice" tool. And, I was not sure if it was possible to do so when a DCP is signed and all hash checks are checked against the original packaging signature. If the PKL has is part of the encryption signature, it wouldn't be possible. But I never looked into if this was the case.

From my perspective, and as digital download is becoming more common, combined DCPs are likely to become less useful as sites will download on demand what they need. So I decided to focus on other issues as its a problem that may not last long.

But from a film festival standpoint. I can see the benefit.