ascmitc / mhl

ASC Media Hash List
MIT License

No warning for file with missing hash #116

Closed. jmccdev closed this issue 6 months ago

jmccdev commented 2 years ago

The data used for this is under /sources/104_verification_types/104h/ within the dataset on the G-Drive: https://drive.google.com/drive/folders/1PrAPczRFBQsVfjakbX-fqnDVHsAD-kd1

I created the ASC MHL History with: ascmhl create -h md5 16421225_Day001/

Afterwards, I manually deleted this from the 0001* MHL:

    <hash>
      <path size="38154240" lastmodificationdate="2022-01-24T17:17:14-08:00">Camera_Media/A007_0124J4/A007_0124J4.RDM/A007_C001_0124UA.RDC/A007_C001_0124UA_001.R3D</path>
      <md5 action="original" hashdate="2022-02-08T14:54:36.837260-08:00">4a4d102cb08af8ee4a96076ed5e51c43</md5>
    </hash> 

I then ran the create command on the same directory: $ ascmhl create -h md5 16421225_Day001/

There is no error message, and the tool creates a new register for the added file, with a HashFormatType of 'original':

    <hash>
      <path size="38154240" lastmodificationdate="2022-01-24T17:17:14-08:00">Camera_Media/A007_0124J4/A007_0124J4.RDM/A007_C001_0124UA.RDC/A007_C001_0124UA_001.R3D</path>
      <md5 action="original" hashdate="2022-02-14T17:38:57.140219-08:00">4a4d102cb08af8ee4a96076ed5e51c43</md5>
    </hash>

jmccdev commented 2 years ago

Closely related to this... for test case 104i, one file in the data set is renamed after the ASC MHL History creation.

$ ascmhl create -h md5 -n 16421225_Day001/
$ mv 16421225_Day001/Camera_Media/C005R34T/C005C001_220125_R34T.mov 16421225_Day001/Camera_Media/C005R34T/C005C001.mov

I ran the ascmhl create command on this data set. For the renamed file, a 'Missing file' error is printed, but the register for that file is completely removed in the new MHL generation. Should it not be retained with a 'failed' HashFormatType? Additionally, the tool creates a new register for the renamed file and gives it a HashFormatType of 'original'.
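For illustration, one way a tool could tell a rename apart from "one file missing plus one unrelated new file" is to pair missing paths with new paths that carry the same hash. This is only a sketch under stated assumptions, not how ascmhl is implemented: `find_renames` and its arguments are hypothetical names, and the hash values in the usage note are placeholders.

```python
def find_renames(recorded, current):
    """Pair missing paths with new paths that have the same hash.

    recorded: dict of relative path -> hash from the previous generation (hypothetical input)
    current:  dict of relative path -> hash computed from the files on disk (hypothetical input)
    Returns (renames, missing, new); renames is a list of (old_path, new_path).
    """
    missing = sorted(set(recorded) - set(current))
    new = sorted(set(current) - set(recorded))

    # Index the new files by hash so each missing file can look up candidates.
    new_by_hash = {}
    for path in new:
        new_by_hash.setdefault(current[path], []).append(path)

    renames = []
    still_missing = []
    for old_path in missing:
        candidates = new_by_hash.get(recorded[old_path], [])
        if candidates:
            # Same content under a different name: treat it as a rename.
            renames.append((old_path, candidates.pop(0)))
        else:
            still_missing.append(old_path)

    renamed_targets = {new_path for _, new_path in renames}
    still_new = [path for path in new if path not in renamed_targets]
    return renames, still_missing, still_new
```

With `recorded = {"Camera_Media/C005R34T/C005C001_220125_R34T.mov": "aaa"}` and `current = {"Camera_Media/C005R34T/C005C001.mov": "aaa"}`, the function reports one rename and no missing or new files. Files with identical content are paired in sorted order, which is an arbitrary choice in this sketch.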

ptrpfn commented 2 years ago

Afterwards, I manually deleted this from the 0001* MHL:

    <hash>
      <path size="38154240" lastmodificationdate="2022-01-24T17:17:14-08:00">Camera_Media/A007_0124J4/A007_0124J4.RDM/A007_C001_0124UA.RDC/A007_C001_0124UA_001.R3D</path>
      <md5 action="original" hashdate="2022-02-08T14:54:36.837260-08:00">4a4d102cb08af8ee4a96076ed5e51c43</md5>
    </hash> 

That edit should actually be caught by checking the hash of the MHL file against the hash in the chain file. Currently the hashes in the chain file are not verified. That's a TBD.
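A minimal sketch of such a check, assuming the recorded hash of the manifest has already been read from the chain file (parsing the chain file is not shown, and its format is not assumed here). The function names `file_digest` and `manifest_matches_chain` are hypothetical and are not part of ascmhl.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_matches_chain(manifest_path, recorded_hash, algorithm="md5"):
    """Return True if the manifest on disk still has the recorded hash.

    recorded_hash is the hex string taken from the chain file (assumption:
    the caller has extracted it already).
    """
    return file_digest(manifest_path, algorithm) == recorded_hash.strip().lower()
```

Any manual edit of the manifest, such as deleting a `<hash>` element, changes the file's digest, so `manifest_matches_chain` would return False for it.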

jmccdev commented 2 years ago

104m and 105c basically run into the same behavior. Some files are not hashed anywhere (neither in the root directory's history nor in a nested history), and they are silently added to the root directory's history with 'original' HashFormatType.
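To make the expected warning concrete, here is a rough sketch of how files that no manifest references could be listed before they are added. It is not the ascmhl implementation. It assumes the `<hash>`/`<path>` layout shown in the snippets above and that the history lives in a folder named `ascmhl`; `manifest_paths` and `unreferenced_files` are hypothetical names.

```python
import os
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace prefix such as '{urn:...}path' -> 'path'."""
    return tag.rsplit("}", 1)[-1]


def manifest_paths(manifest_xml):
    """Collect the file paths referenced by <hash><path> entries in a manifest string."""
    paths = set()
    for element in ET.fromstring(manifest_xml).iter():
        if _local(element.tag) != "hash":
            continue
        for child in element:
            if _local(child.tag) == "path" and child.text:
                paths.add(child.text.strip())
    return paths


def unreferenced_files(root_dir, known_paths, ignore_dirs=("ascmhl",)):
    """List files under root_dir whose relative path is not in known_paths."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        # Skip the history folder itself (assumed name).
        dirnames[:] = [d for d in dirnames if d not in ignore_dirs]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root_dir)
            rel = rel.replace(os.sep, "/")
            if rel not in known_paths:
                found.append(rel)
    return sorted(found)
```

A create run could print each entry returned by `unreferenced_files` as a warning instead of recording it silently.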

jmccdev commented 2 years ago

Looks like the verify command prints a warning about finding a new file:

$ ascmhl verify 16421225_Day001/
found new file Camera_Media/A007_0124J4/A007_0124J4.RDM/A007_C001_0124UA.RDC/A007_C001_0124UA_001.R3D
Error: New files not referenced in the ASC MHL history have been found

ptrpfn commented 2 years ago

Just for reference: There is already an issue #22 for the verification of chain files.

ptrpfn commented 6 months ago

Renaming files is now supported in v1.0. Missing or changed ASC MHL manifest files are also detected (see https://github.com/ascmitc/mhl/releases/tag/v1.0 for details).