NYPL / ami-tools

MIT License

validate_ami_bags dateCreated json test #44

Open bturkus opened 3 years ago

bturkus commented 3 years ago

I think there might be something a little off in the way that validate_ami_bags checks whether the dateCreated in Mediainfo matches the one reported in the JSON. I'm getting a lot of this:

ami_md.ami_json: 2021-02-03 13:25:08,102 - WARNING - dateCreated in JSON and from file disagree. JSON: 2021-01-12, From file: 2021-02-03.

But when I run the file through Mediainfo, I'm not seeing the "from file" date (in this case 2021-02-03) anywhere:

MY-LPAMI-056430:~ benjaminturkus$ mediainfo -F /Volumes/lpasync/\!-PAMI/_QCFail/2020_017_pami_178_mao822/323942/data/EditMasters/mao_323942_v01f01_em.flac 
General
Count                                    : 331
Count of stream of this kind             : 1
Kind of stream                           : General
Kind of stream                           : General
Stream identifier                        : 0
Count of audio streams                   : 1
Audio_Format_List                        : FLAC
Audio_Format_WithHint_List               : FLAC
Audio codecs                             : FLAC
Complete name                            : /Volumes/lpasync/!-PAMI/_QCFail/2020_017_pami_178_mao822/323942/data/EditMasters/mao_323942_v01f01_em.flac
Folder name                              : /Volumes/lpasync/!-PAMI/_QCFail/2020_017_pami_178_mao822/323942/data/EditMasters
File name extension                      : mao_323942_v01f01_em.flac
File name                                : mao_323942_v01f01_em
File extension                           : flac
Format                                   : FLAC
Format                                   : FLAC
Format/Info                              : Free Lossless Audio Codec
Format/Url                               : https://xiph.org/flac/
Format/Extensions usually used           : fla flac
Commercial name                          : FLAC
Internet media type                      : audio/x-flac
File size                                : 275218861
File size                                : 262 MiB
File size                                : 262 MiB
File size                                : 262 MiB
File size                                : 262 MiB
File size                                : 262.5 MiB
Duration                                 : 1542803
Duration                                 : 25 min 42 s
Duration                                 : 25 min 42 s 803 ms
Duration                                 : 25 min 42 s
Duration                                 : 00:25:42.803
Duration                                 : 00:25:42.803
Overall bit rate mode                    : VBR
Overall bit rate mode                    : Variable
Overall bit rate                         : 1427111
Overall bit rate                         : 1 427 kb/s
Stream size                              : 0
Stream size                              : 0.00 Byte (0%)
Stream size                              :  Byte0
Stream size                              : 0.0 Byte
Stream size                              : 0.00 Byte
Stream size                              : 0.000 Byte
Stream size                              : 0.00 Byte (0%)
Proportion of this stream                : 0.00000
File last modification date              : UTC 2021-01-12 23:25:16
File last modification date (local)      : 2021-01-12 18:25:16

Audio
Count                                    : 280
Count of stream of this kind             : 1
Kind of stream                           : Audio
Kind of stream                           : Audio
Stream identifier                        : 0
Format                                   : FLAC
Format                                   : FLAC
Format/Info                              : Free Lossless Audio Codec
Format/Url                               : https://xiph.org/flac/
Commercial name                          : FLAC
Internet media type                      : audio/x-flac
Duration                                 : 1542803
Duration                                 : 25 min 42 s
Duration                                 : 25 min 42 s 803 ms
Duration                                 : 25 min 42 s
Duration                                 : 00:25:42.803
Duration                                 : 00:25:42.803
Bit rate mode                            : VBR
Bit rate mode                            : Variable
Bit rate                                 : 1426752
Bit rate                                 : 1 427 kb/s
Channel(s)                               : 1
Channel(s)                               : 1 channel
Channel positions                        : Front: C
Channel positions                        : 1/0/0
Channel layout                           : C
Sampling rate                            : 96000
Sampling rate                            : 96.0 kHz
Samples count                            : 148109088
Bit depth                                : 24
Bit depth                                : 24 bits
Compression mode                         : Lossless
Compression mode                         : Lossless
Stream size                              : 275149593
Stream size                              : 262 MiB (100%)
Stream size                              : 262 MiB
Stream size                              : 262 MiB
Stream size                              : 262 MiB
Stream size                              : 262.4 MiB
Stream size                              : 262 MiB (100%)
Proportion of this stream                : 0.99975
Writing library                          : reference libFLAC 1.3.3 20190804
Writing library                          : libFLAC 1.3.3 (UTC 2019-08-04)
Encoded_Library_Name                     : libFLAC
Encoded_Library_Version                  : 1.3.3
Encoded_Library_Date                     : UTC 2019-08-04

MY-LPAMI-056430:~ benjaminturkus$ 

nkrabben commented 3 years ago

This date comes from the following code.

os.path.getctime(self.filepath)

https://docs.python.org/3/library/os.path.html#os.path.getctime

I'm assuming mediainfo might be pulling metadata from the file header instead of the file system. File creation times can be extremely messy, so it would take some additional work to make sure I pull a consistent, correct value.
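One way to inspect the discrepancy, as a minimal sketch: on Unix-like systems (including macOS), `os.path.getctime` returns the time of the last metadata change rather than a creation time, so copying or moving a file can update it. The "File last modification date" in the Mediainfo output above appears to correspond to the file-system modification time, which `os.path.getmtime` returns. The helper name `file_dates` is hypothetical and is not part of ami-tools.

```python
import datetime
import os


def file_dates(filepath):
    """Return (ctime_date, mtime_date) as ISO date strings in local time.

    Hypothetical helper for comparing the two file-system timestamps;
    not part of ami-tools.
    """
    # On Unix this is the last metadata change; on Windows it is creation time.
    ctime = datetime.date.fromtimestamp(os.path.getctime(filepath))
    # Last content modification time.
    mtime = datetime.date.fromtimestamp(os.path.getmtime(filepath))
    return ctime.isoformat(), mtime.isoformat()
```

If the two values differ for a file that triggers the warning, and the mtime date matches the JSON, the file-system ctime is the likely source of the "From file" date.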

I list this as a warning because it's something that can be useful to know, but isn't critical and shouldn't stop further processing.
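A minimal sketch of that behavior, assuming a hypothetical `check_date_created` helper and logger name (this is not the actual ami-tools implementation): a mismatch is logged at WARNING level and the function returns normally instead of raising, so the rest of the validation can continue.

```python
import logging

# Hypothetical logger name, used only for this sketch.
LOGGER = logging.getLogger("date_check_sketch")


def check_date_created(json_date, file_date):
    """Return True if the dates agree; log a warning and return False otherwise."""
    if json_date != file_date:
        # Warn instead of raising so later checks still run.
        LOGGER.warning(
            "dateCreated in JSON and from file disagree. "
            "JSON: %s, From file: %s.",
            json_date,
            file_date,
        )
        return False
    return True
```

A caller can ignore the return value entirely, or collect it to summarize how many files had mismatched dates.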

Is it OK to close or do you want more investigation on this?