CXwudi / vocadb-video-downloader-new

An integrated cli-based media archiving solution for VocaDB

Use `mka` as the fallback container for any other audio formats (including Enhanced AC-3) #98

Open CXwudi opened 4 days ago

CXwudi commented 4 days ago

When downloading videos from https://vocadb.net/L/15285, we encountered an audio format that we had not seen before:

【初音ミク】こころのキラリ【shishy】.zip

Here is the MediaInfo:

General
Unique ID                      : 163624331291903806959873331136622681307 (0x7B18E66195F88671BBFEA6E5040ADCDB)
Complete name                  : D:\coding-workspace\Vocaloid Coding POC\Project VD Run Env\2024年V家新曲-downloaded\【初音ミク】こころのキラリ【shishy】[661223]-pv.mkv
Format                         : Matroska
Format version                 : Version 4
File size                      : 16.9 MiB
Duration                       : 3 min 43 s
Overall bit rate               : 636 kb/s
Frame rate                     : 29.970 FPS
Writing application            : Lavf61.1.100
Writing library                : Lavf61.1.100
ErrorDetectionType             : Per level 1

Video
ID                             : 1
Format                         : VP9
Format profile                 : 0
Codec ID                       : V_VP9
Duration                       : 3 min 43 s
Bit rate                       : 240 kb/s
Width                          : 1 920 pixels
Height                         : 1 080 pixels
Display aspect ratio           : 16:9
Frame rate mode                : Constant
Frame rate                     : 29.970 (30000/1001) FPS
Color space                    : YUV
Chroma subsampling             : 4:2:0
Bit depth                      : 8 bits
Bits/(Pixel*Frame)             : 0.004
Stream size                    : 6.37 MiB (38%)
Language                       : English
Default                        : Yes
Forced                         : No
Color range                    : Limited
Color primaries                : BT.709
Transfer characteristics       : BT.709
Matrix coefficients            : BT.709

Audio
ID                             : 2
Format                         : E-AC-3
Format/Info                    : Enhanced AC-3
Commercial name                : Dolby Digital Plus
Codec ID                       : A_EAC3
Duration                       : 3 min 43 s
Bit rate mode                  : Constant
Bit rate                       : 384 kb/s
Channel(s)                     : 6 channels
Channel layout                 : L R C LFE Ls Rs
Sampling rate                  : 48.0 kHz
Frame rate                     : 31.250 FPS (1536 SPF)
Bit depth                      : 32 bits
Compression mode               : Lossy
Stream size                    : 10.2 MiB (60%)
Title                          : ISO Media file produced by Google Inc.
Language                       : English
Service kind                   : Complete Main
Default                        : Yes
Forced                         : No
VENDOR_ID                      : [0][0][0][0]
Dialog Normalization           : -9 dB
compr                          : 0.53 dB
dialnorm_Average               : -9 dB
dialnorm_Minimum               : -9 dB
dialnorm_Maximum               : -9 dB

CXwudi commented 4 days ago

PoC finished, here is the implementation route:

Extraction: ffmpeg -i .\【初音ミク】こころのキラリ【shishy】[661223]-pv.mkv -vn -acodec copy .\【初音ミク】こころのキラリ【shishy】.mka

Tagging: mkvpropedit.exe '.\【初音ミク】こころのキラリ【shishy】.mka' --tags all:tag.xml

The format of tag.xml is specified in:

https://www.matroska.org/technical/elements.html (see the Tagging section)
https://www.matroska.org/technical/tagging.html
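
The two commands above could be assembled programmatically along these lines. This is only a minimal sketch: the function names `build_extract_command` and `build_tag_command` are hypothetical, and it assumes `ffmpeg` and `mkvpropedit` are on the PATH.

```python
from pathlib import Path


def build_extract_command(source: Path, target: Path) -> list[str]:
    """Build the ffmpeg call that copies the audio stream as-is (no re-encoding)."""
    # -vn drops the video stream, -acodec copy keeps the original audio codec.
    return ["ffmpeg", "-i", str(source), "-vn", "-acodec", "copy", str(target)]


def build_tag_command(audio: Path, tag_file: Path) -> list[str]:
    """Build the mkvpropedit call that applies the tags from tag_file."""
    # The "all:" selector matches the --tags all:tag.xml usage shown above.
    return ["mkvpropedit", str(audio), "--tags", f"all:{tag_file}"]
```

Each list can be handed to `subprocess.run(cmd, check=True)`. Passing the arguments as a list rather than a single string sidesteps shell quoting problems with the brackets and non-ASCII characters in the file names.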

Here is a sample XML file from GPT-4o:

<?xml version="1.0" encoding="UTF-8"?>
<Tags>
  <!-- Tag for the whole file -->
  <Tag>
    <Targets>
      <TargetTypeValue>50</TargetTypeValue>
    </Targets>
    <Simple>
      <Name>ENCODER</Name>
      <String>Lavf61.1.100</String>
    </Simple>
    <Simple>
      <Name>CUSTOM TAG</Name>
      <String>Wudi</String>
    </Simple>
  </Tag>

  <!-- Tag for the artist and date recorded -->
  <Tag>
    <Targets>
      <TargetTypeValue>30</TargetTypeValue>
    </Targets>
    <Simple>
      <Name>ARTIST</Name>
      <String>some artist</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
    <Simple>
      <Name>DATE_RECORDED</Name>
      <String>2024</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
    <Simple>
      <Name>CUSTOM TAG 2</Name>
      <String>Wudi 2</String>
    </Simple>
  </Tag>

  <!-- Tag for the title -->
  <Tag>
    <Targets>
      <TargetTypeValue>30</TargetTypeValue>
    </Targets>
    <Simple>
      <Name>TITLE</Name>
      <String>some title</String>
      <TagLanguage>und</TagLanguage>
      <DefaultLanguage>1</DefaultLanguage>
    </Simple>
  </Tag>

</Tags>
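
A tag file shaped like the sample above could be generated with the Python standard library instead of being written by hand. The following is a rough sketch only: `build_tags_xml` is a hypothetical helper, and putting all three tags under a single target of type 30 is an assumption made for brevity, not something decided in this issue.

```python
import xml.etree.ElementTree as ET


def build_tags_xml(title: str, artist: str, year: str) -> str:
    """Return a tag file with TITLE, ARTIST and DATE_RECORDED entries."""
    tags = ET.Element("Tags")
    tag = ET.SubElement(tags, "Tag")
    targets = ET.SubElement(tag, "Targets")
    # 30 is the target type value used for the artist/title tags in the sample.
    ET.SubElement(targets, "TargetTypeValue").text = "30"
    for name, value in (("TITLE", title), ("ARTIST", artist), ("DATE_RECORDED", year)):
        simple = ET.SubElement(tag, "Simple")
        ET.SubElement(simple, "Name").text = name
        ET.SubElement(simple, "String").text = value
        ET.SubElement(simple, "TagLanguage").text = "und"
        ET.SubElement(simple, "DefaultLanguage").text = "1"
    # ElementTree escapes characters such as & and < in the text for us.
    body = ET.tostring(tags, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The returned string should be written to disk as UTF-8 so that it matches the XML declaration, for example with `Path("tag.xml").write_text(xml, encoding="utf-8")`.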

To add a cover image, an extra command is needed: mkvpropedit.exe '.\【初音ミク】こころのキラリ【shishy】.mka' --attachment-name "cover.webp" --attachment-mime-type "image/webp" --attachment-description "cover image" --add-attachment .\【初音ミク】こころのキラリ【shishy】[661223]-thumbnail.webp

Be aware that we need to detect the MIME type of the cover image; we can reuse the mediainfo we already have for that.
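
One possible way to turn a MediaInfo result into the attachment command is sketched below. The mapping keys are assumptions about what MediaInfo reports in its "Format" field for images and should be checked against real output; `IMAGE_FORMAT_TO_MIME` and `build_cover_command` are hypothetical names.

```python
from pathlib import Path

# Assumed MediaInfo "Format" values for thumbnails; verify against real output.
IMAGE_FORMAT_TO_MIME = {
    "WebP": "image/webp",
    "JPEG": "image/jpeg",
    "PNG": "image/png",
}


def build_cover_command(audio: Path, cover: Path, mediainfo_format: str) -> list[str]:
    """Build the mkvpropedit call that attaches the cover image to the mka file."""
    mime = IMAGE_FORMAT_TO_MIME.get(mediainfo_format)
    if mime is None:
        # Fail loudly instead of attaching a file with a guessed MIME type.
        raise ValueError(f"unsupported cover format: {mediainfo_format}")
    return [
        "mkvpropedit", str(audio),
        "--attachment-name", f"cover{cover.suffix}",
        "--attachment-mime-type", mime,
        "--attachment-description", "cover image",
        "--add-attachment", str(cover),
    ]
```

The attachment options are placed before `--add-attachment` because, as in the command above, they describe the attachment that follows them.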

CXwudi commented 4 days ago

It looks like MKA can serve as a versatile container for any audio format. Hence, we can use mka as the fallback for any otherwise unrecognized format.
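
The fallback rule could be expressed as a simple lookup with a default. In this sketch the entries in `KNOWN_AUDIO_EXTENSIONS` are illustrative assumptions, not the project's actual list of recognized formats, and `choose_audio_extension` is a hypothetical name.

```python
# Hypothetical table of formats that already have a dedicated container;
# the keys are MediaInfo "Format" values and the entries are illustrative only.
KNOWN_AUDIO_EXTENSIONS = {
    "AAC": ".m4a",
    "Opus": ".ogg",
}

FALLBACK_EXTENSION = ".mka"


def choose_audio_extension(mediainfo_format: str) -> str:
    """Pick the dedicated extension when one is known, otherwise fall back to mka."""
    return KNOWN_AUDIO_EXTENSIONS.get(mediainfo_format, FALLBACK_EXTENSION)
```

With this shape, the E-AC-3 track from the MediaInfo report above needs no special case: it is simply absent from the table and therefore lands in an mka file.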

CXwudi commented 4 days ago

Matroska is not supported in mutagen (https://github.com/quodlibet/mutagen/issues/3), so there is no need to think about a workaround in Python.

CXwudi commented 4 days ago

Just a side note: the eac3 format can use the APEv2 tag format, which is supported by mutagen. However, it doesn't support cover images, and the format itself is not widely recognized. Hence, this option is discarded.