Borewit / music-metadata

Stream and file based music metadata parser for node. Supporting a wide range of audio and tag formats.
MIT License

Issue parsing Matroska files using browser Web Streams #2145

Closed by Borewit 1 month ago

Borewit commented 1 month ago

I am getting errors for .webm and .mkv.

using parseWebStream():

TypeError: Cannot read properties of undefined (reading 'docType')
    at MatroskaParser.parse (MatroskaParser.js:50:68)
    at async parse (ParserFactory.js:57:5)
    at async retrieveMetadata (index.js:3172:17)

using parseBlob():

Error: End-Of-Stream
    at ReadStreamTokenizer.readBuffer (ReadStreamTokenizer.js:44:19)
    at async MatroskaParser.readBuffer (MatroskaParser.js:221:9)
    at async MatroskaParser.parseContainer (MatroskaParser.js:151:39)
    at async MatroskaParser.parseContainer (MatroskaParser.js:139:33)
    at async MatroskaParser.parseContainer (MatroskaParser.js:139:33)
    at async MatroskaParser.parse (MatroskaParser.js:49:26)
    at async parse (ParserFactory.js:57:5)
    at async retrieveMetadata (index.js:3175:17)

Originally posted by @hvianna in https://github.com/Borewit/music-metadata/issues/2135#issuecomment-2226079597

Do you experience the same issue here? https://audio-tag-analyzer.netlify.app/

Yes, same error. I tried a few video formats (webm, mkv, mp4).


File info for one of them:

General
Complete name                            : W:\DIY - Tips & Tricks - Tips in life.mp4
Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom (isom/iso2/avc1/mp41)
File size                                : 24.9 MiB
Duration                                 : 4 min 11 s
Overall bit rate                         : 828 kb/s
Frame rate                               : 30.000 FPS
Writing application                      : Lavf58.29.100

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L3.1
Format settings                          : CABAC / 5 Ref Frames
Format settings, CABAC                   : Yes
Format settings, Reference frames        : 5 frames
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 4 min 11 s
Bit rate                                 : 692 kb/s
Width                                    : 576 pixels
Height                                   : 1 024 pixels
Display aspect ratio                     : 0.562
Frame rate mode                          : Constant
Frame rate                               : 30.000 FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.039
Stream size                              : 20.8 MiB (84%)
Title                                    : Twitter-vork muxer
Writing library                          : x264 core 164 r3095 baee400
Encoding settings                        : cabac=1 / ref=5 / deblock=1:0:0 / analyse=0x3:0x113 / me=hex / subme=2 / psy=0 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=1 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=0 / threads=4 / lookahead_threads=1 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / stitchable=1 / constrained_intra=0 / bframes=3 / b_pyramid=2 / b_adapt=1 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=2 / keyint=infinite / keyint_min=30 / scenecut=40 / intra_refresh=0 / rc_lookahead=40 / rc=crf / mbtree=1 / crf=28.0 / qcomp=0.60 / qpmin=10 / qpmax=69 / qpstep=4 / vbv_maxrate=2048 / vbv_bufsize=2048 / crf_max=0.0 / nal_hrd=none / filler=0 / ip_ratio=1.40 / aq=2:1.00
Codec configuration box                  : avcC

Audio
ID                                       : 2
Format                                   : AAC LC
Format/Info                              : Advanced Audio Codec Low Complexity
Codec ID                                 : mp4a-40-2
Duration                                 : 4 min 11 s
Bit rate mode                            : Constant
Bit rate                                 : 128 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 44.1 kHz
Frame rate                               : 43.066 FPS (1024 SPF)
Compression mode                         : Lossy
Stream size                              : 3.84 MiB (15%)
Title                                    : Twitter-vork muxer
Default                                  : Yes
Alternate group                          : 1

Borewit commented 1 month ago

I think it is caused by https://github.com/Borewit/peek-readable/issues/724

Borewit commented 1 month ago

@hvianna can you break v9.0.2 or do I owe you a beer? :wink:

hvianna commented 1 month ago

parseBlob() works now, but I still get the same error from parseWebStream() for mkv files (mp4 works fine!)

TypeError: Cannot read properties of undefined (reading 'docType')
    at MatroskaParser.parse (MatroskaParser.js:50:68)
    at async parse (ParserFactory.js:57:5)
    at async retrieveMetadata (index.js:3166:16)

Test video

Borewit commented 1 month ago

Thanks for testing, @hvianna.

The only way I have tested web streams is via Blob.stream().

How do you turn that file into a Web Stream and get to the error?

hvianna commented 1 month ago

@Borewit

I'm using JS-native fetch(), which offers a ReadableStream via response.body.

You can use the simple HTML file below to test it. No need for webpack, just serve it via a local webserver for fetch to work.

<!DOCTYPE html>
<html>
<body>
<script type="module">
  import { parseWebStream } from 'https://cdn.skypack.dev/music-metadata?min';

  fetch('test.mkv')
    .then( response => parseWebStream( response.body, response.headers.get('content-type'), { skipPostHeaders: true } ) )
    .then( metadata => console.log( metadata ) );
</script>
</body>
</html>

This is working fine for mp4 and flac files, but throws that error for mkv.
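
To make the difference between the two sources concrete, here is a minimal sketch, assuming a runtime with global Blob, Response and ReadableStream (modern browsers, Node.js 18+). Both a Blob and a fetch-style response body yield a ReadableStream, but only the Blob knows its total size. The in-memory Blob and all variable names are made up for illustration; the four bytes are the EBML magic number that Matroska and WebM files start with.

```javascript
// Illustration only: a 4-byte stand-in for a real file (EBML magic number).
const ebmlMagic = new Uint8Array([0x1a, 0x45, 0xdf, 0xa3]);
const blob = new Blob([ebmlMagic], { type: 'video/x-matroska' });

// Source 1: a stream derived from a Blob. The total size is known up front.
const blobStream = blob.stream();
const knownSize = blob.size;

// Source 2: a stream taken from a Response body, which is what fetch() hands back.
// The stream object itself carries no size; a length, if any, lives in the headers.
const responseStream = new Response(blob).body;

console.log(blobStream instanceof ReadableStream);     // true
console.log(responseStream instanceof ReadableStream); // true
console.log(knownSize);                                // 4
```

Whether the missing size is what trips the Matroska parser is not established at this point in the thread; the sketch only shows what each source exposes.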

Borewit commented 1 month ago

A workaround for the issue you are experiencing is to pass the content length to the parser.

import { parseWebStream } from 'music-metadata';

async function parseMetadata(url) {
  console.info('Fetching ' + url);
  const response = await fetch(url);
  const contentType = response.headers.get('Content-Type');
  const contentLength = response.headers.get('Content-Length');
  console.info('Content-type = ' + contentType);
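  // Pass the file size along with the MIME type, if the server provides a Content-Length
  const size = contentLength ? parseInt(contentLength, 10) : undefined;
  const metadata = await parseWebStream(response.body, {mimeType: contentType, size});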
  console.info('Got metadata');
  console.info(metadata);
}

parseMetadata('../sample/5.1 Surround Test (AAC).mkv').catch(error => {
  console.error(error);
});
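
The header handling in this workaround can be pulled into a small helper. This is a minimal sketch, not part of music-metadata: fileInfoFromHeaders is a hypothetical name, and the { mimeType, size } shape is taken from the call above. It assumes a runtime with a global Headers class (modern browsers, Node.js 18+).

```javascript
// Hypothetical helper (not a music-metadata API): build { mimeType, size }
// from response headers, leaving out anything the server did not send.
function fileInfoFromHeaders(headers) {
  // Headers.get() returns null for a missing header; map that to undefined
  const mimeType = headers.get('Content-Type') ?? undefined;
  const rawLength = headers.get('Content-Length');
  const parsed = rawLength === null ? NaN : Number.parseInt(rawLength, 10);
  // Only report a size when the header holds a non-negative integer
  const size = Number.isInteger(parsed) && parsed >= 0 ? parsed : undefined;
  return { mimeType, size };
}

const info = fileInfoFromHeaders(new Headers({
  'Content-Type': 'video/x-matroska',
  'Content-Length': '1024',
}));
console.log(info); // { mimeType: 'video/x-matroska', size: 1024 }
```

The result could then be passed as the second argument in the call above. Responses sent with chunked transfer encoding carry no Content-Length header, in which case size stays undefined.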

Borewit commented 1 month ago

Should be fixed in v9.0.3