Zeugma440 / atldotnet

Fully managed, portable and easy-to-use C# library to read and edit audio data and metadata (tags) from various audio formats, playlists and CUE sheets
MIT License
440 stars, 60 forks

Parse Metadata from partial `m4a`/`m4b` #237

Closed: acupofjose closed this issue 7 months ago

acupofjose commented 7 months ago

The problem

I want to parse metadata from a remote m4b file without downloading the entire file. From my research, it looks like the file identification is at the beginning of the file, while the embedded chapters are at either the beginning or the end.

My solution is a loop that fetches progressively larger byte ranges from the beginning and end of the file and asks ATL to parse the result. As of now, parsing fails unless I download the entire file.

Is there a way to parse metadata from a partial file? I'm assuming a lot of ignorance on my part here.

Environment

Code To Reproduce Issue [ Good To Have ]

The basic approach:

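// Ask the provider for the remote file's headers to learn its total size.
var headers = await _provider.GetHeaders(part.Key);
var contentLength = headers.ContentLength;
int iteration = 0;
int maxAttempts = 10;
int windowSize = 500000; // bytes fetched from each end, multiplied by the attempt number
bool parsed = false;
var path = Path.Combine(cacheDir, $"{part.Id}.{part.Container}");

while (!parsed && iteration++ < maxAttempts)
{
    Debug.WriteLine($"Attempting Metadata parse of {track.RatingKey} [Attempt: {iteration}]");

    // Recreate the cache file from scratch on every attempt.
    var cacheStream = new FileStream(path, FileMode.Create, FileAccess.Write);

    // Leading window: bytes [0, windowSize * iteration).
    var front = await _provider.GetMedia(part.Key, 0, windowSize * iteration);
    await front.CopyToAsync(cacheStream);

    if ((windowSize * iteration) < contentLength)
    {
        // Trailing window: the last (windowSize * iteration) bytes of the remote file.
        // These bytes are appended directly after the leading window, so their
        // offsets in the cache file differ from their offsets in the original file.
        var back = await _provider.GetMedia(part.Key, contentLength.Value - (windowSize * iteration), contentLength);
        await back.CopyToAsync(cacheStream);
    }

    // Flush and close the cache file before handing its path to ATL.
    await cacheStream.DisposeAsync();

    var atl = new Track(path); // <-- unless full file is downloaded, `atl` will not produce any output.
}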

Output

Attempting Metadata parse of 2496 [Attempt: 1]
[ATL:1]: moov atom could not be found; aborting read
[ATL:2]: Unrecognized file extension : C:\Users\...\metadata\2965.mp4
[ATL:8]: Instancing a Dummy Audio Data Reader for C:\Users\...\metadata\2965.mp4
[ATL:1]: moov atom could not be found; aborting read

Zeugma440 commented 7 months ago

Hello Joseph,

First of all, thanks for your question.

Unlike other tagging formats, where the order of information is specified, the MP4 container is very versatile and allows considerable freedom of implementation.

As a consequence, in an MP4 container, the atoms containing audio metadata are not necessarily at the beginning of the file; they may also be located at the end of the file.

This article has an interesting take on that issue: https://sanjeev-pandey.medium.com/understanding-the-mpeg-4-moov-atom-pseudo-streaming-in-mp4-93935e1b9e9a

Last but not least, when your audio file has chapters, their metadata (e.g. title and picture) may be scattered across the file rather than stored in a single location (see https://github.com/Zeugma440/atldotnet/wiki/Focus-on-Chapter-metadata#quicktime-qt-chapters).

=> What you're trying to do might work with some MP4/M4A files, but might fail with others, depending on how their internal structure is organized.
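To illustrate why the position of the moov atom matters, here is a minimal sketch that walks the top-level atoms of an MP4 file by reading only their headers (a 4-byte big-endian size followed by a 4-byte type). It is not part of ATL's API: `Mp4AtomScanner`, `ListTopLevelAtoms` and the `readAt` callback are hypothetical names made up for this example. For a remote file, `readAt` could wrap a byte-range request, assuming the server supports them.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of ATL.
public static class Mp4AtomScanner
{
    // readAt(offset, count) is an assumed callback that returns up to `count`
    // bytes starting at `offset` (fewer if the end of the file is reached).
    public static List<(string Type, long Offset, long Size)> ListTopLevelAtoms(
        Func<long, int, byte[]> readAt, long fileLength)
    {
        var atoms = new List<(string Type, long Offset, long Size)>();
        long offset = 0;

        while (offset + 8 <= fileLength)
        {
            // 16 bytes cover both the regular and the extended (64-bit) header.
            byte[] header = readAt(offset, 16);
            if (header.Length < 8) break; // truncated header

            long size = BinaryPrimitives.ReadUInt32BigEndian(header.AsSpan(0, 4));
            string type = Encoding.ASCII.GetString(header, 4, 4);
            int headerSize = 8;

            if (size == 1)
            {
                // A size of 1 means the real size is stored in the next 8 bytes.
                if (header.Length < 16) break;
                size = (long)BinaryPrimitives.ReadUInt64BigEndian(header.AsSpan(8, 8));
                headerSize = 16;
            }
            else if (size == 0)
            {
                // A size of 0 means the atom extends to the end of the file.
                size = fileLength - offset;
            }

            if (size < headerSize) break; // malformed atom; stop scanning

            atoms.Add((type, offset, size));
            offset += size; // skip the payload without reading it
        }

        return atoms;
    }
}
```

If the resulting list shows moov before mdat, fetching the start of the file up to the end of moov may be enough for the main metadata; if moov comes last, its offset and size tell you which trailing range to fetch. Either way, the partial data would still need to sit at its original offsets for a parser to make sense of it, and chapter data referenced inside mdat would not be covered by this.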

acupofjose commented 7 months ago

@Zeugma440 that's helpful, thank you! I'll keep investigating. Many thanks for this library!