Borewit / music-metadata

Stream and file based music metadata parser for node. Supporting a wide range of audio and tag formats.
MIT License
898 stars 90 forks source link

Provide an option to pick which parsers to load. #894

Closed minht11 closed 1 year ago

minht11 commented 3 years ago

Feature Request Allow users to only load parsers they want. This library supports quite a few music file types, but some users might not need them all, especially if bundle size is an issue. I saw that there was an attempt to lazy load parsers few years ago, but that caused a lot problems for some people, so instead I propose to leave the default behavior the same but do something like this:

// core-without-any-parsers.ts
export function parseStream(stream, plugins: IParser[]) {
  return parseSomehow()
}

// core.ts
import * as emptyParsers from "core-without-any-parsers"

export function parseStream(stream) {
  return emptyParsers.parseStream(stream, ALL_DEFAULT_PARSERS);
}

Create new file exporting the same parse functions but they also take extra parsers argument. Users then can load whichever parsers they want, however they want even lazy. Nothing changes for current users, core.ts imports core-without-any-parsers.ts and just provides all of the parsers.

Doing this would require reworking ParserFactory.ts and making a generic Parser interface which should probably contain a supported type/extensions and an actual parser implementation:

export const AiffParser: IParser = {
  kind: "aiff"
  impl: AiffParserImpl
}

Then user would consume new api like this:

import AiffParser from "music-metadata/parsers/aiff"
import { parseStream } from "music-metadata/core-without-any-parsers"
parseStream(stream, [AiffParser])

I understand this would be quite a big change, so I don't expect it to happen any time soon, but it would give a lot of flexibility for people using the library.

Borewit commented 3 years ago

Note that reducing the bundle size is not directly the same thing as reducing the total bundle size.

I did a bit of counting, to get an impression how much is shared code versus parsers:

Code purpose line count
common (shared) 1576
tag parsers 1303
container parsers 7711

1576 / (1303 + 1303 + 7711) ≈ 73% of the code are "container" parsers. So the other part, ~26% are the core-without-any-parsers.

I remember the parser being smaller then the core, maybe the code increased over time, maybe line count gives a wrong representation. Anyway, it gives a rough idea, what can be "saved".

How do you prevent all parsers included, without putting each parser in it's own module?

Also need to keep in mind that inter parser dependencies exist.

Also note, if this is going to implemented, it will be done on top of / be complaint with ECMA Scrript Module (#885).

minht11 commented 3 years ago

How do you prevent all parsers included, without putting each parser in it's own module?

The music-metadata package that is published to NPM is not bundled, it's just separate javascript files, that leaves ability for user bundler to treeshake things down. If you import just core-without-parsers.js then only things that are imported by that file would be included to final user bundle. Right now that's not possible because core.js imports everything.

Borewit commented 3 years ago

What I forgot to include in my size equation, are the dependencies. Maybe these can be shaken out as well.

Anyway, I think this can be done in principal. Not sure how much this idea conflicts with lazy loading. That kind of make sense as well right, first load the core so quickly getting ready to parse and lay load the required parser.

minht11 commented 3 years ago

Lazy loading would be done by the user if needed, since user generally knows what files they want to parse, before hand. In my personal use case I can't use lazy loading, because I run music-metadata inside the worker.

Automatic lazy loading could be as done as a separate option. Something akin to react lazy component.

// aiff-parser-lazy.ts
export const AiffParserLazy: IParser = {
  loading: "lazy",
  kind: "aiff",
  impl: () => import('./parsers/aiff-impl'),
}
import ALL_PARSERS from './all-parsers.ts'
import ALL_PARSERS_LAZY from './all-parsers-lazy.ts'

// All parsers available, imported sync.
parseStream(stream, ALL_PARSERS)
// All parsers available, automatically imports only needed parsers.
parseStream(stream, ALL_PARSERS_LAZY)

That might be overkill tbh, but maybe some people would benefit from an automatic detection.

Borewit commented 1 year ago

I am not planning to work on this enhancement

minht11 commented 8 months ago

I am still very interested in this. Size issue is still very concerning. Would you accept a PR which added support for it in non breaking way? I haven't done that much research into it, but if each parse function accepted additional option, lets focus on parseBuffer for simplicty:

import { parseBuffer as parseMetadata } from 'music-metadata/base' // Namescape name TBD.
import { MP4Parser } from 'music-metadata/parsers'

const tags = await parseBuffer(
    fileUint8,
    { mimeType: file.type, path: file.name, size: file.size },
    {
        parsers: [
          MP4Parser,
        ]
    })

In code internally there would be 3 relatively simple changes.

  1. Add and expose parsers module which would be barrel file reexporting all parsers classes

    // parsers.ts
    export { AIFFParser } from './aiff/AiffParser.js';
    export { APEv2Parser } from './apev2/APEv2Parser.js';
    export { AsfParser } from './asf/AsfParser.js';
    export { FlacParser } from './flac/FlacParser.js';
    export { MP4Parser } from './mp4/MP4Parser.js';
    export { MpegParser } from './mpeg/MpegParser.js';
    export MusepackParser from './musepack/index.js';
    export { OggParser } from './ogg/OggParser.js';
    export { WaveParser } from './wav/WaveParser.js';
    export { WavPackParser } from './wavpack/WavPackParser.js';
    export { DsfParser } from './dsf/DsfParser.js';
    export { DsdiffParser } from './dsdiff/DsdiffParser.js';
    export { MatroskaParser } from './matroska/MatroskaParser.js';
  2. Add parsers option. Which would allow passing parser classes. Use that instead one big ParserFactory.loadParser function https://github.com/Borewit/music-metadata/blob/ca9870493a1a286b22a5a049a0c97fdfb281277f/lib/ParserFactory.ts#L191-L211

  3. By default current index.ts and core.ts would still load all parsers.

  4. Add new music-metadata/base entry which would be the same as current core.ts index.ts files, but does not define parsers by default.

This solution would not be perfect for treeshaking, because there are more places where parsers are imported, but should be huge step forward in reducing final bundle size if just a few parsers are provided.

If you are not against I could start working toward an actuall implementation.