Yahweasel / libav.js

This is a compilation of the libraries associated with handling audio and video in ffmpeg—libavformat, libavcodec, libavfilter, libavutil, libswresample, and libswscale—for emscripten, and thus the web.
288 stars 18 forks source link

Add enhanced support for bundlers and also fix loading libav inside workers with runners in other workers #23

Closed martenrichter closed 1 year ago

martenrichter commented 1 year ago

After having set up a working build environment, I have now implemented a mechanism to better support bundlers (Issue https://github.com/Yahweasel/libav.js/issues/20). It mainly uses an optional callback on the LibAV object for loaded generating the URL and importing Modules, so that bundlers can be invoked. It also identifies if a bundler converts the file to a module.

Furthermore, a mechanism was added to supply the url of the wasm files to the worker adding a message type for the initial loading of the wasm. It uses more or less the same fetch mechanism as emscripten's injected code does.

One thing beside others, I am not sure about is the test for noworker in the worker case. Since the worker has another scope at least in the bundler converted module case, LibAV was never detectable and thus noworker was never accessible, so I used the presence of LibAV instead for detecting being inside a worker started by libAV (Note code can also be called by a worker loading libAV, and not be a worker loaded by libAV.)

Please look carefully at the code, it was relatively subtle.

Yahweasel commented 1 year ago

This project does not use any post-ES5 JavaScript, and the bits like extern-post.js are not transpiled, because they're fragments added to other code. The only non-polyfillable post-ES5 feature it strictly needs is typed arrays. Please rewrite your code not to use post-ES5 syntax.

martenrichter commented 1 year ago

Ok, but it may be, that I missed something. I just learned a lot of js recently (I am more a C++ developer), so I learned the new stuff, so I may miss something. No commits may follow which are not tested, please merge only after I say I have tested it. But feel free to comment, on the changes regarding ES5 etc.

martenrichter commented 1 year ago

This should work. I hope I got everything. What is about the URL object should this be polyfilled?

Yahweasel commented 1 year ago

My baseline is that syntax is 100% ES5, and features are of Promise vintage and polyfillable by others if needed. URL is the same age as Promise, so anyone who needed to use this on something without Promises would be expecting to polyfill anyway, and anybody else will have both. So, no problem.

Yahweasel commented 1 year ago

Are you actually using LibAVFactoryAsync, or is that just left over from some previous work? I don't see it being used, but presumably it's being used by something else? (some bundler component?)

Yahweasel commented 1 year ago

What's the purpose of passing target(opts) to getFiles? The target function is already exposed on LibAV.

Yahweasel commented 1 year ago

I severely doubt that the "this !== undefined" check is doing what you think it's doing. The "this" at that point in the code is not going to be the same "this" as when running globally.

martenrichter commented 1 year ago

Merge with change in master. I need to test if this still works, not done yet.

Yahweasel commented 1 year ago

It looks like Emscripten's standard for this is "Module.locateFile". If you can adapt your getFile's code to behave like locateFile, we could make it fit better with what's already there (though how valuable that is is debatable)

martenrichter commented 1 year ago

But would this also be called from a worker? If I understand it correctly this is a callback, which will be inside the worker, but then I do not have access to a function provided by the libAV calling process. Or is there already a mechanism, that passes these callback through the pipe of the worker?

Yahweasel commented 1 year ago

But would this also be called from a worker? If I understand it correctly this is a callback, which will be inside the worker, but then I do not have access to a function provided by the libAV calling process. Or is there already a mechanism, that passes these callback through the pipe of the worker?

There is already a mechanism, but in fact, you're quite right; the function's supposed to be synchronous, and there's no synchronous way of passing such things from worker to worker. So much for that idea ^^'

martenrichter commented 1 year ago

What's the purpose of passing target(opts) to getFiles? The target function is already exposed on LibAV.

Target opts determines which wasm files is loaded, the supplied external functions need to know this. (So in case the getFiles function is replaced for the bundler case.) Ok, I did not spot, that it is already exposed on LibAV, but the LibAV mechanism might not work in every case, if it is turned into a module. But I have to check it.

EDIT, even though target is exposed, I can decide if I pass topts (or now t) or opts, the function needs to know this, if it is out of scope. (Of course this is not true for the default getFiles, but I decided to write it, so that it can work out of scope as example).

martenrichter commented 1 year ago

I severely doubt that the "this !== undefined" check is doing what you think it's doing. The "this" at that point in the code is not going to be the same "this" as when running globally.

I needed to check, if the file is turned into a module. So I took the code from here: https://stackoverflow.com/questions/68631696/how-to-know-whether-a-piece-of-js-is-executed-in-an-es-module-or-a-regular-script first answer. But I think you are right, it has to be checked on global scope, I will fix this.

martenrichter commented 1 year ago

This comment was not correct, there is no universal way to my knowledge.

martenrichter commented 1 year ago

I have verified the changes with vite (building and dynamic loading).

Yahweasel commented 1 year ago

Do not editorialize on my settings. Your "I removed nolibavworker" comment is incorrect. If you load libav in a worker with noworker: true (so that it does not try to create a new worker), that situation arises.

Yahweasel commented 1 year ago

Does getFiles really need to have so many arguments? Couldn't ver, config, and dbg be replaced with «suffix: "@VER-@CONFIG@DBG"»?

martenrichter commented 1 year ago

Do not editorialize on my settings. Your "I removed nolibavworker" comment is incorrect. If you load libav in a worker with noworker: true (so that it does not try to create a new worker), that situation arises.

But I wrote this to get your attention, and that worked. Because I know it is an open problem for me.

Currently, I load libAV (or the polyfill) inside a worker without the noworker set, since I want another worker to be spun for libav. In this case, the code, where I put the comment and commented it out, takes over the worker's message function, which is why I commented it out, but I know it was not the whole story.

So the code in libavin should detect, that it is inside a worker and set a flag, that works like noworker flag to signal the other code, that it is not the worker where it should start working? Or is the code from emscripten just accidentally loaded in this case?

martenrichter commented 1 year ago

Does getFiles really need to have so many arguments? Couldn't ver, config, and dbg be replaced with «suffix: "@VER-@CONFIG@DBG"»?

I will try, it will probably work. Only the extension should be passed separately, the bundler needs a fixed extension.

martenrichter commented 1 year ago

Maybe I write more in detail, about what I have encountered: typeof importScripts !== "undefined" was true (started inside a worker) and typeof LibAV === "undefined" was not true, but !LibAV.nolibavworker was true but the polyfill was started inside a worker.

Yahweasel commented 1 year ago

Do not editorialize on my settings. Your "I removed nolibavworker" comment is incorrect. If you load libav in a worker with noworker: true (so that it does not try to create a new worker), that situation arises.

But I wrote this to get your attention, and that worked. Because I know it is an open problem for me.

Currently, I load libAV (or the polyfill) inside a worker without the noworker set, since I want another worker to be spun for libav. In this case, the code, where I put the comment and commented it out, takes over the worker's message function, which is why I commented it out, but I know it was not the whole story.

So the code in libavin should detect, that it is inside a worker and set a flag, that works like noworker flag to signal the other code, that it is not the worker where it should start working? Or is the code from emscripten just accidentally loaded in this case?

You only need this flag in the very unusual case that you're in a worker, but loading libav.js with noworker. In any normal loading situation, this flag is not needed.

martenrichter commented 1 year ago

That is also what I understand from the docs. But I am loading the libav.js (via the polyfill) in a worker (A) and want to span a worker (B), and the code installs the message handlers in worker (A), (except if I comment it out). This may be a side effect of the bundler turning it into a module.

martenrichter commented 1 year ago

Does getFiles really need to have so many arguments? Couldn't ver, config, and dbg be replaced with «suffix: "@VER-@CONFIG@DBG"»?

I will try, it will probably work. Only the extension should be passed separately, the bundler needs a fixed extension.

I have checked it, it will work, but then the bundler will include all variants shipped with libAV. So it is better to pass them separately so that a user can limit it to one or some configurations. So instead I will slightly reduce the number, otherwise the external function would need to separate the arguments by hand.

martenrichter commented 1 year ago

I got to the bottom of the LibAV.noworker option. Actually, the problem, is why I have commented out the noworker check was triggered if noworker = true in libav.in.js (however I am not sure, the libavwebcodecs polyfill starts 3 instances and I can only guess). However, the noworker property was never attached to the LibAV object, and that causes the problem. I looked at libav.in.js (original and modified) and I saw nowhere code, that actually attached the noworker property to the libAV object. But attaching it will also not solve the problem, since the polyfill starts 3 LibAV Instances and so it will interfere with the others. (Actually. I do not think it will harm, since LibAV is never present in a worker spun by libav.in.js, it is only there in direct mode and thus I think removing the check can still be a solution since the presence of LibAV actually does the same).

Anyway, I may miss something, but can you tell me, how the noworker flag should end up at the LibAV object?

martenrichter commented 1 year ago

Ok, again everything tested with vite, build and dynamic. I added a workaround, that does not the noworker check, but anyway I still believe this mechanism is broken.

Yahweasel commented 1 year ago

noworker is not ever supposed to be part of the LibAV object. It's supposed to be part of the options object passed to LibAV when you load libav.js.

If you are in a worker, and you load libav.js normally, it will load in a worker, no problem. If your bundler has put the .wasm.js file (from Emscripten and post.in.js and extern-post.in.js) into the same environment as the .js file (from libav.in.js), then your bundler screwed up. The .wasm.js code is only supposed to run in the generated worker. LibAVFactory should not exist in the code that loaded the .js file.

However, you are allowed to ask the .js (libav.in.js) code to load without a worker. In that case, if you're loading the .js file from a worker, then the Emscripten code (.wasm.js) will try to set up the worker, because it has no way of knowing whether it was loaded directly in a worker or indirectly through importScripts. Thus, there is a mechanism, nolibavworker, which is totally separate from everything else, that allows that very particular situation to be worked around. Virtually no users should ever need it, because it only applies to the situation that you're loading in a worker yet you don't want to make a worker.

The reason this has dragged out so long is that every contribution you've made has broken everything other than your own use case, and I cannot and have no interest in testing your bundler use case, because bundling libav.js is not a good idea. Every time I've attempted to repair your code to allow your use case but not break everything else, it's taken an hour and all I've achieved is finding more weird behaviors and then asking about them here. I want you to understand that it is your use case that's bizarre, and upending the entire loading system of a library that works fine is not OK. I would be much happier with your contribution if the whole getFiles mechanism was an optional feature that's only used if it's needed, so that all the normal use cases that already exist don't radically change just to support a new use case that's not even loading in the conventional sense.

martenrichter commented 1 year ago

noworker is not ever supposed to be part of the LibAV object. It's supposed to be part of the options object passed to LibAV when you load libav.js.

If you are in a worker, and you load libav.js normally, it will load in a worker, no problem. If your bundler has put the .wasm.js file (from Emscripten and post.in.js and extern-post.in.js) into the same environment as the .js file (from libav.in.js), then your bundler screwed up. The .wasm.js code is only supposed to run in the generated worker. LibAVFactory should not exist in the code that loaded the .js file.

I think in this respect the bundler works fine. I first though it was interfering here but it was the other case.

However, you are allowed to ask the .js (libav.in.js) code to load without a worker. In that case, if you're loading the .js file from a worker, then the Emscripten code (.wasm.js) will try to set up the worker, because it has no way of knowing whether it was loaded directly in a worker or indirectly through importScripts. Thus, there is a mechanism, nolibavworker, which is totally separate from everything else, that allows that very particular situation to be worked around. Virtually no users should ever need it, because it only applies to the situation that you're loading in a worker yet you don't want to make a worker.

That is the case in my use case, since I followed the WebCodecs recommendation to feed the codecs from a worker (https://www.w3.org/TR/webcodecs/#best-practices-developers) and thus I do the same with the polyfill. Question: if this use case is so rare, why do you force it here:

https://github.com/ennuicastr/libavjs-webcodecs-polyfill/blob/21e3064974f4053550a07380abec4c6da29e8e40/src/rendering.ts#L50 this is causing trouble. If one would made loading these functions optional.

The reason this has dragged out so long is that every contribution you've made has broken everything other than your own use case, and I cannot and have no interest in testing your bundler use case, because bundling libav.js is not a good idea.

Well, I have a very big project with a bundler (so many files no bundling would not be an option, and also with years of development), so I have no option to do it without a bundler. Even though I really hate it, I have run so many times into problems. However, I am not sure if bundling at all is so rare, I think many people are using it. Anyway, I will try to contribute to the bundling case as well as I can.

Every time I've attempted to repair your code to allow your use case but not break everything else, it's taken an hour and all I've achieved is finding more weird behaviors and then asking about them here. I want you to understand that it is your use case that's bizarre, and upending the entire loading system of a library that works fine is not OK. I would be much happier with your contribution if the whole getFiles mechanism was an optional feature that's only used if it's needed, so that all the normal use cases that already exist don't radically change just to support a new use case that's not even loading in the conventional sense.

I can set it up like this in libav.js.in. The only thing, which can not be made optional is in the post script, since it is immutable. But anyway I have the feeling we are not communicating and coordinating well enough, so I can we improve it.

martenrichter commented 1 year ago

noworker is not ever supposed to be part of the LibAV object. It's supposed to be part of the options object passed to LibAV when you load libav.js.

But wait the code says, LibAV.noworker in the extern-post.js. How is this option passed to the file, that is probably the thing that is broken by the bundler.

Yahweasel commented 1 year ago

Question: if this use case is so rare, why do you force it here:

I did not say that noworker is rare. noworker is reasonable and not uncommon. I said that the particular combination of using noworker yet loading from a worker is rare.

However, I think you've finally just pointed me to the crux of the whole matter. I did not anticipate the possibility that libavjs-webcodecs-polyfill would be running in a worker when this code is loaded. Indeed, it shouldn't be necessary to communicate noworker differently if you're loading in a worker than if you're not. Knowing that, I think the right way to do this is to check in libav.in.ts whether we're already in a worker, and add nolibavjsworker to LibAV in that case, so that extern-post.in.js doesn't attempt to load the worker handler code.

Yahweasel commented 1 year ago

However, I am not sure if bundling at all is so rare, I think many people are using it.

I don't have a problem with bundling in general. I use bundling all the time. But loading libav.js is complicated, because there's not just one libav.js. There are any number of variants (about a dozen included and any number more), times two (wasm or not), times two (simd or not), times two (threads or not), times two (debugging or not, though here just bundling the debuggable one is the right option). libav.js needs to choose the right ones to load based on the particular environment it's being loaded into. A bundler could safely bundle the frontend to this process (libav.in.js/libav-version.js), but not the backend (post.in.js/libav-version.wasm.js). If it's bundling the frontend, then I don't see how bundling complicates anything; it is already possible to specify the base from which to load the backend. If it's bundling the backend, then it's essentially bypassing the frontend.

martenrichter commented 1 year ago

I agree, this is actually the intention of libav.loadingWorker = true, it does more or less the same as nolibavjsworker, what you proposed. So the loadingWorker stuff should be removed, if you introduce the other mechanim.

Yahweasel commented 1 year ago

noworker is not ever supposed to be part of the LibAV object. It's supposed to be part of the options object passed to LibAV when you load libav.js.

But wait the code says, LibAV.noworker in the extern-post.js. How is this option passed to the file, that is probably the thing that is broken by the bundler.

This is the nolibavworker, not noworker. This is the workaround (which, as you pointed out above, doesn't work) that users are supposed to be able to use if (a) they're loading libav.js in a worker and (b) they're loading it with noworker, so that the worker-specific code in extern-post.in.js doesn't fire.

martenrichter commented 1 year ago

noworker is not ever supposed to be part of the LibAV object. It's supposed to be part of the options object passed to LibAV when you load libav.js.

But wait the code says, LibAV.noworker in the extern-post.js. How is this option passed to the file, that is probably the thing that is broken by the bundler.

This is the nolibavworker, not noworker. This is the workaround (which, as you pointed out above, doesn't work) that users are supposed to be able to use if (a) they're loading libav.js in a worker and (b) they're loading it with noworker, so that the worker-specific code in extern-post.in.js doesn't fire.

Oh, yes, I am sorry, I overlooked libav worker in the middle the whole time...... this would have saved some time.

martenrichter commented 1 year ago

If it's bundling the frontend, then I don't see how bundling complicates anything; it is already possible to specify the base from which to load the backend. If it's bundling the backend, then it's essentially bypassing the frontend.

The point, why the changes with the backend are necessary, is that the bundlers do not include files they do not know about. And of the other hand, if they would know about all possibilities, they would include all variants... So I agree it is a total nightmare. And the sad thing is, that every bundler does it a bit differently.... Anyway, here is the getFiles implementation I use with my bundler:

    const getFiles = (conf) => {
      const version = conf.ver
      // define a similar function, adapted to your bundler, probably you will ignore base
      const t = conf.t

      if (conf.ext === 'wasm') {
        return new URL(
          `../node_modules/libav.js/libav-${version}-opus.${t}.wasm`,
          import.meta.url
        )
      } else if (conf.ext === 'import') {
        return import(
          `../node_modules/libav.js/libav-${version}-opus.${t}.js`
        )
      } else {
        return new URL(
          `../node_modules/libav.js/libav-${version}-opus.${t}.js`,
          import.meta.url
        )
      }
    }

Hardcoded for a specific variant. But not universal, because of import.meta.url, which is specific to modules. Something like this could end up as example in the docs.

martenrichter commented 1 year ago

As I know that noworker != nolibavworker everything works ....., sorry, I have to learn to read.

martenrichter commented 1 year ago

So I think I have found a way to the purpose less invasive. I am not ready yet, so commit will come later this week. So just a short outline: The more of the old code will stay intact. Changed url are passed through the LibAV object and replace if they are defined things such as toImport. In extern.post.js locateFile is used if a wasmurl is provided. The only thing, that has to stay, is that a config message is passed to the worker to inform about the url and alternative Module import code, the rest is only executed, if the parameters are set on the LibAV object. But commit will come later it is not completely done yet.

martenrichter commented 1 year ago

I hope the new version is as less invasive as I thought. (You see the diff is much shorter, but it uses LibAV for passing stuff). For reference, this is the way a bundler can supply replacements for all files:

 if (!globalThis.LibAV)
      globalThis.LibAV = { nolibavworker: true } // the imported lib needs this, if it is in a worker
    LibAVlib = await import(
      '../node_modules/libav.js/libav-3.11.6.0-opus.js'
    )
    const target = globalThis.LibAV.target()
    globalThis.LibAV.wasmurl = new URL(
      `../node_modules/libav.js/libav-${globalThis.LibAV.VER}-opus.${target}.wasm`,
      import.meta.url
    ).href
    globalThis.LibAV.toImport = new URL(
      `../node_modules/libav.js/libav-${globalThis.LibAV.VER}-opus.${target}.js`,
      import.meta.url
    ).href
    globalThis.LibAV.importedjs = await import(
      `../node_modules/libav.js/libav-${globalThis.LibAV.VER}-opus.${target}.js`
    )

Here it is hard coded for one variant, so the bundler only includes this variant. This is an example of a vite setup, but could be adapted to any bundler, I assume.

Yahweasel commented 1 year ago

I removed importedjs because it's frankly pointless. Checking for the presence of LibAVFactory is the same check.

I would request that you please make a separate PR modifying README.md and describing how to use this in a bundler/with ES6 modules. I can't document that, because I'm not using it and don't fully follow, so right now, this is essentially code that benefits only your particular configuration. Don't worry if you're not 100% confident in your English grammar or similar; as I think is perfectly clear, it'll go through my editorial review before import regardless ;)

martenrichter commented 1 year ago

I removed importedjs because it's frankly pointless. Checking for the presence of LibAVFactory is the same check.

Sure, I had the same thought, but it was good to point out, that it had to be loaded externally.

I would request that you please make a separate PR modifying README.md and describing how to use this in a bundler/with ES6 modules. I can't document that, because I'm not using it and don't fully follow, so right now, this is essentially code that benefits only your particular configuration. Don't worry if you're not 100% confident in your English grammar or similar; as I think is perfectly clear, it'll go through my editorial review before import regardless ;)

Sure I will do a PR for the readme. And I am not afraid of grammar. It is just a matter of the number of iterations before publication.