alexmojaki / pyodide-worker-runner


Pyodide interface #1

Closed · winniederidder closed this 2 years ago

winniederidder commented 2 years ago

The function pyodideExpose passes along a Pyodide object returned by pyodidePromise. However, the TypeScript type of this object is missing many of the actual Pyodide object's properties. The type should be extended to better reflect Pyodide's capabilities and make the object easier to use in code. Example properties:

```ts
/**
 * Runs a string of Python code from JavaScript.
 */
runPython: (code: string, globals?: any) => any;
/**
 * Runs Python code using PyCF_ALLOW_TOP_LEVEL_AWAIT.
 */
runPythonAsync: (code: string) => Promise<void>;
/**
 * Inspect a Python code chunk and use pyodide.loadPackage()
 * to install any known packages that the code chunk imports.
 */
loadPackagesFromImports: (code: string) => Promise<void>;
/**
 * Load a package or a list of packages over the network.
 */
loadPackage: (names: string | string[]) => Promise<void>;
/**
 * An alias to the global Python namespace.
 */
globals: Map<string, any>;
```

alexmojaki commented 2 years ago

Pyodide itself was recently ported to TS. I'm hoping that with the release of 0.20.0 I'll be able to use their own type declarations from npm.
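
For illustration, something like this should then work (a sketch; `PyodideInterface` is the type name used by recent Pyodide releases, and I'm assuming the 0.20.0 npm package exports it):

```ts
// Sketch: consume Pyodide's own type declarations from npm instead of
// maintaining a hand-written subset of the interface.
import type { PyodideInterface } from "pyodide";

function useTypedPyodide(pyodide: PyodideInterface): void {
  // runPython, loadPackage, globals, etc. are all known to the type checker.
  pyodide.runPython("print('hello from Python')");
}
```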

In any case, you ultimately shouldn't need any of these properties.

winniederidder commented 2 years ago

Can this package-archive mechanism support things such as downloading friendly_traceback and using it later in a delayed fashion? Currently, the package is installed and awaited only when needed. With the archive, I assume the download would be done up front, and extracting the package into Pyodide would then not be too expensive, making this feasible? Also, is downloading from a URL feasible when this is used from an npm package, which is how Papyros is used in Dodona? (I.e. the tar must be served from somewhere, unless a local file can somehow be used and bundled.)

alexmojaki commented 2 years ago

> With the archive, I assume the download would be done up front, and extracting the package into Pyodide would then not be too expensive, making this feasible?

Yes, the download happens in parallel with loading Pyodide itself, which generally takes longer, so overall startup time should be reduced significantly, even though everything (including friendly_traceback) has to finish installing first. The actual extraction is fast.
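
Conceptually it works something like this (a minimal sketch of the idea, not this library's actual implementation; `loadPyodide` and `unpackArchive` are standard Pyodide APIs, and the function name here is made up):

```ts
import { loadPyodide } from "pyodide";

// Sketch: start the package download and the (slower) Pyodide load together,
// then extract the archive into Pyodide's virtual filesystem.
async function loadPyodideAndPackageSketch(packageUrl: string) {
  const [pyodide, packageBuffer] = await Promise.all([
    loadPyodide(),
    fetch(packageUrl).then((response) => response.arrayBuffer()),
  ]);
  // Extraction is fast compared to the downloads above.
  pyodide.unpackArchive(packageBuffer, "tar");
  return pyodide;
}
```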

Side note: installing friendly_traceback from PyPI includes a lot of translation files that you don't need. In futurecoder, the package is tweaked to exclude the unused languages, which I recommend you do as well eventually.

> Also, is downloading from a URL feasible when this is used from an npm package, which is how Papyros is used in Dodona? (I.e. the tar must be served from somewhere, unless a local file can somehow be used and bundled.)

Good question, I've never had to do anything like that before. I assume that the archive file can be included in the dist folder which is published to npm, and that e.g. webpack can import it as a URL.

Another possibility would be to extend this library a little to support downloading multiple files and then download each dependency straight from PyPI. But that won't let you use e.g. pylint.

winniederidder commented 2 years ago

Using the tar seems possible in a published package. It does increase the size of the package significantly and requires updating the connect-src CSP directive to allow data: URLs. Restricting the scope of these data: URLs seems to be a bit tricky, so that's an open issue. On the whole it seems to be working.

alexmojaki commented 2 years ago

Surely you don't want to use data: URLs? Doesn't that effectively mean a giant JS file?

winniederidder commented 2 years ago

> Surely you don't want to use data: URLs? Doesn't that effectively mean a giant JS file?

How else do you load in the tar? The function loadPyodideAndPackage expects a URL to extract from. Isn't this also what you use in futurecoder to obtain the tar, using webpack's url-loader?

alexmojaki commented 2 years ago

Yes, but it actually fetches the file over HTTP. A data: URL means storing the entire file in the JS as a base64 string. url-loader should only produce those when the file is small.

winniederidder commented 2 years ago

It generated a data: URL for the package I built using the scripts you've already looked at. Maybe I need to configure url-loader somehow?

alexmojaki commented 2 years ago

Looking at https://v4.webpack.js.org/loaders/url-loader/, it's actually a bit confusing:

> A loader for webpack which transforms files into base64 URIs.

> url-loader works like file-loader, but can return a DataURL if the file is smaller than a byte limit.

So it's not clear to me whether the data URL part is optional or not.

In any case, try using either file-loader or asset modules.
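
With webpack 5 asset modules, for example, a rule like this should emit the tar as a separate file and resolve the import to its URL (a sketch; the test pattern and the TypeScript config setup are assumptions):

```ts
// webpack.config.ts (sketch, webpack 5 asset modules)
import type { Configuration } from "webpack";

const config: Configuration = {
  module: {
    rules: [
      {
        test: /\.tar$/,
        // "asset/resource" always emits a separate file and resolves the
        // import to its URL, so nothing gets inlined as a data: URL.
        type: "asset/resource",
      },
    ],
  },
};

export default config;
```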

I'm not actually sure what's happening in futurecoder; the webpack config is hidden inside Create React App. The import just says:

```ts
import pythonCoreUrl from "./python_core.tar.load_by_url"
```

You should probably also avoid the .tar suffix as it breaks things for users with the IDM extension: https://github.com/alexmojaki/futurecoder/issues/185

winniederidder commented 2 years ago

Most likely I will need to stick to a data URL. When publishing packages over npm, "actual" URLs always cause issues, which is also why the workers are still bundled using worker-loader. Otherwise they are fetched from a server path where they are not actually served (because they live in node_modules), so they need to be inlined somehow. It seems safest to stick with this working version unless a really good reason to change pops up along with a reliably working alternative. I've been trying to get it to work properly using asset modules, but it doesn't look worth the effort.
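
The worker bundling mentioned above is roughly this kind of rule (a sketch from memory; the inline option values are from worker-loader v3 and worth verifying against its docs):

```ts
import type { RuleSetRule } from "webpack";

// Sketch: bundle *.worker.ts files via worker-loader and inline them into the
// main bundle, so npm consumers don't have to serve a separate worker file.
const workerRule: RuleSetRule = {
  test: /\.worker\.ts$/,
  use: [
    {
      loader: "worker-loader",
      // "no-fallback" embeds the worker code in the bundle instead of
      // emitting a separate chunk (worker-loader v3 option).
      options: { inline: "no-fallback" },
    },
  ],
};

export default workerRule;
```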

alexmojaki commented 2 years ago

Wow, I didn't even know that bundling a worker like that was possible.

Using a data: URL sounds like a disaster. It's bad enough that the whole thing has to be downloaded before the worker can start. But since your worker is inlined as well, that means that even your main JS will contain the whole tar. And I think it'll be base64-encoded twice, which is even worse. I see your index.js on https://papyros.dodona.be/ is apparently 4MB even with gzip.

It should absolutely be possible for users of the package to include the tar file in their application somehow. I expect that webpack would automatically copy the file from node_modules into the final distribution when using file-loader or something. Beyond that, it's really not that hard to just copy the file.

Another option is hosting the .tar file publicly.
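
e.g. something like this (a sketch; the URL is a placeholder, and the exact options shape may differ from what's shown):

```ts
import { loadPyodideAndPackage } from "pyodide-worker-runner";

// Sketch: point the runner at a publicly hosted archive instead of a bundled one.
const pyodidePromise = loadPyodideAndPackage({
  url: "https://example.com/python_package.tar",
  format: "tar",
});
```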

Finally, npm package users may want to customize the tar, e.g. to preload extra dependencies. So it might even be best to just provide them with what they need to build the tar themselves.

winniederidder commented 2 years ago

Closing in favor of #8