[native_assets_cli] Dart API interface per asset type

dart-lang / native

Dart packages related to FFI and native assets bundling.

BSD 3-Clause "New" or "Revised" License

[native_assets_cli] Dart API interface per asset type #994

Open dcharkes opened 2 months ago

dcharkes commented 2 months ago

Make package:native_assets_cli only consume an API that shows getters for native code (and not any getters for Java or other asset types). This can be achieved by

nesting NativeBuildConfig inside BuildConfig which doesn't work well with the shared fields such as outputDirectory, or
BuildConfig implements NativeBuildConfig where only a subset of the getters is visible, or
an extension type NativeBuildConfig on BuildConfig.
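
A minimal sketch of the third option (an extension type), assuming Dart 3.3 or later. The `BuildConfig` stand-in and its fields `targetArchitecture` and `javaSdkPath` are hypothetical and only illustrate how a view can hide getters that are irrelevant to native code:

```dart
// Hypothetical stand-in for the full config; field names are illustrative only.
class BuildConfig {
  BuildConfig({
    required this.outputDirectory,
    required this.targetArchitecture,
    this.javaSdkPath,
  });

  final Uri outputDirectory; // shared by all asset types
  final String targetArchitecture; // relevant to native code
  final String? javaSdkPath; // relevant only to Java assets (hypothetical)
}

// Compile-time view that exposes only what a native-code builder needs.
extension type NativeBuildConfig(BuildConfig _config) {
  Uri get outputDirectory => _config.outputDirectory;
  String get targetArchitecture => _config.targetArchitecture;
  // No javaSdkPath getter: it is not reachable through this type.
}

// A native-code builder only ever sees the narrowed view.
String describe(NativeBuildConfig config) =>
    '${config.targetArchitecture} -> ${config.outputDirectory}';
```

With this shape the shared `outputDirectory` stays on a single object, which is the problem the nesting option runs into.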

Make package:native_toolchain_c add assets to a NativeBuildOutput that doesn't have methods/setters related to Java assets or data assets. This can be achieved by

BuildOutput implements NativeBuildOutput and NativeBuildOutput.addAsset takes NativeCodeAsset instead of Asset.
an extension type.
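
A sketch of the first option, under the assumption that the classes look roughly like the simplified stand-ins below (they are not the real package API). It relies on the fact that Dart allows an override to widen a parameter type, so `BuildOutput.addAsset(Asset)` is a valid implementation of `NativeBuildOutput.addAsset(NativeCodeAsset)`:

```dart
// Simplified, hypothetical asset hierarchy.
abstract class Asset {
  String get id;
}

class NativeCodeAsset implements Asset {
  NativeCodeAsset(this.id, this.file);
  @override
  final String id;
  final Uri file;
}

class DataAsset implements Asset {
  DataAsset(this.id);
  @override
  final String id;
}

// Narrow interface handed to native-code builders.
abstract interface class NativeBuildOutput {
  void addAsset(NativeCodeAsset asset);
}

// Full output: accepts any asset, and still satisfies the narrow interface.
class BuildOutput implements NativeBuildOutput {
  final List<Asset> assets = [];

  @override
  void addAsset(Asset asset) => assets.add(asset);
}

// A native-code builder can only add native code assets.
void buildNativeLibrary(NativeBuildOutput output) {
  output.addAsset(
      NativeCodeAsset('package:foo/foo.dart', Uri.file('libfoo.so')));
  // output.addAsset(DataAsset('x')); // would be a compile-time error
}
```

A builder that needs more than one asset type would take the full `BuildOutput` instead, which is the trade-off raised in the question below.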

We could even have assetId be optional for some asset types (jars) in the API.

Question: Don't we ever have builders that would like to add more than one asset type? They would need to take the full BuildOutput.

https://github.com/dart-lang/native/issues/853

Sister issue for the JSON protocol:

https://github.com/dart-lang/native/issues/993

dcharkes commented 2 months ago

Nesting the assets inside an asset type in the API has consequences for how a link.dart is structured.

Having it nested means that older link.darts are not aware of new asset types (and will ignore them silently). Ignoring silently would be weird because we would specify that an asset is destined for a certain link script. Having one list of assets (with an asset type per asset) requires explicit switching in a link.dart script, which requires developers to deal explicitly with possible new asset types.

So, @mosuem and I believe it's better to have a single list of assets.
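
To illustrate what "explicit switching" could look like with a single list, here is a sketch that assumes the asset types form a sealed hierarchy (an assumption, the thread does not specify this). With a sealed base class, adding a new asset type makes every non-exhaustive switch in a link.dart fail to compile rather than silently ignoring the new type:

```dart
// Hypothetical sealed hierarchy; the real API may model asset types differently.
sealed class Asset {
  const Asset(this.id);
  final String id;
}

class NativeCodeAsset extends Asset {
  const NativeCodeAsset(super.id);
}

class DataAsset extends Asset {
  const DataAsset(super.id);
}

// A link script that receives one flat list and must handle every type.
List<String> link(List<Asset> assets) {
  final log = <String>[];
  for (final asset in assets) {
    // The switch must be exhaustive: a new subclass of Asset breaks the build
    // here until the script author decides what to do with it.
    final action = switch (asset) {
      NativeCodeAsset() => 'link native code',
      DataAsset() => 'skipped, not handled by this linker',
    };
    log.add('${asset.id}: $action');
  }
  return log;
}
```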

mkustermann commented 2 months ago

Having it nested means that older link.darts are not aware of new asset types (and will ignore them silently).

I don't understand this at all. One only uses link.dart for specific asset types

I'd view our system as a layered architecture:

application code can use/interact/... with an asset via an asset-API (e.g. using declarative @Native FFI api, using a getAssetContents() in dart:assets API, loadImage() in dart:ui, etc) => we'd have one asset type per asset-API => our tree shaker emits information about usages of those asset-APIs
the build.dart scripts can produce those assets (there'd be one asset type per mechanism above, i.e. one for files, one for images) => in JIT mode there's no linking happening (?) so those assets can be used directly => in AOT mode we allow a linking step (which takes advantage of tree shaking, etc)
the link.dart scripts can take in assets of a specific asset type / asset-API and tree shake them, combine them, etc. => e.g. it can tree shake localization messages it doesn't need, C functions that aren't needed, it can rewrite images from svg to a different format, etc.
the bundling system supports, for each asset-type, bundling linked and unlinked assets and puts them where the runtime can find them
the runtime, for each asset-API, knows how to handle linked and unlinked assets (e.g. getAssetContents() may load the contents from disk in unlinked mode, but may have it embedded in the AOT-compiled app in linked mode, ...)

It probably makes sense for there to be one linker per asset-API/asset-type (imagine C linker: it combines all the native code into one .so file, imagine localization messages: it combines the localizations from all packages into one big one). The pubspec version constraints on the linker can ensure the version of linker supports the version of the asset-API/asset-type.

One way to look at it is a map-reduce system: All emitted assets by build.dart (the map phase) are grouped by asset-API/asset-type and get their corresponding link.dart (the reduce phase) invoked. The link.dart (reducer) may only produce 1 asset but may also produce multiple.
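
The map-reduce view can be sketched as follows. All names here (`Asset`, `Linker`, `runLinkPhase`) are hypothetical, and the sketch assumes that asset types without a registered linker are passed through unlinked:

```dart
// Hypothetical asset record: a type key plus a name.
class Asset {
  const Asset(this.type, this.name);
  final String type;
  final String name;
}

// A reducer: takes all assets of one type, returns one or more linked assets.
typedef Linker = List<Asset> Function(List<Asset> inputs);

// Group the assets emitted by all build.dart hooks (the map phase) by type.
Map<String, List<Asset>> groupByType(Iterable<Asset> assets) {
  final groups = <String, List<Asset>>{};
  for (final asset in assets) {
    groups.putIfAbsent(asset.type, () => []).add(asset);
  }
  return groups;
}

// Invoke the matching linker (the reduce phase) once per asset type.
List<Asset> runLinkPhase(Iterable<Asset> built, Map<String, Linker> linkers) {
  final result = <Asset>[];
  groupByType(built).forEach((type, group) {
    final linker = linkers[type];
    // Assumption: without a linker the assets are bundled as they are.
    result.addAll(linker == null ? group : linker(group));
  });
  return result;
}
```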

dcharkes commented 2 months ago

It probably makes sense for there to be one linker per asset-API/asset-type

Conceptually yes, but the question is how to make this work nicely.

Suppose there are two packages that have a link.dart that wraps a C linker, or that know how to deal with some reusable localization format. If an app transitively depends on two packages that handle the same asset type, we get into some questions. E.g. do we just fail the build? How do we even know what asset types are supported by a linker? The link.dart and build.dart protocol is single invocation. So you'd have to send all asset types to all link.darts.

To avoid these issues, @mosuem and I thought it would make sense to have asset-types conceptually namespaced by package name. So instead of the asset-type determining to which link.dart a to-be-linked-asset is sent, we'd declare it in the protocol with the package name:

# build_output.yaml/json
assets:
  - # immediately bundled
assets_for_linking:
  native_toolchain_c:
    - # an asset being sent to native_toolchain_c tool/link.dart for linking
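
As a rough sketch of how a bundling tool might read such an output, assuming the JSON form of the file and the `assets_for_linking` key shown above (the shape of the individual asset entries is not specified in this thread, so they are kept opaque here):

```dart
import 'dart:convert';

// Returns the to-be-linked assets keyed by the package whose link.dart
// should receive them. Asset entries are left as undecoded JSON values.
Map<String, List<Object?>> assetsByLinker(String buildOutputJson) {
  final decoded = jsonDecode(buildOutputJson) as Map<String, Object?>;
  final forLinking =
      (decoded['assets_for_linking'] as Map<String, Object?>?) ?? const {};
  return {
    for (final entry in forLinking.entries)
      entry.key: (entry.value as List<Object?>?) ?? const [],
  };
}
```

Each key of the returned map names one link.dart to invoke, so no link.dart ever sees assets that were not addressed to it.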

The downside of namespacing asset types with package names is that we can't really do drop-in-replacements of linkers. E.g. if someone comes up with a better JSON minifier, every build.dart outputting json's would need to update to send their assets to be linked to the new and shinier link.dart of the new package.

So from a map-reduce point of view:

does the build.dart output declare to which link.dart an asset is sent, or
does the build.dart just output some key, all assets are sent to all link.darts, and link.darts should ignore assets that are not their own asset type, and things go horribly wrong when two link.darts consume the same asset type.

Map-reduce works with the first approach, if I understand correctly. My earlier comment was written assuming this approach.

If we have both a concept of targetLinker: <package_name> and Asset.type, then it could be that someone sends an asset of some asset type to a linker, and that linker doesn't know about that asset type at all. That was what my comment was about. Does that make sense?

mkustermann commented 2 months ago

all assets are sent to all link.darts, and link.darts should ignore assets that are not their own asset type, and things go horribly wrong when two link.darts consume the same asset type.

Definitely not.

There are multiple options:

We could make the build.dart script not only output the asset, but also the linker to use (as you say). Then the map-reduce would group by (asset-type, linker), the reducer/linker would get a list of those assets that specified it as the linker. => The package with build.dart would then put the linker it wants to use in its pubspec.yaml dependencies, using a version that supports that asset type. So no issue regarding versioning / linker not supporting an asset type.
We could make the application package decide which linker to use for which asset type (e.g. an application may say: for all svgs I want to transform them in a certain way). => The application package would then depend on a linker in pubspec.yaml and ensure that linker supports the asset types it configures it to link. So no issue regarding linker not supporting an asset type.
We could make the bundling tool itself decide which linker to use for which asset type (e.g. dart build / flutter build will invoke the android C linker for all the static libraries it got from the build.dart files)

One could do a combination:

If the application configures a linker, it takes precedence over any other setting (its reducer will get all assets of the configured type)
Otherwise, if the build.dart configured a linker to be used with an asset, we use that one
Otherwise, we use the bundling tool's version
If the bundling tool doesn't have one, no linking happens (which is ok, as some build.dart may just have a file they want to include, no linking needed)
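
The precedence rules above can be written down as a small lookup. This is only a sketch: the function name and the idea of representing each configuration level as a map from asset type to linker package are assumptions made for illustration:

```dart
// Resolves which linker package (if any) should link an asset.
// Returns null when no level configures a linker, meaning no linking happens.
String? resolveLinker({
  required String assetType,
  required Map<String, String> appLinkers, // configured by the application
  String? buildHookLinker, // declared by build.dart for this asset
  required Map<String, String> bundlerLinkers, // built into dart/flutter build
}) =>
    appLinkers[assetType] ?? buildHookLinker ?? bundlerLinkers[assetType];
```

For example, an asset of type `svg` whose build.dart names `svg_cli_build` would still be linked by the application's choice if the application configures a linker for `svg`.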

dcharkes commented 2 months ago

I like the combination option.

I'd need to spend a bit more time thinking about some of the specifics.

We may have temporary asset types (e.g. a .o file or something) that must be consumed by a linker; they cannot be left unlinked.
Some types of linking might only make sense from the bundling tool point of view. E.g. the bundling tool does kernel compilation, and kernel to machine code compilation. And statically linking native code into the dart-aot-snapshot can only happen there. Also embedding a data asset (as a base 64 string) can only happen in kernel compilation. (Let's say if someone wants native code assets and data assets but really really only wants one file instead of a bundle for some reason.) Then the build.dart-configured linker should probably not take precedence over the bundling tool linker.

But in general I think this is a good approach.

For our first use cases, I think the build.dart-specified-linker suffices. And then we can later extend it.

(Side note: These considerations are more for https://github.com/dart-lang/native/issues/153. Not really what this issue was about.)

mkustermann commented 2 months ago

We may have temporary asset types (e.g. a .o file or something) that must be consumed by a linker; they cannot be left unlinked.

On the lowest level each bundling & runtime system (flutter and dart) will have a fixed set of asset-APIs it supports. So if

we're in JIT mode and not linking (?) all assets emitted by build.dart need to be of one of the fixed types => So the bundling tool will issue an error if there's any emitted assets that we don't support
we're in AOT mode and perform linking we may allow build.dart to emit an extended set of assets (or arbitrary assets -e.g. with mime type?) but expect the emitted assets of link.dart to be of the fixed set that's supported by the bundling tool => So the bundling tool will issue an error in link phase if there's any emitted assets that we don't support.

An interesting thought experiment would be to see how one could make custom asset-APIs that neither Dart / Flutter know about which then get lowered to the ones that the bundling tool supports:

A package may support an asset API: e.g. package:animation provides loadAnimation() API => A package:animation_cli_build can be used for build / linking.
A user of that package, package:foo, may have some animation files => That package's hook/build.dart will use package:animation_cli_build and give it the file names.
The package:animation_cli_build will a) in non-linking mode: create specially crafted file assets (supported by dart:assets getAssetContents() API) b) in linking mode: create assets of an animation-asset type and specify the package:animation_cli_build linker. => The linker will consume all animations, tree shake those that aren't used by the app (every linker gets the resource information file), optimize them to a different format and emit one big file with a special name
The runtime system in package:animation will know whether it runs in AOT or JIT mode => In AOT mode it will use the getAssetContents() API to load the single optimized animations file containing all animations => In JIT mode it will use the getAssetContents() API to load individual asset-ids (which the package:animation_cli_build produced)

If we can make this work we have a general mechanism that

allows packages to define asset APIs
allows the building/linking to use user-defined asset kinds & transformations that lower to the APIs we have in dart/flutter
allows the runtime system of the package to use the lower-level APIs we have in dart/flutter to load assets for the higher-level concept of their package

dcharkes commented 2 months ago

we're in JIT mode and not linking (?) all assets emitted by build.dart need to be of one of the fixed types => So the bundling tool will issue an error if there's any emitted assets that we don't support

I was thinking we would execute link.dart scripts in JIT mode, but it would not have the AOT-treeshaking information.

we're in AOT mode and perform linking we may allow build.dart to emit an extended set of assets (or arbitrary assets -e.g. with mime type?) but expect the emitted assets of link.dart to be of the fixed set that's supported by the bundling tool => So the bundling tool will issue an error in link phase if there's any emitted assets that we don't support.

Yes that's the idea.

Now that we have a supportedAssetTypes in the BuildConfig (and LinkConfig), we can even support a different set of asset types depending on whether we're in JIT or AOT. We'd just emit a different list in the BuildConfig.
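
A sketch of how the set could vary per mode and how the bundling tool might report unsupported assets. The type names (`native_code`, `data`, `object_file`) and both functions are hypothetical; the only idea taken from the thread is that build hooks in AOT mode may emit an extended set, while whatever reaches the bundler must be in the fixed set:

```dart
enum RunMode { jit, aot }

// Hypothetical fixed set the bundler and runtime understand.
const Set<String> bundlerAssetTypes = {'native_code', 'data'};

// Hypothetical intermediate types that only a linker can consume.
const Set<String> intermediateAssetTypes = {'object_file'};

// The list that would be emitted into BuildConfig / LinkConfig.
Set<String> supportedAssetTypes({
  required RunMode mode,
  required bool isLinkHook,
}) {
  // Only build hooks in AOT mode may emit assets that still need linking.
  final allowIntermediate = mode == RunMode.aot && !isLinkHook;
  return {
    ...bundlerAssetTypes,
    if (allowIntermediate) ...intermediateAssetTypes,
  };
}

// Asset types a hook emitted although they were not in its supported set.
List<String> unsupportedTypes(
        Iterable<String> emitted, Set<String> supported) =>
    [
      for (final type in emitted)
        if (!supported.contains(type)) type,
    ];
```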

An interesting thought experiment would be to see how one could make custom asset-APIs that neither Dart / Flutter know about which then get lowered to the ones that the bundling tool supports: [...]

I think it would make it simpler if we always run the linking step so that this package would always emit the same format. Then its runtime doesn't have to branch on JIT/AOT.

(Side note: This sounds exactly like the use case mentioned in https://github.com/flutter/flutter/issues/143348.)

If we can make this work we have a general mechanism that

allows packages to define asset APIs

allows the building/linking to use user-defined asset kinds & transformations that lower to the APIs we have in dart/flutter

allows the runtime system of the package to use the lower-level APIs we have in dart/flutter to load assets for the higher-level concept of their package

Yep, that's the idea! 👌

mkustermann commented 2 months ago

I was thinking we would execute link.dart scripts in JIT mode, but it would not have the AOT-treeshaking information.

For some things no linking will be needed (e.g. a readily available .so file, or just including a file that can be accessed at runtime). So at least for those asset kinds for which no linker was specified (neither at per-package, per-app, nor build tool level) no linking is needed. Then there's the question whether there are valid use cases where a linking step is required when a) we don't have tree shaking information b) we want to run the app as fast as possible (development cycle) and not "optimize" any assets. Do we have valid use cases for this?

(Side note: This sounds exactly like the use case mentioned in https://github.com/flutter/flutter/issues/143348.)

Yes. Stay tuned about this - working on that part!

dcharkes commented 2 months ago

Then there's the question whether there are valid use cases where a linking step is required when a) we don't have tree shaking information b) we want to run the app as fast as possible (development cycle) and not "optimize" any assets. Do we have valid use cases for this?

I'm thinking that it's a required step for the svg compiler mentioned in that issue.

cc @mosuem all the above thoughts.

mkustermann commented 2 months ago

I'm thinking that it's a required step for the svg compiler mentioned in that issue.

Svgs can be parsed & displayed at runtime or can be pre-processed to something else (e.g. a bunch of triangles with shading information - which may take a long time) and that something else can be loaded & displayed.

Also the build.dart can do the svg processing as well, you don't need a linker step to do it.

We may want to communicate to build.dart whether we're in development mode or not (which we indirectly also do e.g. if we tell it to produce .so files or static library .a files).

If there's a real need we can of course support running the linking in development mode as well, I just fear that it may be misused to do a lot of work where it will harm development cycle.

dcharkes commented 2 months ago

Also the build.dart can do the svg processing as well, you don't need a linker step to do it.

That requires the build.dart of the user app to invoke some compilation from package:vector_image's Dart API, instead of package:vector_image having a link.dart that processes all of them. And that would then only work for SVGs from the root package. If you have a helper package, that helper package would need to decide whether it compiles the SVGs itself (preventing any tree-shaking) or whether it outputs them to be linked. How did you envision having build.dart doing it in such a context?

We may want to communicate to build.dart whether we're in development mode or not (which we indirectly also do e.g. if we tell it to produce .so files or static library .a files).

BuildMode.debug?

(We currently don't have a concept of develop vs release in Dart standalone. Should all JIT be considered development mode?)

If there's a real need we can of course support running the linking in development mode as well, I just fear that it may be misused to do a lot of work where it will harm development cycle.

Hm, that's indeed something to consider.

mkustermann commented 2 months ago

That requires the build.dart of the user app to invoke some compilation from package:vector_image's Dart API, instead of package:vector_image having a link.dart that processes all of them. And that would then only work for SVGs from the root package. If you have a helper package, that helper package would need to decide whether it compiles the SVGs itself (preventing any tree-shaking) or whether it outputs them to be linked. How did you envision having build.dart doing it in such a context?

Somewhat as described above: If I have svgs in my package, then I need to tell the system my package needs those svgs:

// hook/build.dart
import 'package:svg_cli_build/svg_cli_build.dart';

main(args) async {
  await runBuild((config, output) {
    SvgBuilder('package:mypackage', ['icons/a.svg', 'icons/b.svg']).build(config, output);
  });
}

In my package (doesn't have to be root package) I then do

// package:foowidget/foowidget.dart
import 'package:svg/svg.dart';

class FooWidget {
  ... = loadSvgApi('package:mypackage', 'icons/a.svg');
}

Now package:svg_cli_build's SvgBuilder may

in development mode: just add a bytes-asset (which the flutter/dart bundler bundles & the runtime system supports with loadAssetContent) - but it could also shrink/optimize/transform them
in aot mode: emit an svg-asset-type and a linker pointing to package:svg_cli_build linker (which can e.g. combine all svgs together into one big file, ...), the linker will emit a file asset (just as before)

Now package:svg's loadSvgApi may

in development mode: just load the bytes-asset via low-level loadAssetContent
in aot mode: load the giant combined file, then use the argument to loadSvgApi to find which part of that big file to load, and load it

I.e. we have higher-level concepts

higher-level build.dart API (SvgBuilder)
higher-level runtime API (loadSvgApi)

that under the hood rely on lower level things supported by dart/flutter build/bundle/runtime.
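
The build-side half of this layering might look like the sketch below. It is deliberately simplified compared to the `SvgBuilder` in the snippet above: it returns the assets instead of writing to an output object, and `EmittedAsset`, the `'bytes'` / `'svg'` type keys and the `linkingEnabled` flag are all hypothetical names used only for illustration:

```dart
// Hypothetical record of what a build hook emits.
class EmittedAsset {
  const EmittedAsset({required this.type, required this.id, this.linker});
  final String type; // low-level or package-defined asset type
  final String id;
  final String? linker; // package whose link.dart should consume the asset
}

// Simplified stand-in for the higher-level build.dart API.
class SvgBuilderSketch {
  SvgBuilderSketch(this.package, this.files);
  final String package;
  final List<String> files;

  List<EmittedAsset> build({required bool linkingEnabled}) => [
        for (final file in files)
          linkingEnabled
              // AOT mode: emit a package-defined type and name the linker
              // that will later combine all svgs into one low-level asset.
              ? EmittedAsset(
                  type: 'svg', id: '$package/$file', linker: 'svg_cli_build')
              // Development mode: emit a low-level asset the bundler and
              // runtime already understand.
              : EmittedAsset(type: 'bytes', id: '$package/$file'),
      ];
}
```

The runtime API in package:svg would branch in the same way: look up the individual bytes-asset in development mode, or locate the entry inside the combined file in AOT mode.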

In some sense this is very natural: The package that knows how to e.g. compile C code probably also knows how to link it. The package that provides an intl/l10n API probably knows how to tree shake the intl/l10n files. So it can have a package for the compile-time component (build/link) and one for runtime - they can possibly even be the same.

dcharkes commented 1 week ago

We could even have assetId be optional for some asset types (jars) in the API.

Currently, Asset has a non-nullable assetId, which makes sense for data assets and native code assets as they are both accessed from Dart code through an asset id. It's unlikely that we would ever access Jar assets via an asset id. So we might want to move assetId into code asset and data asset. (Or we make id optional, like file already is.)
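
A sketch of the first variant, with the id moved into the asset types that are looked up from Dart code. The class shapes are hypothetical simplifications (in particular `JarAsset` does not exist in this form in the thread), and the sealed base class is an assumption made so that the switch is exhaustive:

```dart
// Hypothetical base class: only what every asset type shares.
sealed class Asset {
  const Asset({this.file});
  final Uri? file; // optional, as noted above
}

class NativeCodeAsset extends Asset {
  const NativeCodeAsset({required this.id, super.file});
  final String id; // looked up from Dart code
}

class DataAsset extends Asset {
  const DataAsset({required this.id, super.file});
  final String id; // looked up from Dart code
}

class JarAsset extends Asset {
  const JarAsset({super.file}); // no id: never accessed by asset id
}

// Code that needs an id has to state which asset types it applies to.
String? idOf(Asset asset) => switch (asset) {
      NativeCodeAsset(:final id) => id,
      DataAsset(:final id) => id,
      JarAsset() => null,
    };
```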