Allow adding custom data to package_config.json through pubspec.yaml config

jakemac53 commented 1 year ago

Proposal

The package_config.json file permits storing extra data in both package entries as well as top level fields, but there is no general way to utilize this today, because pub owns the generation of the config and doesn't support adding in extra data.

I would like to propose that we add a way to configure extra data for a package entry into the pubspec.yaml file, something like:

name: <package>
package_config:
  <some-key>: <some-value>

This would just be added directly to the extra data for the package config entry for the current package when it is generated. We could bike shed on the actual name of the key later, if we can agree this is a good idea :).

This would allow for a generalized approach to package configuration, and it would both unify and simplify how various packages/tools accept configuration on a per-package level.

Use cases

One example use case would be the formatter, cc @munificent . People could configure their formatter settings via pubspec.yaml, and then the formatter only has to read a single configuration file (package_config.json). While this is some work to do, it would also enable the formatter to be more future proof (and generally correct) regarding language versions. It also allows it to maintain minimal coupling to tools like pub - internally for instance we could also support generating package configs with this extra information.

Another use case could be package:build - instead of having a new file (build.yaml), which we have to transitively look for in all packages, we could have just supported configuration via this option.

Concerns

While this configuration is mostly a black box to pub, it will have to make a decision regarding invalidation. At a minimum, the package_config.json file will have to be regenerated whenever this config changes for any transitive dep (most of these will never have a change because they will be pub deps). If any compilers start accepting configuration in this way, then we would also have to make sure we invalidated compilation of pub binaries (which might require deeper changes in frontend_server etc).

I think pub can mostly ignore this, and rely on frontend_server to do the heavy lifting. It should re-invoke it if there is any change to the config, and assume frontend_server will handle the invalidation appropriately.

jakemac53 commented 1 year ago

cc @natebosch

natebosch commented 1 year ago

Should we have a convention around using a single top-level key to namespace a chunk of config? Should we be worried about potential conflicts?

I worry a bit that this becomes another option amongst many for this type of config.

jakemac53 commented 1 year ago

Should we have a convention around using a single top-level key to namespace a chunk of config? Should we be worried about potential conflicts?

We should have a convention that packages accept configuration under their package name. Packages that don't do that might run into conflicts, but it should be fairly rare even then.

I worry a bit that this becomes another option amongst many for this type of config.

Obligatory https://xkcd.com/927/

It is a valid concern but I think this is also significantly better than the typical solutions which require each tool to read the pubspec itself (and potentially transitive pubspecs).

sigurdm commented 1 year ago

Not sure I understand the proposal completely - but I think I like the idea :)

In the formatter example, is the package_config: a top-level entry in the pubspec.yaml of the formatter or of the consuming package? Could you spell out the example?

Are these present only in the root-package?

Another use-case for having a way to add stuff from pubspec.yaml to package_config.json would be a way of specifying a "generated package" (like flutter_gen) not sure if it is the right mechanism - but there might be something to do here.

sigurdm commented 1 year ago

@lrhn might have things to say.

jakemac53 commented 1 year ago

In the formatter example, is the package_config: a top-level entry in the pubspec.yaml of the formatter or of the consuming package? Could you spell out the example?

Well, I was being a bit intentionally vague here as to leave it open ended to the details, I don't love package_config as a top level key (most users don't know what that is). But yes, the general idea is a top level key containing extra data to be added to the package config.

pubspec:

name: my_package
environment:
  sdk: ">=3.0.0 <4.0.0"
package_config:
  sweet: ["config"]

package config:

{
  "configVersion": 2,
  "packages": [
    {
      "name": "my_package",
      "rootUri": "../",
      "packageUri": "lib/",
      "languageVersion": "3.0"
      "sweet": ["config"]
    }
  ]
}

Are these present only in the root-package?

These would be supported for all transitive packages (that is actually an important part of the value). This means only one tool actually has to crawl through and read files from all transitive packages (pubspecs, which they are already reading anyways), and then all other tools just read the one file which has everything merged in already.

Another use-case for having a way to add stuff from pubspec.yaml to package_config.json would be a way of specifying a "generated package" (like flutter_gen) not sure if it is the right mechanism - but there might be something to do here.

I could see that yes, not sure what the configuration would exactly look like though.

munificent commented 1 year ago

One example use case would be the formatter, cc @munificent . People could configure their formatter settings via pubspec.yaml, and then the formatter only has to read a single configuration file (package_config.json).

Aside from not having to deal with YAML, what's easier about reading the package_config.json versus the pubspec.yaml file?

natebosch commented 1 year ago

what's easier about reading the package_config.json versus the pubspec.yaml file?

I suspect it primarily gives a single place to read, instead of having to hunt for transitive pubspec.yaml files through dependencies.

Do we need to consider the impact on parse time for this file? If we starting allowing arbitrarily large content could it have an impact on compile times or VM startup time?

jakemac53 commented 1 year ago

Do we need to consider the impact on parse time for this file? If we starting allowing arbitrarily large content could it have an impact on compile times or VM startup time?

It would still be bounded by the size of transitive pubspec.yaml files, which should remain reasonably small, although its certainly possible to abuse it to the point that it causes a problem.

jakemac53 commented 1 year ago

It did just occur to me that dart_style is probably unlikely to gain much benefit from this transitive merging, since it is not actually ran on the transitive import graph of all packages - probably 99% of the time it is a single file or package.

In the case where it is formatting multiple packages, it likely doesn't even do the right thing:

In a parent dir: It ends up having to read package_config.json files for all the nested dirs.
Nested but unrelated packages: For instance the example dir is often its own package. It would have to separately read that package_config.json file.

That combined with the fact that it won't work if you haven't ran a pub get in all those nested packages.... bleh.

munificent commented 1 year ago

I suspect it primarily gives a single place to read, instead of having to hunt for transitive pubspec.yaml files through dependencies.

I don't think that would benefit the formatter much since it would end up looking for a config file for each file being formatted anyway. The formatter doesn't know what packages are, and it's entirely reasonable to run it on arbitrary collections of directories and files that may be within a package, span multiple packages, not be inside a package at all, be in a package that hasn't had pub get run yet, or even only exist in memory.

jonasfj commented 1 year ago

I could imagine something like:

# in pubspec.yaml
config:
  <package>:
    <key>: <value>

  # Example configuration for package:retry
  retry:
    default_retries: 5

Then in package_config.json:

{
  "configVersion": 2,
  "packages": [
    {
      "name": "retry",
      "rootUri": "$PUB_CACHE/...",
      "packageUri": "lib/",
      "languageVersion": "3.0"
      "config": {
        "default_retries": 5,
      }
    }
  ]
}

Then we could allow code to read configuration either by parsing package_config.json or loading it from environment, like: int.fromEnvironment('package.retry.config.default_retries').

This would allow for conditional imports, which I'm sure could be a footgun. On the other hand it would enable some nice features. Maybe it could configure which backend to use for package:http.

Or it could just generally be used for writing configuration values without having yet-another _options.yaml file cluttering the project root.

Not sure how serious I am about this :D

dcharkes commented 1 year ago

Cross-linking from the ideas in https://github.com/dart-lang/native/issues/39:

We should consider a script instead of a yaml config as it should support conditional configuration. (E.g. only including something in debug mode, or when some flag is set.)

For localization, we need something that combines values. Moreover, we need something that is aware about the packageRoot of where the config is defined

# pubspec.yaml package:foo
config:
  localization:
    messages:
      - data/messages.json

# pubspec.yaml package:bar
config:
  localization:
    messages:
      - data/messages.json

{
  "configVersion": 2,
  "packages": [
    {
      "name": "localization",
      "rootUri": "$PUB_CACHE/...",
      "packageUri": "lib/",
      "languageVersion": "3.0"
      "config": {
        "messages": [
          "${packageRootPackageBar}/data/messages.json",
          "${packageRootPackageFoo}/data/messages.json",
        ]
      }
    }
  ]
}

Having a script (such as config.dart) get access to the packageRoot and rootPackageRoot can easily add these paths.

The two above constraints lead me to think that a script for config would be better than yaml entries.

If there is a more general want/need for config than only in the context of build.dart (https://github.com/dart-lang/native/issues/39), I could be pursuaded we should have a top-level config.dart rather than a toplevel build_config.dart.

cc @mosuem

jakemac53 commented 1 year ago

The custom config would live under the package config for the package in which it was written, so you would have the package root information, something more like this in your example:

{
  "configVersion": 2,
  "packages": [
    {
      "name": "bar",
      "rootUri": "$PUB_CACHE/...",
      "packageUri": "lib/",
      "languageVersion": "3.0"
      "config": {
        "messages": [
          "data/messages.json",
        ]
      }
    }, {
      "name": "foo",
      "rootUri": "$PUB_CACHE/...",
      "packageUri": "lib/",
      "languageVersion": "3.0"
      "config": {
        "messages": [
          "data/messages.json",
        ]
      }
    }
  ]
}

I think that would work fine (but of course you would have to iterate all packages in the config looking for the messages). In general I think it is better design though as you have all the context about the surrounding package with each config entry.

dcharkes commented 1 year ago

Right, but then still you'd probably want to nest under the "target" package:

{
  "configVersion": 2,
  "packages": [
    {
      "name": "bar",
      "rootUri": "$PUB_CACHE/...",
      "packageUri": "lib/",
      "languageVersion": "3.0"
      "config": {
        "localization" : {
          "messages": [
            "data/messages.json",
          ]
        }
      }
    },
    {
      "name": "foo",
      "rootUri": "$PUB_CACHE/...",
      "packageUri": "lib/",
      "languageVersion": "3.0"
      "config": {
        "localization" : {
          "messages": [
            "data/messages.json",
          ]
        }
      }
    }
  ]
}

In the context of native assets, one might want to branch on target OS or build mode (debug/release) for certain config. So in that scenario, the config needs to be computed on a per dart run/flutter run/flutter build basis. For that reason, putting the output in the packages_config.json seems like a bad idea. The packages_config.json is not dependent on target OS etc. (E.g. we have feature requests of making resolution OS dependent: https://github.com/dart-lang/pub/issues/3932).

build.dart (and the proposed build_config.dart) run as a target-aware step, enabling this OS/build-mode dependent config.

Regarding caching/invalidation, in build.dart invocations we already take care of this.

This makes me think there are multiple different type of configurations we are talking about here:

Dart defines -Dkey=value, which influence the Dart compilation and are accessible in the Dart code. (We might benefit from some kind of config in pubspec so that these defines do not have to be re-provided every invocation.)
Config for scripts in JIT mode (analyzer, ...) only available on the host machine. These scripts can access the packages_config.json or pubspec.yaml. (Shipped apps in AOT cannot access these, so this type of config is not available in shipped apps.)
"build config": also only available on the host in JIT mode, but in addition: conditional config based on build-mode/OS/... (which requires it to be a script rather than yaml).

With the design we had in mind for the build config, we would provide an API in the CLI we already have, so that build.dart script writers do not need to traverse the combined package_config.json or multiple pubspec.yamls. Also, we'd only make the config available to direct or transitive dependencies. If we'd want developers to be able to achieve the same within the package_config.json, they'd need to construct the actual dependency graph to filter instead of blindly going over all packages. https://github.com/dart-lang/pub/issues/3795

@jakemac53 reading from your initial proposal, you were not suggesting option 1. right? Only option 2.

Still 2 and 3 seem quite closely related. If we'd drop the conditional config (which would probably mean people start writing some kind of conditionals in the config themselves: verbose_logs: {debug: true, release: false} and then the consumer of the config interprets that), then we could potentially serve both use cases for 2 and 3 with the same mechanism.

One argument for not unifying 2 and 3 is caching. If the solution proposed in this issue (a) contains config for build.dart invocations, but also for other things, and (b) is not filtered by direct dependencies, then the Dart and Flutter builds have to invoke all build.dart scripts for any changes to the config. The solutions proposed in https://github.com/dart-lang/native/issues/39 do not have these issues. That being said, maybe it's quite unlikely for the config to change.

jakemac53 commented 1 year ago

In the context of native assets, one might want to branch on target OS or build mode (debug/release) for certain config. So in that scenario, the config needs to be computed on a per dart run/flutter run/flutter build basis. For that reason, putting the output in the packages_config.json seems like a bad idea.

Yes, the package config should be usable across configurations. I would do what you alluded to later, encode in the config what to do for each mode/configuration. This is what people do in build.yaml files for instance. Whatever tools are consuming this configuration would handle selecting the correct config for the current invocation.

@jakemac53 reading from your initial proposal, you were not suggesting option 1. right? Only option 2.

Yeah I think that -D defines are typically invocation specific config, and also global, so I am not sure the pubspec is a good place for it. I also don't think all tools should have to start reading pubspecs (and definitely not transitive ones) in order to find defines.

2. (Shipped apps in AOT cannot access these, so this type of config is not available in shipped apps.)

I agree this isn't a viable option for shipped apps, but it is viable for build-time (or general development) configuration.

If the solution proposed in this issue (a) contains config for build.dart invocations, but also for other things, and (b) is not filtered by direct dependencies, then the Dart and Flutter builds have to invoke all build.dart scripts for any changes to the config.

Yes this is likely accurate, as the config could change the behavior. I wouldn't expect the config to change often.

dcharkes commented 1 year ago

Yeah I think that -D defines are typically invocation specific config, and also global, so I am not sure the pubspec is a good place for it.

I was thinking that for build.dart some configuration is better to provide on a per-run basis, but other is nicer to put in a file. Therefore I suggested adding support for both in https://github.com/dart-lang/native/issues/39#issuecomment-1740458343 the CLI args overriding the config in the pubspec if both provided. (In the same spirit as https://github.com/dart-lang/tools/tree/main/pkgs/cli_config)

jakemac53 commented 1 year ago

Therefore I suggested adding support for both in dart-lang/native#39 (comment) the CLI args overriding the config in the pubspec if both provided.

Yeah I think it makes sense to have both 👍

natebosch commented 1 year ago

This would allow for conditional imports

Conditional imports would continue to be supported only for dart.library.x expressions - arbitrary config from any place, including the pubspec, will not be used for conditional imports as they are implemented today.

jonasfj commented 1 year ago

Conditional imports would continue to be supported only for dart.library.x expressions

Oh, that makes me sad. We could supposedly change that some day, but I understand of there is a lot of arguments for not giving away that footgun. In practice if (const bool.fromEnvironment('...')) would allow the same kind of tree-shaking / customization.

jonasfj commented 1 year ago

@dcharkes I was surprised that you wanted configuration from other sources than the root-package.

For replacing top-level config files like ..._options.yaml, it'd only really make sense to consider the root-package. But then again, such tools are operating on the root-package, so that might be fine. And in a mono-repository with a single resolution they might consider a different package as root depending on current working directory.

We could do something like:

# pubspec.yaml package:foo
name: foo
config:
  # config for package:localization
  localization:
    messages:
      - data/messages.json

# pubspec.yaml package:bar
name: bar
config:
  # config for package:localization
  localization:
    messages:
      - data/messages.json

{
  "configVersion": 2,
  "packages": [
    {
      "name": "localization",
      "rootUri": "$PUB_CACHE/...",
      "packageUri": "lib/",
      "languageVersion": "3.0"
      "configFromPackages": [
        {
          "package": "foo",
          "config": {
            // So long as the reader knows that this is config from "package:foo"
            // a relative path can be interpreted relative to "rootUri" for "package:foo".
            // Isolate.resolvePackageUri already provides this logic.
            "messages": ["data/messages.json"]
          }
        },
        {
          "package": "bar",
          "config": {"messages": ["data/messages.json"]}
        },
      ],
      ...
    }
  ]
}

This way it's only necessary to rebuild localization if configFromPackages have changed. If config unrelated to localization changed, then a rebuild is not necessary. I don't think build.dart has to read package_config.json, we can just decide that configFromPackages is copied into its input file.

Then we leave for the build scripts in localization to decide how to merge configFromPackages. A tool like analyzer or dartfmt would probably just choose to read the config relevant for the package it is operating on (root-package).

This would probably make it hard to use .fromEnvironment to consume this configuration (like in my example where the root-package can configure the global default number of retries for package:retry). But one could probably just make some data-assets, like what I'm assuming that localization would do.

dcharkes commented 1 year ago

In a discussion with @mkustermann today we discussed we might actually want dynamic input in some cases (based on the build in build.dart) for tree shaking. So maybe we should not unify this type of config with input for tree shaking. If we start using the config proposed in this issue now, it could bite us later when we try to support more use cases with build.dart and tree shaking. Some notes: https://github.com/dart-lang/native/issues/153#issuecomment-1757360449

lrhn commented 1 year ago

There is very little here about who would consume this config, how and when.

There seem to at least two related ideas:

A package exposing "data" to client packages (those which depend on the package), like a side-channel for extra information.
A client package exposing "data" specifically to its dependency, say a configuration for a builder or generator. Like a way for one or more packages to put metadata in a central place, where the dependency can find them, without knowing that they exist.

The former could just be stored in lib/cfg/myname.json of the package myname, then anyone who cares can find the lib/ in package_config.json and check for the file.

The latter could be put in, say, lib/cfg/localization.json of each package that wants to tell package:localization something, then the tool from package:localization can read the package_config.json to find all the lib/ dirs, and check each for a cfg/localization.json, or whatever name it wants to use, as long as it starts with the package name. That would also leaves any merging to the tool that understands the data, instead of trying to do it abstractly in Pub.

I guess that's my idea: Recommend that metadata is stored in lib/cfg/ with a name starting with the package name of the package who's defined the format, in whatever format or formats is desired. There can be multiple files for each package name, even subdirs, as long as they all start with clearly ended package name.

Any tool which knows what to look for can find the lib/ dirs using package_config.json and check for the cfg/pacakgeName.* that it wants.

Anything you can do by reading package_config.json to directly get the data, can be done this way too, by reading package_config.json to find the location of the data.

The data doesn't even have to be YAML- and JSON-embedable, or limited in size. And we don't have to wait for the front-end to parse or skip data that is completely irrelevant to it, before it can start doing actual work on the package.

If putting it in lib/cfg/ is a problem them $ROOT/cfg can also work, if we are sure it's always available.

(I can write you a package to help find the cfg dirs and files in them, if that's a problem.)

jakemac53 commented 1 year ago

I definitely agree that this doesn't unlock any new feature - but what it would do is provide a single canonical way of doing configuration. It also wouldn't require each package that wants to support said configuration to do redundant searching of all transitive packages for said configuration.

So that is the value this feature would provide - a consistent convention and de-duplication of work.

jonasfj commented 1 year ago

The latter could be put in, say, lib/cfg/localization.json of each package that wants to tell package:localization something, then the tool from package:localization can read the package_config.json to find all the lib/ dirs, and check each for a cfg/localization.json, or whatever name it wants to use, as long as it starts with the package name. That would also leaves any merging to the tool that understands the data, instead of trying to do it abstractly in Pub.

This is actually a very solid approach, indeed this is what package:extension_discovery does :rofl:

The only limitation to this that a build process wouldn't be able to know if the configuration changed, and thus, whether to rebuild. One can also argue that it's easier (less duplication of work).

But if the native builds are taking a different direction with respect to user-defines, not sure I fully grasp it -- then maybe this is less important.

dart-lang / pub