Outside process generating outputs + other confusions with restrictions on outputs, build step dependencies, etc.

tgrushka commented 8 months ago

Hi, I'm pulling my hair out with build, build_runner, and build_config.

This is more than just a usage type question. The issues are: need better documentation, better workarounds for output restrictions, and better handling of subprocess outputs. We all depend on this build system, and since full-stack Dart projects depend more and more on code generation, it should be really straightforward and easy to figure out how to configure and use this system, and really hard to get it wrong. Things like generating protocol buffers from Dart classes (round trip generation from Dart to .proto and back to Dart to source folder in the builders) and generating code for multiple packages (client and server) should be supported without having to "cheat" to work around the output restrictions and "magic" build watch source graph detection of changed files.

What caused me to raise this larger issue of confusion / needing better documentation / legitimate workarounds for output restrictions / better handling of external process outputs:

I want to run protoc to generate .pb.dart, .pbenum.dart, etc. files from .proto files that my better_proto_mapper fork of proto_generator / proto_annotations packages generates from Dart classes. (Because writing original classes in Dart is the only way to go when one wants control over one's base model classes, not generated ones!) Unfortunately, it seems I must run protoc externally as the protoc_plugin (and my forked better_protoc_plugin) does not have Dart bindings for generation and is only designed as a plugin for protoc. And I just don't have the wherewithal to figure out how to run these .proto generators directly in Dart to generate Strings without going through the external protoc process. Running protoc inside a Builder is very problematic, because the build system / watcher does not detect its outputs, causing build to rerun the earlier step, which of course then detects no changes and stops it from repeating, but really the watcher should not repeat any steps. The limitations in Builder just make things really inelegant. I've tried:

Glob exclusions on buildStep.findAssets(Glob... (impossible to have exclusions in current Glob version);
Skipping the .pb.dart, generated/** files etc. as I read them in findAssets for loop ... build_runner still runs the step even if they all continue;
Having protoc generate to cache (.dart_tool/build/generated...) - no documented way to read back these files, since build_runner is not made aware of them;
If I add another "copy from cache" build step, it does not run when watch rebuilds, even if the protoc step has it in apply_builders. Maybe a separate issue as this one is getting way too long, but still part of the confusion, because the way build detects what steps need to be run seems like elusive magic.

better_proto_generator flows like this:

graph TD;
A[annotated Dart models] -->|generate| B[.tproto file in cache];
B -->|copy| C[.proto file in source];
C -->|run| D[protoc with outputs to project_root/generated/grpc];
D -->|writeToString| E[copy to lib/src/generated/grpc];

I wound up with the monstrosity below, which violates the practice of having a constant buildExtensions getter in the Builder and is a horrible workaround that may break in the future. Why doesn't buildExtensions just allow { r'$lib$': ['*'] } to output to any file, or at least { r'$lib$': ['src/generated/*.dart'] }? I return this at first before the directory is created, so the build script is generated without throwing, then when it finds the generated directory, it iterates all the files and adds them to allowed outputs.

I have read the available documentation including the FAQ tidbits in https://github.com/dart-lang/build/tree/master/docs.

I found google/protobuf.dart#158 and #3156 issues relevant.

I am having difficulty understanding all the output restrictions, "cyclic dependencies" on watch when generating outputs from a subprocess (prevent earlier step from re-running, because I can't have exclusions in buildStep.findAssets(Glob..., how to work around these issues without resorting to terrible antipatterns, etc.

I kind of understand the '$lib$' (and to lesser extent, '$package$') "magic" tokens but even these are overly restrictive. For example, I cannot Glob the .dart_tool/build/generated cache directory when collecting inputs with buildStep.findAssets(.... It seems "magic" that build keeps track of what it generates to cache, so I can't use an external process like protoc to generate to cache then copy these files to source. And there is no API for finding cached assets... why, when this is such a major part of the build process?

In a different project, I'm trying to generate both server and client code with a builder, and output these to two separate packages. Because there's absolutely no way to do this with build, I have to resort to an illegitimate antipattern of just force-copying the files in the Builder. Perhaps I'll raise this in a different issue later, but I mention it here because it relates to the seemingly extreme restrictions placed on build outputs, which I realize is a safety measure, but it's very confusing to understand how the Dart team wants us to legitimately work around these issues when necessary... and it sometimes is.

I'm trying to generate .proto and Dart protobuf classes from Dart classes annotated with @Proto() annotations. I've forked the proto_generator / proto_annotations packages to work with my better_protoc_plugin fork of protoc_plugin, but the principal underlying issue is the same ... it is with builder, not the plugins or generation.

See my ProtocRunner Builder code below.

Dart SDK Version: 3.3.0-174.3.beta
What package(s) from this repo you are using, and the version (i.e. build_runner 0.7.12)
- build: 2.4.1
- build_config: 1.1.1
- build_runner: 2.4.7
- build_runner_core: 7.2.11
- build_daemon: 4.0.1
- build_resolvers: 2.4.2
- build_verify: 3.1.0
What builder(s) you are using (or writing yourself). Try to give a short summary of what they do.

better_proto_mapper, my fork of proto_generator / proto_annotations packages

Whether you are using Windows, MacOSX, or Linux (if applicable)

macOS Sonoma 14.1.2 / MacBook Pro M1 Max

You can probably see why I'm so confused from several things I'm doing here.

import 'dart:io';
import 'dart:isolate';

import 'package:build/build.dart';
import 'package:glob/glob.dart';
import 'package:path/path.dart' as path;

/// Generating directly to source causes watched build to re-run
/// because outside process (protoc) is generating the files.
/// Therefore, have protoc generate outside lib, then copy.
/// Wish we could access the cache dir at .dart_tool/build/generated...
/// https://github.com/google/protobuf.dart/issues/158
///
/// Add predefined copy builder to package:build?
/// Crazy there is no way to Glob over cache and copy from there!
/// Discusses .proto / .pb.dart files as example.
/// https://github.com/dart-lang/build/issues/3156
///
class ProtocRunner implements Builder {
  ProtocRunner();

  /// This is really, really bad practice and warned against in the docs, but it seems to work for now!
  /// But if we don't generate to temp, and just let `protoc` generate directly to lib/src/generated,
  /// build_runner thinks files have changed and re-runs protoBuilder step, also not what we want.
  @override
  Map<String, List<String>> get buildExtensions {
    final dir = Directory('generated');
    if (!dir.existsSync()) {
      return {
        r'$lib$': ['*']
      };
    }
    final files = dir
        .listSync(recursive: true)
        .where((entity) => entity is File && entity.path.endsWith('.dart'))
        .map((file) => 'src/${file.path}');
    return {
      r'$lib$': ['*', ...files]
    };
  }

  @override
  Future<void> build(BuildStep buildStep) async {
    final List<AssetId> protos = [];

    await for (final input in buildStep.findAssets(Glob('**/*.proto'))) {
      protos.add(input);
    }

    if (protos.isEmpty) {
      throw Exception('Could not find any .proto files.');
    }

    /// We must read at least one .proto file to cause build_runner
    /// to re-run this step on each build during watch.
    await buildStep.readAsString(protos.first);

    /// Resolve better_protoc_plugin, which is a dependency of this package.
    final betterProtocUri = await Isolate.resolvePackageUri(
        Uri.parse('package:better_protoc_plugin/protoc.dart'));

    if (betterProtocUri == null) {
      throw Exception('Could not find better_protoc_plugin package!');
    }

    final pluginPath = path.normalize(path.join(
        path.dirname(betterProtocUri.path), '..', 'bin', 'protoc-gen-dart'));

    final cacheDir = 'generated/grpc';
    await Directory(cacheDir).create(recursive: true);
    final protoPaths = protos.map((p) => path.dirname(p.path)).toSet();

    final args = <String>[
      '--plugin=$pluginPath',
      '--dart_out=better_protos,grpc:$cacheDir',
      for (final protoPath in protoPaths) ...[
        '-I$protoPath',
        ...protos
            .where((p) => p.path.startsWith('$protoPath/'))
            .map((p) => p.path),
      ],
      'google/protobuf/timestamp.proto',
      'google/protobuf/duration.proto',
    ];

    final result = await Process.run('protoc', args,
        workingDirectory: Directory.current.path);

    print('Ran ProtocRunner with protoc exit code: ${result.exitCode}');

    if (result.exitCode != 0) {
      throw Exception(result.stderr);
    }

    final generatedDir = Directory('generated');
    await for (var entity in generatedDir.list(recursive: true)) {
      if (entity is File && entity.path.endsWith('.dart')) {
        final relativePath = entity.path.substring(generatedDir.path.length);
        final libPath = 'lib/src/generated$relativePath';
        final content = await entity.readAsString();
        final assetId = AssetId(buildStep.inputId.package, libPath);

        /// Since we're copying the files with `buildStep.writeAsString`, somehow
        /// they magically get into the build graph and are not counted as changed
        /// files by `watch`. But who knows how this actually works?
        await buildStep.writeAsString(assetId, content);
      }
    }
  }
}

Here is my build.yaml:

# Read about `build.yaml` at https://pub.dev/packages/build_config
builders:

  protoBuilder:
    import: "package:better_proto_generator/better_proto_generator.dart"
    builder_factories: ["protoBuilder", "protoMapperBuilder"]
    build_extensions: { "$lib$": [".tproto"], ".dart": ["proto_mapper.g.part"] }
    build_to: cache
    auto_apply: dependents
    runs_before: [":protoFromCache"]
    applies_builders: [":protoFromCache"]

  protoFromCache:
    import: "package:better_proto_generator/better_proto_generator.dart"
    builder_factories: ["protoFromCache"]
    build_extensions: {".tproto": [".proto"]}
    build_to: source
    required_inputs: [".tproto"]
    runs_before: [":protocRunner"]
    applies_builders: [":protocRunner"]

  protocRunner:
    import: "package:better_proto_generator/better_proto_generator.dart"
    builder_factories: ["protocRunner"]
    build_extensions: { "$lib$": ["*"] }
    build_to: source
    required_inputs: [".proto"]
    auto_apply: dependents
    runs_before: ["source_gen:combining_builder"]
    applies_builders: ["source_gen:combining_builder"]

jakemac53 commented 8 months ago

Dependencies of build steps are tracked through imperative calls to buildStep.writeAs* and buildStep.readAs*. Every input/output must be read/written through these apis.

As you have discovered, this does not play very nicely with external executables. In order to use these, the recommended pattern is using the scratch_space package. You will have to know exactly all the inputs that the executable might need to read, prepare a "scratch space" with those inputs, run the executable in that "scratch space", and then copy the outputs back out (the package has APIs for this part too).

This is similar to what you manually did, but it does it using temp directories, and in a way that is much more reliable and makes it harder to accidentally read files without the system knowing about it.

Why doesn't buildExtensions just allow { r'$lib$': ['*'] } to output to any file, or at least { r'$lib$': ['src/generated/*.dart'] }?

We require all outputs to be known ahead of time, before any building actually happens. This is partly to maintain compatibility with other build systems (bazel) and partly to allow lazy building of assets. If some later builder (or user) asks to build any given file, we always know exactly which builder we need to run, with what primary inputs, to produce that file.

jakemac53 commented 8 months ago

As far as the build extensions go, you probably need to have the build step run on only a single proto file at a time, so that you can declare the outputs properly. That requires knowing the proto file dependencies, but you could start by just providing all proto files as dependencies, to avoid needing to write some sort of proto file parser. Or possibly a simple regex based parser could work well enough, if you wanted to get more granular.

github-actions[bot] commented 7 months ago

Without additional information we're not able to resolve this issue. Feel free to add more info or respond to any questions above and we can reopen the case. Thanks for your contribution!

dart-lang / build

Outside process generating outputs + other confusions with restrictions on outputs, build step dependencies, etc. #3635