dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

investigate tooling scalability in the presence of barrel files #50369

Open mraleph opened 2 years ago

mraleph commented 2 years ago

Barrel files are files which repackage multiple libraries into a single library by exporting them. https://github.com/flutter/flutter/blob/master/packages/flutter/lib/material.dart is a good example of a barrel file. Developers usually introduce barrel files to streamline their import management.

Barrel files have an obvious drawback: they introduce false dependency edges between libraries. Consider, for example, the code below:

// a.dart
library a;

// b.dart
library b;

// c.dart
library c;

export 'a.dart';
export 'b.dart';

// d.dart
library d;

import 'c.dart';

// only use things from a.dart

In this situation, when a developer changes b.dart, incremental tooling that naively follows import edges will end up invalidating both c.dart and d.dart.

In a graph with many libraries this can have disastrous cascading consequences.
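Continuing the example, the usual user-side workaround is to bypass the barrel file where the dependency is narrow (a sketch of the same four-file layout):

```dart
// d.dart
library d;

// Importing a.dart directly, rather than the barrel file c.dart,
// removes the false dependency edge: editing b.dart no longer
// invalidates d.dart.
import 'a.dart';
```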

There is some evidence from users that eradicating barrel files considerably reduces reanalysis times for them. It would be nice to analyze this problem and understand what we can do to mitigate it.

This is both an analyzer and a CFE issue.

mraleph commented 2 years ago

/cc @jensjoha @scheglov

Moammar1498 commented 7 months ago

Any update on this please @mraleph

jellynoone commented 3 months ago

I would like to add that I've personally been observing this issue for quite some time, and that there is a direct link between using barrel files and analyzer performance.

Context: I'm working on a project which is fairly compartmentalised. The structure of the project can roughly be described as:

- feature1/
  - subfeature/
    - file.dart
  - subfeature.dart - exports all from subfeature/*
- feature2/...

Features may also export one another's export files.

This works really well for organising code and initially when the project was small the analyzer performance was great.

Now, a few thousand files later, performance can get quite bad (10+ seconds for completions) when editing files that are imported by many other files.

Sometimes, when I'm editing a file that is directly or transitively used by many other libraries, the performance gets so bad that I just comment out its export in the barrel file, and the analyzer's responses are lightning fast again.

jellynoone commented 3 months ago

This issue has got me thinking...

Would it be possible for the analyzer to be made smart enough to detect boundaries of effect and only analyze the code that would be directly affected by a given change?

For instance, suppose we have:

class A {
  void doB() {
    print('C');
  }
}

If I edit the body of doB, for instance changing print('C'); to print('A');, the analyzer should only have to reanalyze the body of doB, since nothing outside the body is affected by the change.

Similarly, if I edit a private class, even add methods to it, the only code affected would be that of the library.

@scheglov Do you perhaps know, does the analyzer do any of this?

scheglov commented 3 months ago

When you edit just a method body, the analyzer:

  1. Discards LibraryElements for this library, and any library that depends on it (this might be 1000 files in a single library cycle). This is fast (breaking is faster than building).
  2. Computes the API signature of the file. Because we postulated that only the method body changes, we know that the API signature is the same in this case.
  3. We know that we need to analyze the changed file (with this method body). We load the LibraryElement for it. Because all API signatures are the same, we reuse the cached serialized LibraryElement (the summary). But deserializing costs something. This has low cost: we load mostly just the header, to know which libraries we have in the summary.
  4. We analyze the whole modified file. During analysis we lazily create elements from summaries. This usually has medium cost: we analyze just one file, presumably not too big, and actually access not too many libraries from the summaries. But we can easily construct a high-cost example.
  5. Other files are not reanalyzed: because the API signatures are the same, combined with the code signatures of these other files, we see that the diagnostics would be the same, so we just skip them.
  6. For completion we actually need to deserialize all elements. This might have high cost.

We don't have the ability to analyze individual method bodies, only whole files. (We do this for completion, but in that case we skip computing diagnostics.)

But when you edit a class, even private, then API signatures do change, and this causes a lot of work to relink all libraries, and reanalyze all files that depend on the changed file.
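To illustrate the distinction with a tiny made-up library:

```dart
class Greeter {
  // Editing only this body (say, changing the string) keeps the file's
  // API signature the same, so dependent files are not reanalyzed.
  String greet() => 'hello';
}

// Adding, removing, or renaming any declaration in the file, even a
// private one like _Helper, changes the API signature, which relinks
// the whole library cycle and reanalyzes every dependent file.
class _Helper {}
```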

Optimizations are not impossible, but not trivial to implement and keep robust. I suspect that this will require more effort than we can afford right now.

jellynoone commented 3 months ago

Thank you for the explanation.

jellynoone commented 3 months ago

@scheglov If it helps with diagnosing the root cause, I was able to write up a reproducer for this issue.

// generate.dart

import 'dart:io';

const _core = 'core.dart';

final _folders = <String>[
  ...'abcdefghijklmnop'.split(''),
];
final _subfolders = [
  ...'qrstuvwxyz'.split(''),
];
final _filenames = [
  for (var i = 0; i < 10; i++) 'f$i',
];

void main(List<String> args) async {
  for (var folder in _folders) {
    final batch = <Future<void>>[];
    for (var subfolder in _subfolders) {
      for (var file in _filenames) {
        batch.add(putFile(folder, subfolder, file));
      }
      batch.add(_writeToFile(_rooted([folder, subfolder]) + '.dart', (sink) {
        sink.writeExportFileContents(libraries: _filenames.map((file) {
          return '$subfolder/$file.dart';
        }));
      }));
    }
    batch.add(_writeToFile(_rooted(['$folder.dart']), (sink) {
      sink.writeExportFileContents(libraries: _subfolders.map((subfolder) {
        return '$folder/$subfolder.dart';
      }));
    }));
    await Future.wait(batch);
  }

  await _writeToFile(_rooted([_core]), (sink) {
    sink.writeCoreFileContents(folders: _folders);
  });
}

Future<void> putFile(String folder, String subfolder, String file) async {
  final baseClass = '${folder.ucFirst}${subfolder.ucFirst}${file.ucFirst}';
  final parts = <String, String>{
    'part1': 'Impl1 implements ${baseClass}Source',
    'part2': 'Impl2 implements ${baseClass}Source',
    'part3': 'Impl3 implements ${baseClass}Source',
  };
  final fileBase = _rooted([folder, subfolder, file]);

  final lib = '$file.dart';
  await Future.wait([
    _writeToFile(fileBase + '.dart', (write) {
      write.writeFileContents(
        baseClass: baseClass,
        filename: file,
        parts: parts.keys,
      );
    }),
    for (var MapEntry(key: partName, value: suffix) in parts.entries)
      _writeToFile(fileBase + '.part.$partName.dart', (write) {
        write.writePartFileContents(
          library: lib,
          className: '$baseClass$suffix',
        );
      }),
  ]);
}

Future<void> _writeToFile(
  String filePath,
  void Function(StringSink sink) write,
) async {
  final file = File(filePath);
  if (!await file.parent.exists()) await file.parent.create(recursive: true);

  final sink = file.openWrite();
  write(sink);
  await sink.flush();
  await sink.close();
}

extension on StringSink {
  void writeFileContents({
    required String baseClass,
    required String filename,
    required Iterable<String> parts,
  }) {
    writeln('import "../../core.dart";');
    writeln();
    for (var partName in parts) {
      writeln('part "$filename.part.$partName.dart";');
    }
    writeln();
    writeln('class $baseClass {');
    writeln('  static const useOfCore = Core.hello;');
    writeln('}');
    writeln('class ${baseClass}Source implements $baseClass {}');
  }

  void writePartFileContents({
    required String library,
    required String className,
  }) {
    writeln('part of "$library";');
    writeln();
    writeln('class $className {}');
  }

  void writeExportFileContents({
    required Iterable<String> libraries,
  }) {
    for (var file in libraries) {
      writeln('export "$file";');
    }
  }

  void writeCoreFileContents({
    required List<String> folders,
  }) {
    for (var folder in folders) {
      writeln('export "$folder.dart";');
    }
    writeln('class Core {');
    writeln('  static const hello = "Hello, World!";');
    writeln('}');
  }
}

String _rooted(List<String> parts) {
  return (['lib']).followedBy(parts).join(Platform.pathSeparator);
}

extension on String {
  String get ucFirst {
    return this[0].toUpperCase() + this.substring(1);
  }
}

Steps:

  1. Copy the provided code to a new project root, for example, name the file generate.dart
  2. Run dart run generate.dart from the root of the project. (This will generate the necessary source code)
  3. Open lib/a/z/f0.dart
  4. Try changing

    class AZF0 {
      static const useOfCore = Core.hello;
    }

    to

    class AZF0 {
      static const useOfCore = Core.hello + Core.hello; // or AZF3.useOfCore
    }
  5. Observe code completion being slow.
  6. Go to lib/a/z.dart
  7. Comment out: export "z/f0.dart";
  8. Go back to lib/a/z/f0.dart and attempt to perform the same edits as before.
  9. Observe completions being fast again.

While this example is contrived, the dependencies between different libraries and the general structure closely resemble that of the real-life project I'm working on.

One other note I'd like to add: in this example (and in our live project) we have a circular dependency. There is a core.dart file which exports all files and is subsequently imported by the leaf libraries. This might seem like something that wouldn't happen in the real world.

However, in our project we have a genuine situation like this. We have code generators that pull specific configurations from different parts of the project and collect them in one library. That library is then also used by other parts of the app, which probably leads to certain files transitively importing most of the application simply because they use the joint configuration.

I hope this helps.

Environment: dart --version

Dart SDK version: 3.5.0 (stable) (Tue Jul 30 02:17:59 2024 -0700) on "macos_arm64"
Freshly deleted ~/.dartServer/.analysis-driver/

Editor

Version: 1.93.0-insider
Commit: 4693ac3d4dce852b747d08d4bc54db5e35afe268

Device, OS

MacBook Air M2 2022, Sonoma 14.6.1
scheglov commented 3 months ago

Thank you for the code that generates the test case.

The reason we haven't fixed this is not that we don't know about the issue; implementing incremental resolution will require more effort than we can afford right now.

jellynoone commented 3 months ago

OK. So this isn't a case of:

  1. the analyzer having a small bug in resolving libraries, but rather
  2. the way the analyzer resolves libraries being suboptimal.

What made the test case seem like there could be a simple answer is the fact that if I removed export "z/f0.dart"; (the export of the file I was editing), the performance was restored.

I hope at least the additional test case helps.

Sorry for the noise.

scheglov commented 3 months ago

Yes, the way the analyzer works is not performant in this case.

The analyzer builds element models for Dart libraries. But it does so for whole library cycles, where a single library cycle consists of libraries that import or export each other. And when you change any file in a library cycle, the whole cycle has to be relinked.

When you remove the export, this probably moves z/f0.dart out of the library cycle, so editing it does not invalidate the (large) library cycle, and so it is much faster.

jellynoone commented 3 months ago

The generated files look like this:

// lib/core.dart

export "a.dart";
export "b.dart";
export "c.dart";
export "d.dart";
export "e.dart";
export "f.dart";
export "g.dart";
export "h.dart";
export "i.dart";
export "j.dart";
export "k.dart";
export "l.dart";
export "m.dart";
export "n.dart";
export "o.dart";
export "p.dart";
class Core {
  static const hello = "Hello, World!";
}
// lib/a.dart
export "a/q.dart";
export "a/r.dart";
export "a/s.dart";
export "a/t.dart";
export "a/u.dart";
export "a/v.dart";
export "a/w.dart";
export "a/x.dart";
export "a/y.dart";
export "a/z.dart";
// lib/a/z.dart

export "z/f0.dart";
export "z/f1.dart";
export "z/f2.dart";
export "z/f3.dart";
export "z/f4.dart";
export "z/f5.dart";
export "z/f6.dart";
export "z/f7.dart";
export "z/f8.dart";
export "z/f9.dart";
// lib/a/z/f0.dart
import "../../core.dart";

part "f0.part.part1.dart";
part "f0.part.part2.dart";
part "f0.part.part3.dart";

class AZF0 {
  static const useOfCore = Core.hello;
}
class AZF0Source implements AZF0 {}

Wouldn't removing z/f0.dart from lib/a/z.dart still cause a computation of a large library cycle? The file lib/a/z/f0.dart still imports lib/core.dart and by extension all other libraries.

jakemac53 commented 3 months ago

Wouldn't removing z/f0.dart from lib/a/z.dart still cause a computation of a large library cycle?

It makes z/f0.dart no longer a (transitive) dependency of the core.dart file, which removes it from the cycle.

The issue really is not barrel files themselves, it is barrel files being imported by the libraries they export. That creates a cycle, because now every library depends on every other library (transitively, through core.dart exports).

If you removed the core.dart import from the top of the generated z/fx.dart files, it would also fix the issue.
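To sketch the shape of the problem with a made-up three-file example (the same structure as the generated code):

```dart
// core.dart: the barrel file. It exports everything, and is itself
// imported back by (some of) the files it exports.
export 'a.dart';
export 'b.dart';

class Core {
  static const hello = 'Hello';
}

// a.dart: importing the barrel that exports it closes the loop,
// a.dart -> core.dart -> a.dart, so a.dart and core.dart (and every
// other exported library that also imports core.dart) become one
// library cycle and are always relinked together.
import 'core.dart';

class A {
  static const useOfCore = Core.hello;
}

// b.dart: exported by the barrel but does not import it back, so it
// stays outside the cycle; completions while editing b.dart stay fast.
class B {}
```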

scheglov commented 3 months ago

Removing export of z/f0.dart will cause computation of a large library cycle, once.

This will cost you something, but at least after that typing in z/f0.dart during an attempt to get code completion will not immediately invalidate this same library cycle and cause computation during code completion.

jellynoone commented 3 months ago

Removing export of z/f0.dart will cause computation of a large library cycle, once.

This will cost you something, but at least after that typing in z/f0.dart during an attempt to get code completion will not immediately invalidate this same library cycle and cause computation during code completion.

After my previous comment, I realised that what you were probably communicating was that once the export of z/f0.dart was removed, no other library was using it, and as such edits to it didn't cause reanalysis of other libraries.

jellynoone commented 3 months ago

Thank you for putting up with me.

I just have one other idea, which isn't necessarily related to this issue but might help ease it.

What if the analyzer had an option to only cause reanalysis of depending libraries when a given library is free of syntax errors?

If I understand correctly, while I'm editing a file the analyzer only needs to know about the definitions of other files I use in this file in order to provide me with completions and other language features.

The analyzer only needs to reanalyze other files in order to prove their correctness as a result of changes I made to this file i.e. the depending files.

The annoying part of this issue isn't that analysis is slow, but rather that editing is.

For instance, when I want to write a piece of code, for most of the time I'm writing it the code will be in an invalid state (it will have syntax or analysis errors). Due to this issue, the entire writing process is slow.

However, if full analysis happened only after I completed the piece of code I wanted to write, which is presumably when it is free of syntax errors (or if we want to be more extreme all analysis errors), then I wouldn't mind the wait. I probably wouldn't even notice, since I'd be thinking what to write next.

Personally, I would probably use something like this even if this issue didn't exist. I'd love it if I could configure the analyzer to perform shallow analysis while editing a file and on file save perform a project-wide / deep analysis of how my edits affected other parts.

However, I don't know just how difficult something like this would be to implement. And if it is generally useful enough. And if the implementation is actually the incremental compilation being talked about in this issue.

jellynoone commented 3 months ago

To summarise: currently, when editing a library, two types of analysis are going on:

  1. Analysis of libraries the current library uses (this operation is fast)
  2. Analysis of libraries that use the current library (this operation may be slow depending on how many libraries use it)

Analysis 1. provides the information required for completions and correctness checks in the library the user is writing. Analysis 2. informs the user of general program correctness.

However, only analysis 1. actually helps the developer when writing out a piece of library code. Analysis 2. ensures the entire program is correct but while editing, the developer might not care about this as much.

For instance, if I'm modifying a class, I'm not really thinking about all the places this class is being used. I'm just focused on writing out the code I need to write. And only after writing everything out, I'd like to know how my changes affect the program as a whole.

If this summary is correct (both technical and what developers value), it seems, the analyzer might be doing a lot of work for results that don't really help the programmer write new code. (If I'm editing a file and not looking at all the errors being highlighted in other files, I'm not leveraging the analyzer's full potential.)

So if we could delay analysis 2. until the user finishes, we might be able to make the editing as a whole smoother, even if the actual project wide analysis is still suboptimal for cases of large library cycles.

scheglov commented 3 months ago

You are right that when you edit a library, the performance of editing (completion in particular) should not depend on how many uses of this library there are.

I have a couple of comments about this.

  1. The hypothesis that "Analysis of libraries the current library uses" is fast might be false. I can imagine that if the current library is part of a large library cycle, then in order to get the element model for this library cycle, so that we can resolve the current library, we need to parse and link all the libraries of the cycle. We don't resolve method bodies, but it is still not necessarily fast.

  2. It is possible that, because of the way work scheduling inside the Dart analysis server is implemented, we pick up another, non-priority operation (e.g. analysis of a user of the current library) after we finish processing the current library. The user in the IDE might type one character and get a completion; we start something else that is heavy; the user types again, and we don't respond because we are busy. Dart has only a single thread of execution. We have an idea to push all non-priority work into a separate isolate, but again it is all limited by the availability of hands to do it.

jellynoone commented 3 months ago

Based on comments from @scheglov and @jakemac53, I was able to refactor our code base such that analyzer performance was greatly improved. Thank you for giving me the missing pieces and helping me solve our problem.

The key was in breaking the self-referencing library cycle.

In our case, this was achieved by moving the configurations that pulled in most of the application aside and injecting them into the main program, rather than statically referencing them across the application. (This means we provide the configuration to the application in main(). This file is never used by other parts of the app, so it's OK that it pulls in everything.)

We got to keep all the barrel files and the desired structure, we simply needed to shift some imports around.
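For anyone facing the same problem, here is roughly the shape of the change as a hypothetical sketch (the names are invented, not our actual code):

```dart
// Before: a config.dart collected configuration by importing core.dart
// (the barrel that exports everything), and leaf libraries imported
// config.dart back, producing one project-wide library cycle.

// After: leaf libraries depend only on a small interface.
// app_config.dart
abstract class AppConfig {
  String get apiUrl;
}

// main.dart is now the only file that pulls in the heavy generated
// configuration; nothing imports main.dart, so no cycle forms there.
// The config is passed (injected) into the app at startup:
// void main() => runApp(App(config: GeneratedConfig()));
```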

Some libraries that many other libraries depend on still aren't as performant to edit as I'd like; however, the performance is miles better than it was at the start of the refactor. (And in these cases I can still use the comment-out-the-export trick to get full performance back.)

Just out of curiosity, are these analyzer limitations documented anywhere? I'd imagine if developers knew what patterns to avoid, they could leverage the analyzer much better.

jakemac53 commented 3 months ago

Note that this issue is much more exaggerated when editing a macro: auto completion in the macro itself is always scheduled behind the re-compilation and re-application of the macro. This results in a poor experience when editing macros, even if they are only imported/applied in a couple of libraries.

I don't know how to fix this other than by somehow changing the scheduling of tasks in the analysis server so that events could have some sort of priority, with slower tasks that have already started yielding to higher-priority events as they come in. However, that is likely a very large amount of work.

bwilkerson commented 3 months ago

Note that this issue is much more exaggerated when editing a macro - auto completion in the macro itself is always scheduled behind the re-compilation and re-application of the macro.

I have to question how useful it is to a macro author to have the macro run while they're in the process of editing it. What's the motivation for that decision?

I don't know how to fix this other then somehow changing the scheduler of tasks in the analysis server such that events could have some sort of priority ...

Currently there isn't any explicit scheduler, the analysis server just processes requests as they arrive. This needs to change for other reasons, and the scheduler that we've designed ought to be able to handle at least part of this proposal.

The ability for lower priority tasks to yield to higher priority tasks is an interesting idea, but will also require some work on the "handlers" (the objects that perform the tasks). Fortunately, supporting some kind of yielding would be an incremental piece on top of the work we're already planning to do, so we can return to this issue after there's a scheduler in place.

@keertip

jakemac53 commented 3 months ago

I have to question how useful it is to a macro author to have the macro run while they're in the process of editing it. What's the motivation for that decision?

When making a macro it is very handy to have the augmentation open for some library where the macro is applied, so you can see the actual output changing as you update the macro.

I think the current behavior is mostly just the result of not doing anything special as opposed to an explicit decision.

bwilkerson commented 3 months ago

When making a macro it is very handy to have the augmentation open for some library where the macro is applied, so you can see the actual output changing as you update the macro.

Absolutely! But not when the code is incomplete and the results are known to not be interesting. Seems like it would be a better UX if there were a way to signal when it's useful to rerun the macro. I'm not sure what the right signal would be, but it sounds like compiling and executing the code less often might improve completion performance, which might produce a better UX overall.

jakemac53 commented 3 months ago

Quite possibly even just only re-compiling/applying macros on save, instead of for each character typed, would be enough. If that is feasible to do.

bwilkerson commented 3 months ago

We do get notifications when a file is saved, but many IDEs, including IntelliJ and VS Code, have an auto-save feature that makes 'saved' a less interesting signal.

jakemac53 commented 3 months ago

We do get notifications when a file is saved, but many IDEs, including IntelliJ and VS Code, have an auto-save feature that makes 'saved' a less interesting signal.

Depends on how long the delay is... honestly even a very short delay here would dramatically improve the situation, so that if you are actively typing on a line you would get a more instant response for auto complete and such. Once you stop typing you are less likely to notice if the analyzer is doing a bunch of other work.

It isn't a perfect solution to be sure, but should be better than the status quo.
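The short delay being described is essentially a debounce. A minimal sketch of the idea in Dart (an illustration of the scheduling concept, not actual analysis server code):

```dart
import 'dart:async';

/// Coalesces bursts of events: [action] runs only once [delay] has
/// elapsed with no further calls to [trigger].
class Debouncer {
  Debouncer(this.delay, this.action);

  final Duration delay;
  final void Function() action;
  Timer? _timer;

  /// Each call restarts the timer, so rapid typing keeps postponing
  /// the expensive work (e.g. re-compiling and re-applying macros).
  void trigger() {
    _timer?.cancel();
    _timer = Timer(delay, action);
  }

  void dispose() => _timer?.cancel();
}
```

With something like this, each keystroke would call trigger(), and the heavy re-compilation would only run once typing pauses for the configured delay.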

bwilkerson commented 3 months ago

We could certainly try it and see whether it's enough of an improvement. (After we have the ability to schedule work, of course.)