dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

Regenerate relevance tables in analysis_server #48771

Open srawlins opened 2 years ago

srawlins commented 2 years ago

The relevance tables were generated before null-safe code was prevalent. We need to regenerate them using null-safe code, aspects of which may look quite different.

CC @bwilkerson

bwilkerson commented 2 years ago

Konstantin has suggested that we run an experiment to test whether relevance scores actually improve the ranking over using just the prefix matching score. If relevance scores aren't helping then we won't need to have these tables, so we might want to run that experiment first.

If we do need the tables, I'll note that the current tables were built using a small number of small Flutter apps, all of which are now quite old. We should think about the right mix of inputs in order to maximize the quality of the results for whatever kind of code a user might be writing.

The tables are built by tool/code_completion/relevance_table_generator.dart.

When we build new relevance tables we should test them against the current tables to see what the impact on the metrics is. There's support for that built into code_metrics.dart. I'll be happy to walk someone through the process if it isn't me.
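
For anyone who hasn't looked at the generator, here's a minimal conceptual sketch of the kind of data it collects: counts of how often each kind of element is the accepted completion in each syntactic context, normalized into scores. The `RelevanceTable` class and the context/kind names below are illustrative only; the actual generated tables in analysis_server have a different shape.

```dart
/// A conceptual stand-in for the generated relevance tables, not the
/// real format: it counts completions per (context, element kind) pair.
class RelevanceTable {
  /// context -> element kind -> occurrence count in the corpus.
  final Map<String, Map<String, int>> _counts = {};

  /// Record one observed completion from the corpus.
  void record(String context, String elementKind) {
    final kinds = _counts.putIfAbsent(context, () => <String, int>{});
    kinds.update(elementKind, (n) => n + 1, ifAbsent: () => 1);
  }

  /// Relevance of [elementKind] in [context], as the fraction of all
  /// completions seen in that context (0.0 when never seen).
  double relevance(String context, String elementKind) {
    final kinds = _counts[context];
    if (kinds == null) return 0.0;
    final total = kinds.values.fold<int>(0, (sum, n) => sum + n);
    return total == 0 ? 0.0 : (kinds[elementKind] ?? 0) / total;
  }
}

void main() {
  final table = RelevanceTable()
    ..record('ArgumentList', 'PARAMETER')
    ..record('ArgumentList', 'PARAMETER')
    ..record('ArgumentList', 'TOP_LEVEL_FUNCTION');
  print(table.relevance('ArgumentList', 'PARAMETER')); // ~0.67
  print(table.relevance('ArgumentList', 'CLASS')); // 0.0
}
```

The important point is that whatever corpus we feed the generator directly determines those counts, which is why the mix of inputs matters.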

srawlins commented 2 years ago

We should think about the right mix of inputs in order to maximize the quality of the results for whatever kind of code a user might be writing.

This doesn't have to be super rigorous to get some good benefits, right? Like, if I switched the inputs to be just the flutter/gallery app, regenerated the tables, ran completion_metrics (over a corpus different from flutter/gallery, I think), and the metrics were generally better, then we could call it a day, right? A separate task might focus on increasing the inputs by an order of magnitude and trying to maximize the gains by selecting different inputs.
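
For what it's worth, here's a rough sketch of the comparison I have in mind, just to make the shape of the experiment concrete. Everything in it is hypothetical (the `CompletionSite` type, the mean-reciprocal-rank metric, the table values); the real measurement would come from the completion metrics tooling Brian mentioned, not from this code.

```dart
/// Hypothetical relevance lookup derived from a relevance table
/// (old or regenerated).
typedef Relevance = double Function(String context, String elementKind);

/// One completion site from the held-out evaluation corpus.
class CompletionSite {
  final String context;
  final List<String> suggestedKinds; // candidate element kinds at the site
  final String expectedKind; // the kind of the suggestion actually used
  CompletionSite(this.context, this.suggestedKinds, this.expectedKind);
}

/// Mean reciprocal rank of the expected suggestion after sorting the
/// candidates by descending relevance. Higher is better.
double meanReciprocalRank(Relevance relevance, List<CompletionSite> corpus) {
  var sum = 0.0;
  for (final site in corpus) {
    final ranked = [...site.suggestedKinds]..sort((a, b) =>
        relevance(site.context, b).compareTo(relevance(site.context, a)));
    final rank = ranked.indexOf(site.expectedKind);
    if (rank >= 0) sum += 1 / (rank + 1);
  }
  return corpus.isEmpty ? 0.0 : sum / corpus.length;
}

void main() {
  // A one-site corpus, purely for illustration.
  final corpus = [
    CompletionSite(
        'ArgumentList', ['TOP_LEVEL_FUNCTION', 'PARAMETER'], 'PARAMETER'),
  ];
  // Hypothetical tables: the regenerated one prefers PARAMETER here.
  double oldTable(String c, String k) => k == 'TOP_LEVEL_FUNCTION' ? 0.9 : 0.1;
  double newTable(String c, String k) => k == 'PARAMETER' ? 0.9 : 0.1;
  print(meanReciprocalRank(oldTable, corpus)); // 0.5
  print(meanReciprocalRank(newTable, corpus)); // 1.0
}
```

The same kind of harness could also run the experiment Konstantin suggested, by comparing relevance-based ranking against a prefix-match-only ranking over the same corpus.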

bwilkerson commented 2 years ago

... over a corpus different from flutter/gallery, I think ...

Yes. You can't evaluate the quality of the relevance tables by running them over the same code used to build them, for the same reason you can't evaluate an ML model by testing it on its training data.

I ran an experiment similar to what you're describing a while back (maybe a year ago? my sense of time has been thrown off by the pandemic) and what I found is that the relevance tables didn't really change enough to show any improvement over what we currently have. I don't know whether the same will be true now. Let me know if you want details.

srawlins commented 2 years ago

That's very interesting. The main reason I filed this task is the shift from pre-null-safety to post-null-safety code that has taken place. I think it'd be worth investigating if we make a push to improve completion quality.

All that being said, I have no intention of investigating completion quality any time soon; my current charge is strictly for performance.

bwilkerson commented 2 years ago

... the pre-null safety / post-null safety change ...

I don't think null safety had as big an impact on completion suggestions as we might expect. We do have a feature that looks at types, and the types of expressions have changed, but that isn't part of the content of these tables.

That said, one aspect of this that I'd forgotten, and which will continue to be an issue in the future, is that newer language features that didn't exist when the tables were last built aren't represented in the tables at all. That by itself is a good reason for rebuilding them.