x/tools/gopls: filter useless completion candidates

muirdm commented 4 years ago

I think we are at a point where we can consider filtering out completion candidates that are probably useless. We still have to be careful because it is normal for users to sometimes compose Go code "out of order", or use completion on an empty line to explore what is available, or otherwise try to complete something that isn't "quite right" yet. Also keep in mind that gopls having no expected type doesn't necessarily mean there is no expected type - there could be a syntax error we don't work around yet.

Here are some concrete situations to consider:

If we have an expected type, don't offer func candidates that return no values. This one seems pretty low risk. There are still edge case false negatives such as a user wanting to complete to a void function with the intention of then going and adding a return value.
If we have an expected type and it isn't bool, omit otherwise non-matching type name candidates that have no methods. This seems medium risk. For example:

// do offer float64 to allow type conversions like "float64(something) > 1234"
var _ bool = f<>

// don't offer float64 
var _ int = f<> 

type someType struct{}
func (someType) someMethod() int { return 0 }

// do offer someType to allow things like "someType{}.someMethod()"
var _ int = s<>

type otherType struct{}

// don't offer otherType
var _ int = o<>

Please comment with other low risk opportunities to omit candidates.

gopherbot commented 4 years ago

Thank you for filing a gopls issue! Please take a look at the Troubleshooting guide, and make sure that you have provided all of the relevant information here.

findleyr commented 4 years ago

I wonder if we can come up with (and document) principals/heuristics for making these types of decisions -- ideals that we should strive for, even if imperfectly. Put another way: perhaps we need a stricter definition of 'useless' :). Off the top of my head, here's one such set of principles:

P1: The initial list of completion candidates should include all expressions that are valid at point, assuming that we keep typing without jumping to another location.

P2: We should score completion candidates based on our estimate of the probability of a candidate being chosen given everything we know.

P3: We should filter based on some minimum probability threshold: if score is a number between 0 and 1 representing the fraction of time we think a completion candidate will be chosen, any completion with score below, say, 0.01 should be filtered out. If the user is only going to select it one out of 100 times, then it's not worth the distraction to show it.

I'll note that we do have a concept of lowScore now, but it seems not to be pinned to any intrinsic meaning, and I don't think we're actually filtering based on it. Correct me if I'm wrong.

These principles are inconsistent with using completion to explore what's available', which is why I think they're worth discussing.

If we use these as a guide, then I think what you propose makes perfect sense. I especially like the thought put into this:

var _ bool = f<>

That should have a lot of potential candidates, but IMO anything that's not a bool expression should be significantly downranked: if we were to naively analyze a typical codebase looking for the following forms:

(A) var _ bool = [true|false]
(B) var _ bool = [unary expression]
(C) var _ bool = [binary expression]

I think we'd find that (C) is pretty rare. If we have 1000 completion candidates that start an expression of form (C), I think most if not all of them would be filtered out based on P3 above.

Does that make sense? Do you agree or disagree with P1-3 above?

muirdm commented 4 years ago

I like the idea of using more rigorous/absolute scoring to filter poor candidates.

P1: The initial list of completion candidates should include all expressions that are valid at point

Keep in mind candidates are usually single-object expressions, so this should be "candidates that could be part of a valid expression". You have to use your imagination in a lot of situations to determine if a candidate could be the start of a valid expression:

// Candidate is "someSlice" which doesn't look appropriate, but user
// could want someSlice[idx].someBoolMethod()
var _ bool = s<>

In other words, it is hard to eliminate any candidate initially, but perhaps you weren't suggesting otherwise.

P2: We should score completion candidates based on our estimate of the probability of a candidate being chosen given everything we know.

Part of me is afraid we might never know enough to do a good job at this. Maybe a data driven statistical/ML approach could help fill in the void when we otherwise don't really know which of two candidates is more likely.

any completion with score below, say, 0.01 should be filtered out

I know you were just giving a number for discussion's sake, but I think 1/100 would be noticeable and mildly aggravating. If we try score based filtering, I imagine we will need to add a config knob so people can adjust the threshold based on how much they hate distracting candidates vs. occasional missing candidates.

For now I wanted to start with a much easier task of filtering candidates useless in 9999/10000 cases. I guess you could think of it as special case P2 scoring to set a candidates score straight to 0.00001 with a threshold of 0.0001.

An simpler but maybe-just-as-good approach is to continue focusing on only upranking the good candidates and then just limiting to 5 or 10 candidates total in the response. I'm curious how often users select completion candidates past the first five.

findleyr commented 4 years ago

Keep in mind candidates are usually single-object expressions, so this should be "candidates that could be part of a valid expression"

I think we're saying the same thing: any expression that, if I keep typing, could be valid. You mentioned completing to a function with void return, and then going to edit that function to change its return type -- I was explicitly trying to exclude those cases.

Part of me is afraid we might never know enough to do a good job at this. Maybe a data driven statistical/ML approach could help fill in the void when we otherwise don't really know which of two candidates is more likely.

Totally agree with that, but I think at least in principle pinning our score to some intrinsic measure allows us to calibrate score multipliers based on empirical measurement. For example consider forms (A), (B), and (C) above for completing var _ bool = <>. If we want to know how much to downrank expressions of form (C) (things that aren't of type bool but could be part of a binary boolean expression), we could actually look at typical source code to see how often this form of expression is actually used. If it is 10% of the time, then we can multiply all the naive scores of those completion items by 0.1.

I also think that some simple statistical calibration of a-priori scores for completion candidates is not that far-fetched. For example, if I am completing the status code in http.Error(w, msg, http.Status<>), it's not that hard to determine that http.StatusTeapot is rarely used, and that http.StatusOK and http.StatusInternalServerError are very common. That's already pretty helpful, I think. (if we're even more sophisticated, we might know that http.StatusOK is almost never used in this particular function call). Pinning scores to probability makes such statistical analysis easier to integrate.

I know you were just giving a number for discussion's sake, but I think 1/100 would be noticeable and mildly aggravating.

Thanks for pointing that out: I think you're right. 1/1000 or 1/10000 is probably better.

set a candidates score straight to 0.00001 with a threshold of 0.0001

BTW, I'm not even suggesting (yet) that we implement this scoring system, but it can still help make decisions about what to filter. You asked for low-risk opportunities for what to filter, and I wanted to start a conversation about what defines a 'useless' completion candidate. Don't mean to derail, perhaps I'll write up my thoughts in a separate issue.

An simpler but maybe-just-as-good approach is to continue focusing on only upranking the good candidates and then just limiting to 5 or 10 candidates total in the response. I'm curious how often users select completion candidates past the first five.

I'd also really love to see some data here. We could probably instrument the server to collect some useful information into a report, by looking for a didChange after a completion request.

But let's consider the case of completing http.Status<> again: I do feel like it's helpful to see all completions there -- it would be annoying to only get 5-10.

findleyr commented 1 year ago

I'd like to prioritize this. Having a relevancy cutoff can also help improve completion performance.

golang / go

x/tools/gopls: filter useless completion candidates #37718