Closed DanTup closed 6 years ago
Any update on this?
Sorry, but no, not yet.
Gentle ping on this. Also cc @devoncarew, curious how the Intellij plugin's 'search everything' is implemented since it's mostly instantaneous? Does it use a different API? Perhaps it caches?
I believe findTopLevelDeclarations
was implemented for the Eclipse client (since end-of-life'd). IntelliJ does some of its own parsing and indexing (due to the way that it provides support for multiple languages), so may not use this call. VSCode may be the only current customer of the API. In other words:
eg. in the flutter repo, I get 75MB of JSON back for findTopLevelDeclarations which means a lot of deserialising, parsing, etc
it's quite likely that the API has regressed. @DanTup, how important is this for VSCode? Not clear that we have the bandwidth for a fix on the analysis server side currently, but if @bwilkerson or @scheglov have a sense for what the fix would be - or the area to investigate - we could at least break down and get a better sense of the work required.
@devoncarew I don't do enough Dart dev to know how often it's used there, but for my own work (in C# and in TS for Dart Code) it's a feature I use a lot - it powers the Find Symbol (Ctrl+T) feature.
I could try and take a look, though I don't know what the expected mechanism for not returning SDK results is. I could also try and track down where it was introduced, but if it was in the new driver work as @bwilkerson suspected in the related Dart Code case I guess all I'd find is that it was in the version that defaulted to using it.
If someone can give me some pointers, I could at least take a look and see what I can find (and maybe at least come up with some failing tests). That said; there is also still the possibility Dart Code is taking advantage of a bug so it might not be totally straight forward!
Hm... Somehow this issue was not brought to me earlier.
Several ideas about findTopLevelDeclarations
.
It was implemented as one of the very last features during porting Analyzer to Analysis Driver, exactly because Dart plugin for IDEA does not use it. I was not even sure that we need it, because there was no good way to make it fast enough. At the end we decided that slow implementation is better than no implementation (i.e. a breaking change in the protocol) at all. The reason for slowness is that resynthesize element model for each file (we do this lazily, but it is still some work), then filter all top-level declarations though the given regular expression, and send results.
As it is implemented now, we return top-level declarations for all known files. This includes all explicitly added files (i.e. files in the open projects), and the transitive closure of all files referenced from them - packages, SDK. Was it a mistake to return results for known files vs. explicitly added files? Did the older version return just results for explicit files? Do users want to open a Flutter or SDK class? Switching to only explicit files will greatly reduce the number of element models to resynthesize, as well as size of result.
Can you make an experiment and actually send request to Analysis Server only when the user typed 3+ characters, so that you get not as many results? Note, that performance still might be bad, because Analysis Server will resynthesize all element models, and filter only then. So, if it is still slow, we will know that the problem is in resynthesis, not it JSON encoding and transmission.
There is a similar feature in Analysis Server where Quick Fix adds a required import for an unresolved symbol. We also need to check all defined or exported top-level declarations. But we don't resynthesize actual elements, because we need only know which source files declare or export some name. This is what we call "unlinked" information - it is available in each individual file, without context of all files that it imports. So, it works much faster. Unfortunately this is not enough for existing findTopLevelDeclarations
API.
We could in theory provide a new API that returns less information, e.g. only this: file path, location, name, kind (type, function, or variable). This probably should be enough to show useful information for users and navigate to symbols. What is missing is types of variables (because var v = 42;
or var v = foo();
does not have this type as unlinked information), and function signatures (although we could probably add them).
We need to decide whether we want declarations from explicit or know files. Or it could be a flag on the request. But only if you actually going to use it :-)
Similarly, potentially we could also include documentation (it is also unlinked information), or make it optional. Again, only if you will use it, otherwise we can avoid deserializing this information and make implementation a bit faster.
In my testing with older SDKs, results from inside the SDK were not included. That's the behaviour I personally expected (since showing all the SDK just makes it harder to jump around my own code) but I'm not sure whether that's the same for a typical Dart user.
- Can you make an experiment and actually send request to Analysis Server only when the user typed 3+ characters, so that you get not as many results?
I made Dart Code only send the search when there are three chars and searched for ddd
in a tiny Dart sample app and it was really fast:
[21:42:27]: ==> {"id":"11","method":"search.findTopLevelDeclarations","params":{"pattern":".*[Dd].*[Dd].*[Dd].*"}}
[21:42:27]: ==> {"id":"12","method":"search.findMemberDeclarations","params":{"name":".*[Dd].*[Dd].*[Dd].*"}}
[21:42:27]: <== {"id":"11","result":{"id":"1"}}
[21:42:27]: <== {"id":"12","result":{"id":"2"}}
[21:42:27]: <== {"event":"search.results","params":{"id":"1","results"://SNIP
[21:42:27]: <== {"event":"search.results","params":{"id":"2","results":[],"isLast":true}}
So I removed the limit and did the same searching for just d
and it was also fast despite a large number of results:
[21:44:47]: ==> {"id":"8","method":"search.findTopLevelDeclarations","params":{"pattern":".*[Dd].*"}}
[21:44:47]: ==> {"id":"9","method":"search.findMemberDeclarations","params":{"name":".*[Dd].*"}}
[21:44:47]: <== {"id":"8","result":{"id":"2"}}
[21:44:47]: <== {"id":"9","result":{"id":"3"}}
[21:44:47]: <== {"event":"search.results","params":{"id":"2","results":[//SNIP
[21:44:47]: <== {"event":"search.results","params":{"id":"3","results":[],"isLast":true}}
So I tried the same in the Flutter stocks app and it was a little slower:
[21:46:45]: ==> {"id":"11","method":"search.findTopLevelDeclarations","params":{"pattern":".*[Dd].*"}}
[21:46:45]: ==> {"id":"12","method":"search.findMemberDeclarations","params":{"name":".*[Dd].*"}}
[21:46:45]: <== {"id":"11","result":{"id":"2"}}
[21:46:45]: <== {"id":"12","result":{"id":"3"}}
[21:46:47]: <== {"event":"search.results","params":{"id":"2","results":[//SNIP
[21:46:47]: <== {"event":"search.results","params":{"id":"3","results":[],"isLast":true}}
So I tried again with the whole Flutter repo open and it was painfully slow:
[21:50:37]: ==> {"id":"11","method":"search.findTopLevelDeclarations","params":{"pattern":".*[Dd].*"}}
[21:50:37]: ==> {"id":"12","method":"search.findMemberDeclarations","params":{"name":".*[Dd].*"}}
[21:50:37]: <== {"id":"11","result":{"id":"2"}}
[21:50:37]: <== {"id":"12","result":{"id":"3"}}
[21:51:04]: <== {"event":"search.results","params":{"id":"2","results":[//SNIP
[21:51:05]: <== {"event":"search.results","params":{"id":"3","results":[],"isLast":true}}
So then I put the 3-char min back in and searched for ddd
again and it was somewhere inbetween:
[21:52:51]: ==> {"id":"6","method":"search.findTopLevelDeclarations","params":{"pattern":".*[Dd].*[Dd].*[Dd].*"}}
[21:52:51]: ==> {"id":"7","method":"search.findMemberDeclarations","params":{"name":".*[Dd].*[Dd].*[Dd].*"}}
[21:52:51]: <== {"id":"6","result":{"id":"3"}}
[21:52:52]: <== {"id":"7","result":{"id":"4"}}
[21:53:13]: <== {"event":"search.results","params":{"id":"3","results":[//SNIP
[21:53:14]: <== {"event":"search.results","params":{"id":"4","results":[],"isLast":true}}
So, I think it's a combination of both searching more and more results/JSON (there was actually a noticable hang in Code's UI during the JSON deserialisation!). The flutter repo is significantly slower than anything else - but it may be orders of magnitude bigger in terms of code (flutter, all the example projects, all the packages used by them all, etc.).
We could in theory provide a new API that returns less information, e.g. only this: file path, location, name, kind (type, function, or variable). This probably should be enough to show useful information for users and navigate to symbols. What is missing is types of variables (because var v = 42; or var v = foo(); does not have this type as unlinked information), and function signatures (although we could probably add them).
Scanning through the code, I believe the info I currently use from the results in Code is:
result.location.startLine
result.location.startColumn
result.location.length
result.location.file
result.path[...].name
result.path[...].kind
result.part[...].parameters
It doesn't seem like we use the Type and looking at the results in the list, there's indeed no obvious difference in rendering between the results for var isInt = 1;
and var isString = "Danny";
.
We need to decide whether we want declarations from explicit or know files. Or it could be a flag on the request. But only if you actually going to use it :-)
Code doesn't have a built-in option that'd let the user choose this; but if we're unsure what's best for a user we could always have an option in Dart Code to allow people to opt-in to including the SDK/etc. if they want. Personally I think just searching my own code is most useful out of the two; but again, I might not be the best person to have an opinion due to limited Dart coding!
Oh, it's also worth noting that the search I want is actually everything, not just top-level. Currently (as you can see from the logs above), I'm sending both findTopLevelDeclarations
and findMemberDeclarations
and merging results (see discussion here). If you're going to add a new API, it might be worth considering one that can do the job of both (my goal is that the user can search for anything; wether it's a type, a property, a field, a method, whatever).
re: https://github.com/dart-lang/sdk/issues/29510#issuecomment-353803783 Bullet point 2
I think it's sensible to search among specified files only. IntelliJ's search everywhere mechanism explicitly highlights symbols found outside the workspace.
VSCode's UI which Dart Code connected findTopLevelDeclarations calls to is also called 'Go to Symbols in Workspace...' (the label comes from VSCode).
Including transitive dependencies would be a nice add-on as a toggle or as a separate API but I think it shouldn't impede the performance or the ability to search in user code.
Slow symbol search came up again at https://github.com/Dart-Code/Dart-Code/issues/568
A proposal for API. https://dart-review.googlesource.com/c/sdk/+/42105
Some comments:
containerName
: The name of the container - the file name for top-level declarations, the class name for class members.
Is it possible to keep these separate? TypeScript shows results like this:
This always includes the filename, even when the items are in a class/module (I think it's useful to always see the filename even when it's inside a class). If containerName
sometimes has a filename and sometimes has a class it means we have to put strange logic in here (eg. parse the filename from path, then check whether it's the same as containerName
and if so don't show it twice). I'd actually prefer to show more of the path than just the filename anyway.
I'm not sure whether we want ElementKind
Currently we use this to set the icon that's displayed. I don't know if you're suggesting dropping it or just having another enum, but we we can reuse the existing mapping we have so I think it's fine as that.
I don't think we will return local variables or parameters.
I agree, I don't think there's much value in them (and in Code, other languages don't seem to be providing them).
It would be probably still a lot of information, maybe we should use single string table in response?
Not entirely sure what that means?
It would be probably still a lot of information, maybe we should use single string table in response?
Not entirely sure what that means?
Similar to the way NavigationTarget
has a fileIndex
field that is an index into a list of file paths so that the file paths are not duplicated in the result.
Similar to the way
NavigationTarget
has afileIndex
field that is an index into a list of file paths so that the file paths are not duplicated in the result.
Aha, got it; makes sense!
Is there value in having an int maxResults
in the params so the client can reduce the results that come back (you can stop searching at maxResults + 1
and return just the first maxResults
along with a boolean indicating there were more)? On a large codebase some names may be really common so depending on the UI to show the results (or whether things like json deserialisation are being done on the UI thread and being slow might be a problem) we can truncate them and show a message informing the user?
Hmmm, I'm a bit confused - the search pattern was removed? Weren't we trying to minimise the data coming back, but this must be returning everything for the entire project?
In Code, if the user backspaces we do get fired again, so my preference would definitely be to be able to provide the pattern and get a small set of results than be receiving an enormous list (which is kinda what the current problem is).
I created this CL, https://dart-review.googlesource.com/c/sdk/+/42180, but I will postpone it for now.
I can see how pattern
can work with maxResults
, if we set maxResults
low enough - it is faster to decode and show limited number of items on the client. OTOH, to produce this short list of declarations Analysis Server would have to scan all declarations again and again for each new pattern. Is it faster than to return everything once, decode it and then apply pattern on the client? I don't know.
Maybe we should start with the current implementation and see how well it works on different codebases.
Is it faster than to return everything once, decode it and then apply pattern on the client?
It would be faster within a search (eg. as the user is typing) because we don't have to go over to the server on each keystroke (Code actually batches the keystrokes up so it wouldn't be on every one) but we couldn't cache that data for long (well we could, but we'd have to invalidate it on any edit, including when packages are updated, which might be outside of the editor).
I think on a large project the extra data over the wire and the JSON deserialisation on the UI thread would make this quite bad.
Maybe we should start with the current implementation and see how well it works on different codebases.
Yeah, I'm happy to give it a test on something like the Flutter repo to get some timings rather than guessing blind though.
We could also do something in-between - like have the server filter based on pattern, but encourage editors to filter in-memory when the query is getting more-specific (and just re-send on backspaces).
When https://dart-review.googlesource.com/c/sdk/+/42586 is landed, you should be able to use search.getDeclarations
.
It seems to work pretty fast to collect declarations. Converting to JSON and sending response is more expensive.
Here are numbers for a project where Flutter repo is open.
files: 1097
declarations: 21970
Time after collect declarations: 11 ms.
Time after transform declarations : 12 ms.
Time after send response: 69 ms.
Cool; I'll give it a test later this/early next week and see what the times are like with the deserialisation on the editor side.
@scheglov
Done some testing (times are since start, so 64ms from sending request to finished mapping results):
Sending JSON 03
Got response 39
Parsed response 50
Calling handler 50
Mapping results 50
Done mapping 67
It actually feels slower than it is, because it seems like Code stops for you to stop typing before it sends a request (which makes sense normally, but since we're getting them all, we could've been fetching them while the user is typing, we don't need the query!).
Minor tweak - is it possible to add the length for this? I can't see that Code is actually using it yet, but it's part of the API so it's possible they might highlight the item in a preview as you move through them (we used to use the .location
property from SearchResult
).
If you can bump the server version so I can detect this is availble, I might ship this behind a preview flag for people to try out and see what they think and then swap over to it if the feedback is good.
Currently we return all declarations. So, you need to request declarations only once, when the user first activated Find declaration
or whatever it is called in VS Code. Then you can filter it locally. 64 ms
is not bad, it is probably OK for an explicit user action (I would like all of them be under 100 ms
).
Sorry, I don't understand. What length?
@scheglov
That's what I'd like to do, but unfortunately Code doesn't tell us the first time it's activated (our provider is created at startup). I might put a timestamp in there and cache for a short period (30-60 seconds?) to try and emulator that/
Code lets us tell is the length of the decleration, presumably so as a user moves up and down through the list with their cursor keys it could highlight the item (eg. if it's a class, and we knew the end of the class (where the final }
is) it could highlight the whole class definition). For some reason this isn't happening for me, but I'm sure I've seen it before. If you don't already have this info available don't worry about it, but if you do, it'd be nice to pass through.
Oh, found a possible issue - don't seem to get getting any className
fields populated at all? (running from latest code as of 30 min ago).
That's sad. Is search for declarations a modal operation in VS Code? Ideally they could notify the provide that the user started search (AKA the search text field is focused now) and stopped (the search text field lost focus).
Hm... Highlighting the whole class declaration does not look terribly useful for me.
Ah... I knew, I forgot something. We don't copy className
into the output API structure :-( . Will fix.
Yeah :( I went back and checked the docs and they actually split out the search (provide
) from looking up the location (resolve
) for performance reasons - shame they didn't think about this. I've added in a 30 second cache and it seems to work ok.
It actually works really well, check out this example - as I move up and down the results, it highlights the item like this (it's less noticeable in my Dark theme but you can see it obviously here):
I've implemented it at https://github.com/Dart-Code/Dart-Code/commit/e7835176908197484f2f6b0e93e5a3dbfdbbc66c I just need a server version to enable it. Rather than adding another flag, I think I'll just build a beta/dev version with it in and try and see if I can get some people to test it (@xster @pulyaevskiy) through the next few weeks 😁
FYI, adding pattern and maxResults
is not off the table, after all this API is mostly for you.
We know "source range" for each elements - from the first token of the corresponding node, to the last one. The node also includes all annotations and the documentation comment. We could add codeOffset
and codeLength
to the API.
I will fix the issue with className
and increment the server version.
I will get another version increment during/after adding code ranges.
I guess a question I have is how much faster it'll be that way. If it still takes 70% of the time to get the filtered list then it's probably not worth it; though if it's 5% of the time maybe it is!
For the document version of this search, we have a range but we have to take the start from the element because of how you have whitespace included in the outline range (see the comment at https://github.com/Dart-Code/Dart-Code/blob/master/src/providers/dart_document_symbol_provider.ts#L67-L77) so if you could return an equivilent of that it would be good.
Thanks!
Just occurred to me when looking at the document one, I think in the current version we might show parameters in methods (eg. myMethod(int something)
) to make it more obvious to the user - I wonder whether we should include them here? (Sorry, I should've picked up this stuff earlier!)
Well, as you can see in timing results, search is cheap, but sending JSON is expensive. We spend 12 ms
on search and then 58 ms
to send JSON response. So, with small enough maxResults
cut-off (e.g. 100
) Analysis Server will answer fast, because we will stop earlier when there are many matches, and spend at most 12 ms
(plus a little to send the JSON result) when there are less matches than the limit.
I'm not 100% sure that I understand the problem. take the start from the element because of how you have whitespace included in the outline range
puzzles me. Why do we talk about outlines now? What means "the start from the element"?
Parameters are problematic. We know their names, and sometimes names of their types, if it's a named type (e.g. not a generic function type). So, we could provide a single parameters
field, which is not exactly what is in code and more like a hint. Should be probably enough for the user to quickly visually inspect and select. WDYT?
What additional overhead will the pattern matching add? In my current implementation (client side search) I think the JS regex against all those items might be the slowest part (I'll do a little more testing tomorrow) - I think it was actually faster for me to give all the results to Code (which results in the wrong results, since it doesn't filter them) than to filter with the regex. The regex I do is basically a .*
between every character to do a fuzzy match, eg.:
const pattern = new RegExp(escapeRegExp(query).split("").join(".*"), "gi");
Sorry, I was jumping around. In Code we have this same symbol-search functionality that runs just for the current file also. Currently that one works using the Outline functionality for the document and it provides a range
. However that range includes the whitespace before a class (for ex) so the blank lines immediately before the class get highlighted too (which looks strange). However, the element
property starts at the correct location, so we use the start from the .element
but then the end from the outline range. My point was, if you add the range to the declaration, can ew make sure it doesn't include the leading whitespace (dart docs are fine), so that the highlight is accurate?
I'm not sure exactly what you mean - can you give an example? Do you just mean the generic type args would be missing? The single-file version of this (which uses the Outline) uses a parameters
field - if it has the same issue nobody has complained about it so it must be good enough. If not, if we don't think it looks good we could do what we do in completion, which is to show ()
or (…)
to the user?
@scheglov Something of a side question re (2): why does the outline range include whitespace before an element? Docs I understand, but why whitespace?
.*Elem.*
, .*El.*
and .*E.*
:
After search declarations: 11 ms.
Files: 19
Declarations: 91
After converting: 11 ms.
After sending: 12 ms.
After search declarations: 13 ms. Files: 30 Declarations: 110 After converting: 13 ms. After sending: 13 ms.
After search declarations: 13 ms. Files: 274 Declarations: 934 After converting: 14 ms. After sending: 17 ms.
2. Yes, for outlines we include whitespaces to ensure that the file code does not have gaps in coverage. For this new declarations API we don't need this feature, the code range will start at documentation. And you still have the offset of the name.
3. Outline uses `executableNode.parameters.toSource()`, i.e. basically dumps the AST. We don't have a luxury of having AST for every file, we will have to restore some approximation of parameters from `unlinked summary` of the file. Which is doable, but it is just a question how accurate we want it to be, and how fast we can do this.
4. Do you want also return types and type parameters?
OK, now we have pattern and maxResults
, code range, and parameters
.
It works not a bit slower for search, and quite slower for encoding many results, without limit, because of longer parameter strings.
After search declarations: 14 ms.
Files: 1103
Declarations: 22213
After converting: 16 ms.
After sending: 105 ms.
When we set low enough maxResults
, or the pattern is selective, it works fast enough - full scan about 25 ms
, and .*E.*
with 500 results
in 5 ms
.
If there are problems with the implementation, or you want to remove a feature or two, it is probably better to open new issues :-)
@scheglov Awesome, thank you! Hoping to get back to it and finish it off this week 👍
@scheglov Pattern is currently case sensitive. I could put character classes into the regex, but I don't know if it'll perform worse than if the caseSensitive
flag is set?
@scheglov For now I've just put the characters into the pattern, so I do this:
const pattern = ".*" + query.replace(this.badChars, "").split("").map((c) => `[${c.toUpperCase()}${c.toLowerCase()}]`).join(".*") + ".*";
Anyway, it works so fast even on a huge codebase I had to check the logs several times to ensure it was working! I've decided not to use maxResults
because Code doesn't give me a good way to indicate that the results are truncated (I can't add a fake item because it gets highlighted if it contains the same characters the user typed) and at this speed it seems like it's pointless.
So, I'm not caching client-side at all, I'm just sending everything to the server (though Code does wait a pause in typing before invoking us). I think people will be very happy with this change! 👍
Yes, I considered adding caseSensitive
, but decided against it - as you found out it is possible to simulate it using just pattern.
Ah, too bad you cannot write something like <Too many results>
. Write it in a different language :-)
Write it in a different language :-)
LOL! Unfortunately it's the Code API that's limited here - even if I ported Dart Code to Dart I'd still have the same issue.
Tbh, it feels much better having all the results than having them clipped (the list scrolls nicely). It's probably unlikely a user will search for something vague enough to have a large number anyway.
Thanks!
@scheglov When running my tests against the new SDK, I noticed a difference which I'm not sure if is expected (you mentioned the data would be less accurate) or a bug.
I get back declerations like this:
methodTakingString(String a)
methodTakingFunction(myFunc)
The type of myFunc
isn't present in parameters
but in the outline I think it came back as (int Function(String) myFunc)
.
Not a big deal if you don't have that info, but I thought it worth asking before I change my test to assume it should be this way.
That sounds like an oversight to me.
@bwilkerson Cool, I kinda hoped that was the case. LMK if you want me to raise a new issue.
https://dart-review.googlesource.com/c/sdk/+/46003
This adds support for function typed parameters, type arguments and type parameter references.
@scheglov Awesome, thanks! I'll test it out later in the week if it's landed :)
@scheglov Looks great, thanks!
While investigating https://github.com/Dart-Code/Dart-Code/issues/283 I noticed that
findTopLevelDeclarations
is returning unexpected results (such as things inside the SDK files), for example:As a result, searches seem to be much slower (eg. in the flutter repo, I get 75MB of JSON back for
findTopLevelDeclarations
which means a lot of deserialising, parsing, etc.).I tested a really small sample CLI app and in v1.19 of the SDK it only returned 4 results, but in 1.23 it's full of thousands of SDK results.
My PC has decided to install some large windows update when I rebooted so I can't grab any more info atm, but give me a shout if this isn't enough to go on or you can't repro and I can try and get more info over the weekend.