This would be heaven ... when do we get a PR? :-)
I just looked at the code. I saw here https://github.com/bndtools/bndtools/pull/1850/files#diff-d864d16e149e1242dd5554a5620a8229R354-R366 that you get the RepositoryPlugins. Why not use Workspace.getRepositories() directly? This will give you all the Repository objects that are actually used for resolving. You currently might miss repositories because there is an assumption that RepositoryPlugin implements Repository, which is false. Or am I missing something?
You can also search for all packages you're looking for in one go using the multi requirement facility on the current OSGi. Or do the search in parallel since it might involve downloading.
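Roughly, a batched search could look like this (an untested sketch; it assumes bnd's `CapReqBuilder` and the OSGi `Repository.findProviders()` API, and the way the repositories are obtained from the Workspace should be double-checked against the discussion above):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.osgi.resource.Capability;
import org.osgi.resource.Requirement;
import org.osgi.service.repository.Repository;

import aQute.bnd.build.Workspace;
import aQute.bnd.osgi.resource.CapReqBuilder;

public class PackageSearch {

	// Build one requirement per package and hand them all to each repository
	// in a single findProviders() call instead of querying package by package.
	public static Map<Requirement, Collection<Capability>> find(Workspace workspace,
		List<String> packageNames) throws Exception {

		List<Requirement> requirements = new ArrayList<>();
		for (String pkg : packageNames) {
			requirements.add(new CapReqBuilder("osgi.wiring.package")
				.addDirective("filter", "(osgi.wiring.package=" + pkg + ")")
				.buildSyntheticRequirement());
		}

		Map<Requirement, Collection<Capability>> result = new HashMap<>();
		for (Repository repository : workspace.getPlugins(Repository.class)) {
			// findProviders() takes the whole collection, so the repository can
			// batch (or parallelize) the lookups and any downloads it needs.
			repository.findProviders(requirements)
				.forEach((req, caps) -> result.computeIfAbsent(req, k -> new ArrayList<>()).addAll(caps));
		}
		return result;
	}
}
```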
If you only have the class name, it of course becomes harder. We could first ask JDT then. If it is anywhere on the classpath, then it should be possible to get a reference to the JAR. Otherwise, we need to index all repositories ...
Since we're already indexing the classes for each repository we could extend the ResourceBuilder to include a Bloom filter or so for class names. We could even add this to the XML as a proprietary namespace so that remote indexes also contain this info. The Bloom filter would allow us to detect the presence so we can then do a proper check.
I would strongly prefer that any logic goes into bndlib. I.e. the package/class search function should go to Workspace and not be done in bndtools. The push for IntelliJ is increasing and any function inside Eclipse cannot be reused. It would even be an interesting function in bnd itself to search repositories.
> I just looked at the code. I saw here https://github.com/bndtools/bndtools/pull/1850/files#diff-d864d16e149e1242dd5554a5620a8229R354-R366 that you get the RepositoryPlugins. Why not use Workspace.getRepositories() directly? This will give you all the Repository objects that are actually used for resolving. You currently might miss repositories because there is an assumption that RepositoryPlugin implements Repository, which is false. Or am I missing something?

You're probably not missing anything. I was pretty green (I've still got a way to go before I fully brown off!), so I just copied the only bit of code I knew that could search repositories for exported packages, and that was the Repositories view in Bndtools.
> You can also search for all packages you're looking for in one go using the multi requirement facility on the current OSGi. Or do the search in parallel since it might involve downloading.
>
> If you only have the class name, it of course becomes harder. We could first ask JDT then. If it is anywhere on the classpath, then it should be possible to get a reference to the JAR. Otherwise, we need to index all repositories ...
>
> Since we're already indexing the classes for each repository we could extend the ResourceBuilder to include a Bloom filter or so for class names. We could even add this to the XML as a proprietary namespace so that remote indexes also contain this info. The Bloom filter would allow us to detect the presence so we can then do a proper check.
>
> I would strongly prefer that any logic goes into bndlib. I.e. the package/class search function should go to Workspace and not be done in bndtools. The push for IntelliJ is increasing and any function inside Eclipse cannot be reused. It would even be an interesting function in bnd itself to search repositories.
These all sound like great suggestions (and thanks for teaching me about Bloom filters!). When I vaguely looked at it some time ago I had planned to use Eclipse's indexer, but the argument for doing it in bndlib is compelling.
Don't put yourself down, I am very grateful for what you're doing (and inspiring us to use assertj!)
I like the idea a lot of having the Bloom filter for classes be part of the ResourceBuilder. That would make it end up in all the XML. A fallback could then be JDT, but maybe we should just bet first on this new namespace. bnd is pretty pervasive in the XML generation, I think.
> Don't put yourself down, I am very grateful for what you're doing (and inspiring us to use assertj!)
Heh, I didn't consider it so much a put-down as an accurate description of the way things were! I certainly would have done things differently with the benefit of hindsight, but that's only to be expected. :smile:
> I like the idea a lot of having the Bloom filter for classes be part of the ResourceBuilder. That would make it end up in all the XML. A fallback could then be JDT, but maybe we should just bet first on this new namespace. bnd is pretty pervasive in the XML generation, I think.
This sounds like the way to go. I had a quick look at ResourceBuilder, but there are too many gaps in my knowledge of that part of the code that I would have to fill to get it done (I don't even know which XML you're referring to!), which I don't think I have time for. So although I think it would be fun and interesting, I'm going to have to pass on the opportunity. Glad to have at least planted the seed.
The ResourceBuilder creates a resource. It is used to turn a bundle into a Resource by parsing a manifest (addManifest/addFile). In the resource builder we can, if we have the file, also analyze the classes and create the Bloom filter. We can then add this as a capability in the class.index namespace. The Bloom filter would be a property on that capability.
Once it is a capability, it is trivial to get all those capabilities from all the repositories. We also do this trick in the templates and errors in the BridgeRepository class.
Once it is in the ResourceBuilder, the XML we generate for the XML indexes would automatically include it.
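Very roughly, the idea would be something like this (the `bnd.class.index` namespace, the attribute name, and the toy hashing are all made up here; the exact ResourceBuilder/CapReqBuilder calls should be checked against the current API):

```java
import java.util.Base64;
import java.util.BitSet;
import java.util.Collection;

import aQute.bnd.osgi.resource.CapReqBuilder;
import aQute.bnd.osgi.resource.ResourceBuilder;

public class ClassIndexCapability {

	public static void add(ResourceBuilder rb, Collection<String> classNames) throws Exception {
		BitSet bits = new BitSet(4096);
		for (String className : classNames) {
			// Toy "Bloom filter": two cheap hash positions per class name.
			int h = className.hashCode();
			bits.set((h & 0x7fffffff) % 4096);
			bits.set(((h >>> 16) & 0x7fffffff) % 4096);
		}

		String encoded = Base64.getEncoder().encodeToString(bits.toByteArray());

		// The filter ends up as an attribute on a capability in a proprietary
		// namespace, so the generated XML index carries it automatically.
		CapReqBuilder cap = new CapReqBuilder("bnd.class.index");
		cap.addAttribute("bloom", encoded);
		rb.addCapability(cap);
	}
}
```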
If you want I can make a simple example how to do this.
If it's not too much trouble for you, I think a simple example using ResourceBuilder might be enough to get me started. Though I'm still concerned about my time...
We could even add the Bloom filter to the manifest? Not sure how much data that would take. The advantage is that we do have all the classes already in memory when building the bundle. We could even directly provide the capability in the manifest?
> We could even add the Bloom filter to the manifest?
It would be better for Bnd to add this capability to the manifest (Provide-Capability header). Then it would also be visible at runtime, and Bnd's indexing support will naturally process the capability into the index.
That is also what I meant. Then it all falls into place. Would be quite cool. Though we need to make it optional since it could be significant data and we need to encode it as a string.
We also need to decide some other things, probably through experimentation, such as bucket size and hash algorithms. I would assume the capability value is just the hash data which would be a string of length determined by the bucket size chosen.
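For illustration only, such a header might end up looking something like this (the `bnd.class.index` namespace and the attribute names are invented here, nothing is decided):

```
Provide-Capability: bnd.class.index;bits:Long=4096;hashes:Long=3;bloom="<base64-encoded bit set>"
```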
Yup, where are the trainees when you need them :-( I expect we can be pretty coarse, since any Resource we can exclude from the search is nice but not crucial. Maybe we should even run some Soundex algorithm on the keys so that we capture misspellings.
The main issue I can see with the way that this discussion is heading is that we are effectively creating a public specification for this index, which in turn puts constraints on how we can change the index. We'd need versioning and/or parameterization to allow the format of the index to change and evolve.
I think you would probably want to make the bucket size variable, so that Bnd could calculate an optimal bucket size when the index is created (and the number of elements is known). The index consumer might be able to back-calculate it from the index itself.
Changing hashing algorithms, however, would not be as straightforward, and would need some kind of change management so that it could evolve in the future to support (e.g.) SoundEx or trigrams for supporting misspellings.
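For reference, the standard Bloom filter sizing formulas could drive that calculation; a minimal sketch (nothing here is bnd API, and the inputs are hypothetical):

```java
public class BloomSizing {
	// m = -n * ln(p) / (ln 2)^2 -> number of bits for n elements at false-positive rate p
	public static int optimalBits(int n, double p) {
		return (int) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
	}

	// k = (m / n) * ln 2 -> number of hash functions
	public static int optimalHashCount(int n, int m) {
		return Math.max(1, (int) Math.round(((double) m / n) * Math.log(2)));
	}

	public static void main(String[] args) {
		int bits = optimalBits(500, 0.01); // e.g. 500 class names, 1% false positives
		int hashes = optimalHashCount(500, bits);
		System.out.println(bits + " bits, " + hashes + " hash functions");
	}
}
```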
We will start using a bnd namespace for this so we can experiment. If it works out well, we can submit it to the OSGi Alliance. There are lots of things that started that way. The danger is always to standardize too early. First, get it to work, then standardize. Skills you need to standardize work against you in the initial dark period of exploration. (And vice versa!)
In this case, we can define attributes for the algorithm and bucket size. Although I would be very careful not to make it too flexible. I.e. I am not sure we need to standardize an algorithm. Imho flexibility is highly overrated in standards. The purpose is interoperability. Optionality and flexibility tend to make standards very hard to implement and use.
The nice thing is that the Req/Cap model allows us to insert our bnd proprietary format and leverage all the tooling. That was exactly the purpose of this model when Richard Hall and I worked on this sooooo long ago in RFC 112.
Absolutely, I agree that standardizing too early (especially as formal as an OSGi Alliance standard) would be a major mistake. There are some lessons that can only be learned in that "initial dark period of exploration".
By "public specification", I didn't mean something as formal as an official OSGi spec. What I meant was: there is a difference between a situation where the process that generates the index is also the process that consumes it (ie, if it was all done on-the-fly in memory by Bnd/tools), vs a situation where it could be generated by one process and consumed by another. Even if that process is Bnd, it could be a later version of Bnd. The spec is "public" in the sense that there has to be some kind of contract between the producer and the consumer. At the very least, I think you'd need a "version" attribute so that the consumer knows whether or not it can understand the version of the index that has been generated.
I also think that, while the Bloom filter would be a good start, it would not be sufficient on its own to implement the full gamut of code suggestions. For example, let's say I use a Bloom filter to store a hash of the trigrams of all fully-qualified class names in my bundle. I have "org.mypkg.Catalog" in my bundle, and my editor has the unknown class "Cat". The Bloom filter will tell me that my bundle contains a class that matches; however, it won't tell me the full name of that class, so I won't be able to import it. The most the Bloom filter would allow us to do would be to suggest adding that bundle to the build path. Admittedly, once added, Eclipse would index it and then Eclipse might suggest "org.mypkg.Catalog" as a correction. That would work, but it would be a nicer user experience if it could be done in one step and all in bndlib land. To support this scenario you'd need a lossless indexing algorithm. But this also means more space. Perhaps it could be calculated on-the-fly, or else stored somewhere in OSGI-INF.
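A tiny illustration of that limitation (the class and helper names are invented for the example):

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class Trigrams {
	// Collect the 3-character substrings of a string.
	public static Set<String> of(String s) {
		Set<String> result = new LinkedHashSet<>();
		for (int i = 0; i + 3 <= s.length(); i++)
			result.add(s.substring(i, i + 3));
		return result;
	}

	public static void main(String[] args) {
		// The trigrams of the query "Cat" are contained in those of
		// "org.mypkg.Catalog", so a trigram Bloom filter can flag the bundle
		// as a candidate, but it can never give back the fully-qualified name.
		System.out.println(of("org.mypkg.Catalog").containsAll(of("Cat"))); // true
	}
}
```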
Actually, although this might break the strict application of the "bndlib" rule, I know (because I briefly looked into this idea some time ago) that Eclipse gives you programmatic access to its indexing engine, and that you can programmatically index a particular jar and have the results stored in a file. Then you also have programmatic access to the search engine and you can tell it which indexes to search. So a possible hybrid Bndtools implementation of code suggestions might look like this:
- **Preparation:** programmatically index each bundle in the repositories with Eclipse's indexer, storing the resulting index files on disk.
- **Use:** point Eclipse's search engine at those stored index files when resolving an unknown class name, so it can return the fully-qualified name and the containing bundle.
This should be a lot more efficient than trying to maintain all of the indexes in memory at one time.
An alternative would be to do the same thing entirely in bndlib, with bndlib's own equivalent implementation of Eclipse's index.
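A very rough sketch of what such a bndlib-side index might look like (all names and the storage strategy are invented here):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SimpleClassIndex {
	// "Catalog" -> { "org.mypkg.Catalog (com.example.bundle)", ... }
	private final Map<String, Set<String>> bySimpleName = new HashMap<>();

	public void add(String fqcn, String bundle) {
		String simple = fqcn.substring(fqcn.lastIndexOf('.') + 1);
		bySimpleName.computeIfAbsent(simple, k -> new HashSet<>())
			.add(fqcn + " (" + bundle + ")");
	}

	// Unlike the Bloom filter, a lossless index can answer "Cat" ->
	// "org.mypkg.Catalog" directly (here with a naive prefix match).
	public Set<String> suggest(String prefix) {
		Set<String> result = new HashSet<>();
		bySimpleName.forEach((simple, entries) -> {
			if (simple.startsWith(prefix))
				result.addAll(entries);
		});
		return result;
	}
}
```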
Anyway, just throwing ideas out there.
I am starting to experiment so we can discuss some actual examples against some actual data. I intend to collect all class names that are on my system.
I just remembered why this was originally all done in Bndtools land, rather than the current (superior) proposal to offload the search capability to Bnd: back when I wrote it, Bndtools and Bnd were two different projects, and I didn't want to have to touch two projects rather than just one. So another :+1: for the decision to merge the projects. :smile:
Just been doing a bit of research and wanted to update this issue for future reference of the uninitiated who may eventually tackle this issue (whether that be my future self or someone else), to aid their navigation of the code completion environment.
Eclipse offers two different types of code completion:

1. **Content Assist**, contributed via the `org.eclipse.jdt.ui.javaCompletionProposalComputer` extension point (implemented by classes implementing `org.eclipse.jdt.ui.text.java.IJavaCompletionProposalComputer`). The standard Java Content Assist implementation will then invoke your extension to calculate proposals to add to the Java Content Assist drop-down.
2. **Quick Fixes**, implemented by classes implementing `org.eclipse.jdt.ui.text.java.IQuickFixProcessor` and registered under the Eclipse extension point `org.eclipse.jdt.ui.quickFixProcessors`.

The original implementation for this was only for Quick Fixes, as they seemed less convoluted than Content Assists. But at the end of the day I think they're both quite similar once you know which part of the Content Assist API you actually need to implement.
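For the record, the skeleton of a proposal computer registered against the `org.eclipse.jdt.ui.javaCompletionProposalComputer` extension point looks roughly like this (the class name and the repository-search step are placeholders, not existing code):

```java
import java.util.Collections;
import java.util.List;

import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.jdt.ui.text.java.ContentAssistInvocationContext;
import org.eclipse.jdt.ui.text.java.IJavaCompletionProposalComputer;
import org.eclipse.jface.text.contentassist.ICompletionProposal;
import org.eclipse.jface.text.contentassist.IContextInformation;

public class RepoPackageProposalComputer implements IJavaCompletionProposalComputer {

	@Override
	public void sessionStarted() {}

	@Override
	public List<ICompletionProposal> computeCompletionProposals(ContentAssistInvocationContext context,
		IProgressMonitor monitor) {
		// Here we would ask the Workspace (bndlib) which repository bundles export
		// a package matching the prefix at the caret, and turn each match into a
		// proposal that also adds the providing bundle to -buildpath.
		return Collections.emptyList();
	}

	@Override
	public List<IContextInformation> computeContextInformation(ContentAssistInvocationContext context,
		IProgressMonitor monitor) {
		return Collections.emptyList();
	}

	@Override
	public String getErrorMessage() {
		return null;
	}

	@Override
	public void sessionEnded() {}
}
```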
@bjhargrave, I think you jumped the gun closing this one... @pkriens made improvements to the quick fix back end which will also make content assist easier to implement, but as yet the content assist part still hasn't been implemented.
bndtools/bndtools#1850 added quick-fix suggestions for missing bundles for imports.
Another related feature that Eclipse has is suggestions as you type. Eg, you start typing `import org.osgi...` and it gives you a drop-down list of known packages to import.

The built-in Eclipse suggester will only suggest packages that are already on the build path. Bndtools could potentially add suggestions of exported packages from all bundles known in the repositories, even those that are not yet in the build path. It could then (like the Quick Fix processor) automatically add the relevant bundle to `-buildpath`.
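For reference, "adding the relevant bundle" would ultimately just mean appending an entry to the project's `-buildpath` instruction in its bnd file, e.g. (the second bundle name here is invented):

```
-buildpath: \
	osgi.annotation;version=latest,\
	com.example.suggested.bundle;version=latest
```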