Bazel Language Server: Depend on the Bazel Starlark source code

josiahsrc commented 3 years ago

Description of the problem / feature request:

Hello!

A few of my friends and I are working on developing a language server for bazel, built in java, for our university capstone project. We would like to use the java implementation of the starlark parser provided in this repo at the following location:

src/main/java/net/starlark/java/**/*.java

Our project is built using Bazel; however, we are unable to import these source files because many of them are marked as private. My question is: how would we best go about depending on these source files? Is this possible?

Thanks in advance for any help :)

Feature requests: what underlying problem are you trying to solve with this feature?

Being able to depend on and use the bazel implementation of starlark so we can develop a language server that is consistent with the Bazel build system.

What operating system are you running Bazel on?

Linux

What's the output of `bazel info release`?

release 3.7.1

What's the output of `git remote get-url origin ; git rev-parse master ; git rev-parse HEAD` ?

git@github.com:bazelbuild/bazel.git
3c685b6d2d34cd24450c11a902f1abfc3716d093
3c685b6d2d34cd24450c11a902f1abfc3716d093

josiahsrc commented 3 years ago

Update:

We've forked the repo and added a workaround to create a jar file from the starlark source files. These changes are available here.

This, however, doesn't fully fix the problem. We would like to depend on the up-to-date source files provided in this repo.

chancila commented 3 years ago

you can look at how copybara does it:

https://github.com/google/copybara/blob/master/WORKSPACE#L70
https://github.com/google/copybara/blob/master/third_party/BUILD#L144

they basically just pull in the whole bazel repo as an external workspace and use the starlark libs.

josiahsrc commented 3 years ago

@chancila I just tested that out and it worked flawlessly. Thank you!

I'll go ahead and close this issue.

laurentlb commented 3 years ago

cc @alandonovan

alandonovan commented 3 years ago

Hi Josiah, thanks for your interest in Starlark. If I might give you a word of advice: the most valuable thing I never learned during my formal CS education is the impact of dependencies on the success of software projects. When you add a dependency on another package, you are placing a bet that its stability, speed, robustness, and security will always be good enough for your needs. Dependencies are the foundation of your house. When a package has explicitly marked its visibility as private, it is a clear signal that it will likely not provide the stability you need, and that if you build your house on those foundations, it will repeatedly fall down, requiring effort to repair.

The Java implementation of the Starlark front-end may be good enough for a college project, but if you plan for this work to outlive your semester I strongly advise that you use an implementation with a stable public API, such as the Go implementation at go.starlark.net. (As a bonus, it's also an opportunity to work in Go, which is in many ways a better language for writing servers and system tools.)

josiahsrc commented 3 years ago

@alandonovan Thank you for your detailed reply. I hadn't thought of dependencies as the foundation of a project before (I've thought of them more as utilities), but that makes a lot of sense and clears up a lot of headaches. I've shared this insight with my team. Our original workarounds to depend on the Java Starlark libraries felt hacky to begin with. As evidence of this, only a few days after we implemented our patch, the Java source code in this repo had already been updated.

We're hoping to eventually have a stable implementation of a Bazel language server (using the Language Server Protocol) that we can share with the open-source community. We're attempting this in response to this issue as well as the needs of our sponsors. To rely on more stable dependencies, we'll be switching our language server over to the go.starlark.net libraries you've suggested here, and we'll attempt to use protocol buffers so that we can keep our existing Java language server code.

Thank you again for your help. If possible, could my team and I reach out to you in the future with more Bazel/Starlark specific questions?

cc @BYU-Bazel

laurentlb commented 3 years ago

Your language server will need to know the list of classes/methods that are specific to Bazel. The current way to do it is to run the ApiExporter binary (see https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/docgen/ApiExporter.java and https://github.com/bazelbuild/bazel/blob/master/src/main/protobuf/builtin.proto) and get the data in a proto.

@alandonovan Can you comment on how stable this interface is?

alandonovan commented 3 years ago

The ApiExporter interface is similarly locked down, and does change from time to time, though not very much. However, its output is essentially a list of identifiers that are predeclared in the Starlark environment, so if Josiah's LSP project simply checks in the list of identifiers as a source file, and updates it from time to time (either by hand or by running ApiExporter), there is relatively loose coupling of their tool with the Bazel source.
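A minimal sketch of that loose coupling, in Go, under some assumptions: the `predeclared` list below is a hypothetical hand-maintained snapshot (a real one would be regenerated now and then, by hand or from ApiExporter output), and `Complete` is an illustrative helper, not part of any Bazel API.

```go
package main

import (
	"sort"
	"strings"
)

// predeclared is a hypothetical, checked-in snapshot of a few names that
// Bazel predeclares in BUILD files. A real list would be much longer and
// refreshed from time to time.
var predeclared = []string{
	"cc_binary",
	"cc_library",
	"glob",
	"java_library",
	"select",
}

// Complete returns the predeclared names that start with prefix, sorted.
func Complete(prefix string) []string {
	var out []string
	for _, name := range predeclared {
		if strings.HasPrefix(name, prefix) {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}
```

Because the list is just data in the tool's own repository, a change in Bazel's Java classes cannot break the build; at worst the suggestions go slightly stale until the list is refreshed.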

I assume this project will not attempt to actually execute BUILD or .bzl files. (That way lies madness.)

laurentlb commented 3 years ago

The output contains type information for the builtins, as well as the fields available in each type. That's very convenient for users exploring the ctx object, for example. A dependency on the proto sounds good to me.

As an alternative to Starlark-go, I can suggest depending on Buildifier (https://github.com/bazelbuild/buildtools). The package is capable of parsing Starlark files too, and is often used as a foundation for static tools. Since the language server shouldn't need to evaluate code, this might be a good solution. It's able to pretty print the code, which can be useful: LSP has a format action, as well as a way to suggest code changes.

josiahsrc commented 3 years ago

For some clarification, our initial goals for the language server are:

- Provide autocomplete for Bazel
- Provide a way to navigate from a source file to its associated BUILD file
- Provide auto-formatting (the google buildtools look promising, as suggested by @laurentlb)
- Provide semantic highlighting (for path correctness and such)

If all goes well, we'll move on to implementing harder tasks.

Our initial idea for implementing these features is to build a simple tree of a user's WORKSPACE and BUILD files. When we need to evaluate any specific file in the tree, we will parse it using one of the aforementioned Starlark parsers, Buildifier or starlark-go. This is a different route than the ApiExporter, but it means we don't have to depend on a private package. I believe this is similar to what @alandonovan was referring to as well.

Thoughts on this?

alandonovan commented 3 years ago

My main concern is that you don't depend on Java class interfaces, as we don't guarantee API stability. (We may, eventually, for the net/starlark/java/... subtree, but we're not there yet.) The proto interface is lower risk, since it was at least designed to decouple the producer and consumer.

When we need to evaluate any specific file in the tree,

I assume you mean "parse", not "evaluate", here. Your application should not need to link in the evaluator (go.starlark.net/starlark in Go, or net.starlark.java.eval in Java). If it does, it's a sign something is wrong with your design. Evaluating BUILD files is essentially impossible for any tool but Bazel. Do not be fooled by its simple syntax into thinking it should be an easy task.

josiahsrc commented 3 years ago

Ok, that sounds good. I'll shoot for using the protobufs first then (and keep an eye on the net/starlark/java... subtree to see if it becomes public).

Sorry, yes, I meant parse. We're going to heed your warning on the evaluation and stick with simple syntax parsing.

Thank you @alandonovan @laurentlb for your help! This info has helped us get a good footing on where to go next on our project. If possible, could we reach out to you both in the future with any starlark/bazel specific questions as they come up?

alandonovan commented 3 years ago

could we reach out to you both in the future with any starlark/bazel specific questions as they come up?

Of course. Good luck!

josiahsrc commented 3 years ago

@alandonovan @laurentlb Hello!

Thank you both for the previous suggestions. They helped my team and me push through some roadblocks we were having. For an update on our project, we have imported Buildifier to provide file formatting and created a way to autocomplete Bazel paths. We are now moving on to parsing Starlark files to provide semantic highlighting from our server (e.g. underlining syntax errors in an IDE).

We have a few promising ways of doing this (some previously discussed):

1. Through dynamically linked libraries

I'm leaning towards using dynamically linked libraries to communicate with the starlark-go source code just because it's one less binary for us to manage. If we did this, we would, however, have to modify the starlark-go to build the DLL we need.

2. Through protobufs

Using protobufs would require us to define .proto files to match the API that we need from the starlark-go repo, which is work that we would like to avoid. Bazel-generated protobuf files also don't integrate well with autocomplete features, from my experience, which would make developing the language server much slower.

3. Using the java implementation

This would be the easiest way, as it's already been created. I'm aware of the previous comment @alandonovan made about the API changing, but this comment taken from this README leads me to believe that the starlark API will change no matter what we choose.

Despite some differences, the Go implementation of Starlark strives to match the behavior of the Java implementation used by Bazel and maintained by the Bazel team. For that reason, proposals to change the language itself should generally be directed to the Starlark site, not to the maintainers of this project. Only once there is consensus that a language change is desirable may its Go implementation proceed.

If we went this way, we might use a facade so that the Starlark implementation could be easily switched out.
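To make the facade idea concrete, here is a small sketch, written in Go for illustration (the same shape works as a Java interface). Everything in it is hypothetical: `StarlarkParser`, `SyntaxError`, and `NewParser` are made-up names, and `bracketChecker` is a toy stand-in backend that only detects unbalanced brackets, not a real Starlark parser.

```go
package main

import "fmt"

// SyntaxError is an implementation-neutral description of a parse problem.
type SyntaxError struct {
	Line    int // 1-based line number
	Message string
}

// StarlarkParser is the facade: the rest of the server depends only on this
// interface, never on a concrete Starlark implementation.
type StarlarkParser interface {
	Check(filename string, src []byte) []SyntaxError
}

// bracketChecker is a toy backend used only to make the sketch runnable.
// It reports unbalanced brackets and ignores strings and comments.
type bracketChecker struct{}

func (bracketChecker) Check(filename string, src []byte) []SyntaxError {
	closers := map[byte]byte{')': '(', ']': '[', '}': '{'}
	var stack []byte
	var errs []SyntaxError
	line := 1
	for _, c := range src {
		switch c {
		case '\n':
			line++
		case '(', '[', '{':
			stack = append(stack, c)
		case ')', ']', '}':
			if len(stack) == 0 || stack[len(stack)-1] != closers[c] {
				errs = append(errs, SyntaxError{line, fmt.Sprintf("%s: unexpected %q", filename, c)})
				continue
			}
			stack = stack[:len(stack)-1]
		}
	}
	if len(stack) > 0 {
		errs = append(errs, SyntaxError{line, fmt.Sprintf("%s: unclosed %q", filename, stack[len(stack)-1])})
	}
	return errs
}

// NewParser is the single place where the backend is chosen.
func NewParser() StarlarkParser { return bracketChecker{} }
```

Swapping the backend then means writing one new type that satisfies `StarlarkParser` and changing `NewParser`; callers that only see `[]SyntaxError` are unaffected.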

Thoughts on this? We're a bit lost on where to go here. We would really like to move forward with this by using a stable starlark dependency (because we want to get this server to the bazel community), but each avenue we take has drawbacks.

josiahsrc commented 3 years ago

Update:

I've figured out a decent route to go. I was able to create a wrapper around the starlark-go repo in golang. I then compiled that into a shared library, or dll, using Cgo. With this, I was able to successfully use JNA to link up our java language server with the starlark-go parser!😄 This should fix the dependency concerns that @alandonovan raised

Because some of the POJOs that the starlark-go parser is returning are complex, I've opted to encode POJOs sent to and from golang as strings using protobufs. So, effectively, the server will do this:

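// Assumes: import "google.golang.org/protobuf/proto", plus our generated myprotos package.
func ParseStarlarkCode(protoEncodedInput string) (protoEncodedOutput string) {
  parseInput := &myprotos.ParseInput{}

  // The input definition is complex, so we use protobufs here to avoid defining
  // the POJO objects on both ends.
  if err := proto.Unmarshal([]byte(protoEncodedInput), parseInput); err != nil {
    return "" // error reporting elided
  }

  ...

  // parseOutput is built in the elided section above.
  encoded, err := proto.Marshal(parseOutput)
  if err != nil {
    return "" // error reporting elided
  }
  return string(encoded)
}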

laurentlb commented 3 years ago

It seems that the difficulty was to use a Go library from Java. But why do you use Java at all? I assumed you would write the language server in Go.

alandonovan commented 3 years ago

I've figured out a decent route to go. I was able to create a wrapper around the starlark-go repo in golang. I then compiled that into a shared library, or dll, using Cgo. With this, I was able to successfully use JNA to link up our java language server with the starlark-go parser!😄 This should fix the dependency concerns that @alandonovan raised

I agree with Laurent. This sounds hellishly complex and inefficient. What benefit does Java provide in this design?

josiahsrc commented 3 years ago

When we started this project, we weren't very familiar with the Bazel ecosystem, language server protocol, or the different implementations of Starlark. We thought it best to use java for the language server for a few reasons.

- Bazel itself is primarily written in Java. It made sense to us that the language server would also be written in Java.
- We weren't aware of the various implementations of Starlark. We figured that the ideal Starlark parser for our Bazel language server would be the one written in Java, specifically designed for Bazel.
- Golang wasn't listed as a supported SDK for the Language Server Protocol. See the officially supported SDKs here. We weren't aware that there was a Go LSP implementation, albeit with little community support, or that we even needed to use one.
- "My main concern is that you don't depend on Java class interfaces, as we don't guarantee API stability. (We may, eventually, for the net/starlark/java/... subtree, but we're not there yet.)" According to @alandonovan, the Java Starlark implementation may eventually be made public. So, if our DLLs prove inefficient, we could easily switch out the implementation and call the Java Starlark implementation directly.

We're not needlessly trying to use Java. We picked it because, to the best of our knowledge, it would integrate well with Bazel. Given that you both don't like the idea of a Java language server, I could speak with my team about switching to Go. Again, we want to build this to be used by the community, so we're open to switching our design to what would work best. Switching would, of course, slow us down quite a bit because we would have to translate all of our existing code and infrastructure to Go.
