facebook / starlark-rust

A Rust implementation of the Starlark language
Apache License 2.0
672 stars 53 forks source link

Make the AST public #77

Closed ahupp closed 10 months ago

ahupp commented 1 year ago

Most of the internals of starlark-rust are private or pub(crate). For some work I'm doing I need to access and modify the AST, so it would be nice to have those accessible. The README says:

We do not aim for API stability between releases, preferring to iterate quickly and refine the API as much as possible. But we do follow SemVer

So in principle this should be fine. A good middle ground might be to move the AST modules to their own crate, then the existing starlark crate would not need to make breaking changes as often. Moving the AST to its own crate would also open the path to moving the LSP server to its own crate as well.

Thoughts?

stepancheg commented 1 year ago

I was thinking of extracting AST into a separate crate to make compilation faster.

Moving LSP into a separate crate also makes sense because LSP pulls dependencies not needed by many users.

However, can you please describe your use case for AST?

ahupp commented 1 year ago

can you please describe your use case for AST?

Two unrelated bits of work:

stepancheg commented 1 year ago

re-write (some) function calls to implicitly memoize their side effects

Shouldn't it be better done by the function implementation themselves? Perhaps, runtime wrappers of such functions?

Note AST is hard to rewrite correctly. Consider this example:

# read_file, native function

def make_container_image():
  alias = read_file
  read_file = 10
  alias("txt")

trace the control flow that produces each value

Note in tools like buck2 (but not only), we have higher level abstraction like "rules". I.e. intermediate values are not and memoized, but they are separate objects in the system. For example:

cxx_library(name = "hello")

cxx_binary(name = "world", deps = [":hello"])

Perhaps that's what you want. But I don't know your use case.

Let me think a bit, but I think we want to expose AST. I hesitate because with more public API, we break more code between releases.

ahupp commented 1 year ago

Note in tools like buck2 (but not only), we have higher level abstraction like "rules".

Yup, I was at FB for a while and this is partially informed by experience with buck (haven't tried by buck2 yet), as well as my dislike of third-party2. I'm targeting something more like toast, but with a different interface.

with more public API, we break more code between releases.

Just make a new crate starlark-ast-do-not-use :)

ndmitchell commented 1 year ago

Not a fan of extracting the LSP into a separate create - it complicates things, makes it much harder to do stuff, enforces public boundaries between areas. Is there any real benefit?

Given we already break a lot of stuff, my inclination is to just expose the AST and accept that it breaks between each release.

stepancheg commented 1 year ago

@ahupp to avoid misunderstanding, I did not suggest using buck2 (although it also could be an option), but instead implement buck2-like setup.

laurentlb commented 1 year ago

fyi, https://github.com/bazelbuild/buildtools has a Go library that provides access to a Starlark AST (e.g. you can manipulate it and pretty print it). The interface is pretty stable and it is used by many Starlark tools. Maybe that will work for you use-case?

stepancheg commented 11 months ago

I'm currently moving AST into a separate crate starlark_syntax (for faster incremental compilation; would take a couple of days to get into the repository, and more time to be released on crates). Feel free to use it if you want.

However, if you goal is codemod, I suggest using a library like libCST, since Starlark syntax is now officially a subset of Python.

ahupp commented 10 months ago

That's an interesting point, I have some experience with writing syntax transformers with python's grammar in pegen which has similar benefits to libCST. Will take a look.