bazelbuild / rules_python

Bazel Python Rules
https://rules-python.readthedocs.io
Apache License 2.0
538 stars 542 forks source link

Gazelle seems to not like PEP 695 – Type Parameter Syntax #2396

Open dougthor42 opened 1 week ago

dougthor42 commented 1 week ago

TL;DR: I think we need to bump go-tree-sitter for python 3.12 support.

🐞 bug report

Affected Rule

Gazelle

Is this a regression?

I could argue it either way, lol. I'm going to go with "probably" because it's related to #1895.

Description

PEP 695 added a Type Parameter syntax that looks like:

# Before
from typing import TypeVar

_T = TypeVar("_T")

def func(a: _T, b: _T) -> _T:
    ...

# After
def func[T](a: T, b: T) -> T:
    ...

If a python file uses this new def func[T](...) syntax, and the file has if __name__ == "__main__", then Gazelle will incorrectly generate a py_library target rather than py_binary.

🔬 Minimal Reproduction

  1. Make a python file:
# foo.py
def func[T](a: T, b: T) -> T:
    ...

if __name__ == "__main__":
    print("hi")
  1. Run gazelle bazel run //:gazelle

Expected Result

py_binary(
    name = "foo",
    srcs = ["foo.py"],
)

Actual Result

py_library(
    name = "foo",
    srcs = ["foo.py"],
)

🔥 Exception or Error

None

🌍 Your Environment

Operating System:

gLinux

Output of bazel version:

$ bazel version
Bazelisk version: v1.20.0
Build label: 7.3.2
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Tue Oct 1 17:46:05 2024 (1727804765)
Build timestamp: 1727804765
Build timestamp as int: 1727804765

Rules_python version:

$ bazel mod graph                                                                                                       
<root> (pyle@0.0.0)                    
├───aspect_bazel_lib@2.7.1            
│   └─── ...
├───bazel_skylib@1.7.1                 
│   └─── ...
├───buildifier_prebuilt@6.4.0             
│   └─── ...     
├───gazelle@0.35.0                 
│   └─── ...
├───rules_helm@0.7.0 
│   └─── ...
├───rules_python@0.37.2 
│   ├───bazel_skylib@1.7.1 (*) 
│   ├───platforms@0.0.9 (*) 
│   ├───protobuf@24.4 (*) 
│   ├───rules_proto@6.0.0-rc1 (*) 
│   ├───stardoc@0.6.2 (*) 
│   ├───bazel_features@1.11.0 
│   │   └───bazel_skylib@1.7.1 (*) 
│   └───rules_cc@0.0.9 
│       └───platforms@0.0.9 (*) 
└───rules_python_gazelle_plugin@0.37.2 
    ├───bazel_skylib@1.7.1 (*) 
    ├───gazelle@0.35.0 (*) 
    ├───rules_go@0.45.1 (*) 
    └───rules_python@0.37.2 (*)

Anything else relevant?

I haven't been able to find a human-readable changelog of tree-sitter or go-tree-sitter that explicitly says when it added support for python 3.12...

dougthor42 commented 1 week ago

ah, dang:

https://github.com/bazelbuild/rules_python/blob/91e5751eacd5443728dba16d38d1136375c2b51a/gazelle/python/file_parser.go#L85-L88

I just tried this locally and got the same issue. @hunshcn do you have any additional insight for this?

dougthor42 commented 1 week ago

Opened https://github.com/smacker/go-tree-sitter/issues/175.

maffoo commented 1 week ago

I think type parameter support was added to the tree-sitter grammer in https://github.com/tree-sitter/tree-sitter-python/commit/71977ea08cd38f28f00edeb3f46b2e881f9c58e0, and should be included in grammer versions 0.20.4 and later. It looks like this was first included in go-tree-sitter in https://github.com/smacker/go-tree-sitter/commit/8516e806aa967773ff7d77f79985741a58462c8e.

dougthor42 commented 1 week ago

This is looking more and more like an issue with tree-sitter, but I may have to update Gazelle to fallback to the old python-based parser because of limitations elsewhere.


In our code base, I narrowed things down to a single line diff:

# get_deps.py
# using this line is fine - tree-sitter correctly parses the module
def search_one_more_level(graph: dict[T, set[T]], seen: set[T], routes: list[list[T]], target: T) -> list[T] | None:

# using this line cause tree-sitter to fail parsing - see below
def search_one_more_level[T](graph: dict[T, set[T]], seen: set[T], routes: list[list[T]], target: T) -> list[T] | None:

The only difference between the two lines is the addition of [T] after the function name.

I added some debugging prints to gazelle/python/file_parser.go:

--- a/gazelle/python/file_parser.go
+++ b/gazelle/python/file_parser.go
@@ -17,6 +17,7 @@ package python
 import (
        "context"
        "fmt"
+       "log"
        "os"
        "path/filepath"
        "strings"
@@ -184,6 +185,10 @@ func (p *FileParser) Parse(ctx context.Context) (*ParserOutput, error) {
        if err != nil {
                return nil, err
        }
+       if strings.Contains(p.relFilepath, "get_deps.py") {
+               log.Printf("after parsing file, got rootNode=%q", rootNode)
+       }

        p.output.HasMain = p.parseMain(ctx, rootNode)

And ran gazelle.

# the "good" code logs:
gazelle: after parsing file, got rootNode="(module (comment) (comment) (comment) (expression_statement (string)) (import_statement name: (dotted_name (identifier))) ...

# the "bad" code logs:
gazelle: after parsing file, got rootNode="(ERROR (comment) (comment) (comment) (expression_statement (string)) (import_statement name: (dotted_name (identifier))) ...

So something is up with tree-sitter where it errors out but doesn't return an err. The net result of this is that the subsequent call to FileParser.parseMain fails and the file is incorrectly labeled as a library (instead of a binary).

dougthor42 commented 1 week ago

I think type parameter support was added to the tree-sitter grammer in https://github.com/tree-sitter/tree-sitter-python/commit/71977ea08cd38f28f00edeb3f46b2e881f9c58e0, and should be included in grammer versions 0.20.4 and later. It looks like this was first included in go-tree-sitter in https://github.com/smacker/go-tree-sitter/commit/8516e806aa967773ff7d77f79985741a58462c8e.

@maffoo yes I agree; the 3.12 grammar was added to tree-sitter a while ago. rules_python is currently pinning to the version just before go-tree-sitter updated the python grammar. rules_python pins 0628b34 and the python grammar was updated in the next commit 8516e80 just like you said.

image


I tried forking go-tree-sitter and hacking things to use the old scanner.cc, which doesn't reference ../array.h, but that naive approach sadly did not work.

maffoo commented 1 week ago

As I understand it we are using the standard rules_go rules to import the go-tree-sitter repository: https://github.com/bazelbuild/rules_python/blob/155efce562f14d46530fb5bec698a11e2ee889f5/gazelle/deps.bzl#L188-L193

And then we use two targets which are created automatically by this import: https://github.com/bazelbuild/rules_python/blob/155efce562f14d46530fb5bec698a11e2ee889f5/gazelle/python/BUILD.bazel#L45-L46

Is it possible to define a more custom way to build that instead of relying on targets generated automatically by rules_go, since clearly rules_go is not able to build this package by default? I guess rules_go does more than just invoking go build under the hood, but maybe there's a way to customize it a bit, e.g. by telling it that the @com_github_smacker_go_tree_sitter//python target should include some files from the parent directory.

maffoo commented 1 week ago

Actually, looks like the go_repository rules comes from gazelle itself, not from rules_go: https://github.com/bazel-contrib/bazel-gazelle/blob/master/repository.md#go_repository. There are a lot of options there, so maybe there's a way to customize it to work with the newer go-tree-sitter.

maffoo commented 1 week ago

I checked out the go-tree-sitter repo and ran gazelle to generate build files as described in the rules_go docs. To get //python to build, I added a filegroup in the root BUILD.bazel:

filegroup(
    name = "common",
    srcs = ["alloc.c", "alloc.h", "api.h", "array.h"],
    visibility = [":__subpackages__"],
)

and then added a reference to this in the go_library in python/BUILD.bazel:

go_library(
    name = "python",
    srcs = [
        "//:common",
        "binding.go",
        ...

This was enough to get //python to build. Given that, it seems like one way to get things working with the new go-tree-sitter would be to add these changes as patches when we bring in go-tree-sitter as a go_repository.

dougthor42 commented 1 week ago

Thanks Matthew! My next attempt was going to be to just move/copy those C flies into the python dir and tweak headers accordingly (also via a patch).