davidhalter / jedi

Awesome autocompletion, static analysis and refactoring library for python
http://jedi.readthedocs.io
Other
5.73k stars 503 forks source link

Support Python 3.13 #2003

Closed PeterJCLaw closed 1 week ago

PeterJCLaw commented 3 weeks ago

Various fixes to get the tests passing on Python 3.13, all within the tests themselves. Also some small tidyups in one of the tests. Moves to the 3.13 grammar from parso 0.8.4 and add testing on 3.13.

Reviewing by commit may be useful.

Fixes https://github.com/davidhalter/jedi/issues/2002.

Discovered issues:

davidhalter commented 3 weeks ago

Thanks a lot for looking into this! Please ping me once you are finished. I'm also happy to help with issues. It feels like you are almost done, though.

PeterJCLaw commented 3 weeks ago

I'm also happy to help with issues. It feels like you are almost done, though.

Heh, the "obvious" things are fixed. I don't immediately have great ideas for the pathlib thing though. I also haven't looked at why the test_imports.py tests are failing. This could be the same thing or they might be something else.

An idea I have around the pickles issue, but which I don't like, is that we could adjust the binary content of the pickle itself, to rename the path. That can be made to work (and I believe it could be done in a way that's reliable), but it doesn't feel right. It feels like a better solution would be to hook the pickling logic somehow, but I don't know if there are hook points which would enable that?

PeterJCLaw commented 2 weeks ago

https://github.com/davidhalter/jedi/pull/2003/commits/86f185e33fb8230179910c1c1e1d0e9e4c38f4a2 addresses the pathlib move, turns out there is a hook in pickle which solves exactly this issue and was easy to get working :)

davidhalter commented 2 weeks ago

I don't think I would find a better solution to load those pickled objects. However, I think theres also a question of why these pickled objects need to be loaded across Python versions. iter_module_names AFAIK simply returns a list of strings, so if there's suddenly some weird pathlib object in there, I think we might also be able to fix something there. But I can also think about that a bit later once I have time (in a few hours).

PeterJCLaw commented 2 weeks ago

Interesting. I thought I saw a function (get_module_info) trying to be pickled, though maybe that was from a separate call and I misread the logging. I'll circle back to that and confirm what was happening.

I'm currently looking into why Python 3.13 is seeing flaky tests; I can intermittently reproduce it locally, so it's not just CI.

PeterJCLaw commented 2 weeks ago

On the flakes: I've got as far as it seeming to be that an inference state id for a state which the sub-process doesn't know about (or seems not to know about) is being requested to be deleted. From what I can tell the inference state in question has been used with its subprocess (and that use doesn't appear to have errored).

davidhalter commented 2 weeks ago

I thought I saw a function (get_module_info) trying to be pickled, though maybe that was from a separate call and I misread the logging. I'll circle back to that and confirm what was happening.

You are right about that. get_module_info is called from a pytest test with params from pytest. But as far as I can see even for that function there shouldn't be any special classes in there. However it might be that in some places a Path is returned now instead of a str. We can probably just use str(...) wherever that is needed and we will probably be fine?

Does that help? I feel like the Unpickler would work, but I think it's probably bad that we transfer complex Python objects anyway that we don't control, because we're working across Python versions.

I'm happy to also start debugging here if you don't feel like working on this anymore.

On the flakes: I've got as far as it seeming to be that an inference state id for a state which the sub-process doesn't know about (or seems not to know about) is being requested to be deleted. From what I can tell the inference state in question has been used with its subprocess (and that use doesn't appear to have errored).

That sounds pretty bad. This is probably some fallout from my hack of reusing the InferenceState datastructure in subprocesses. Sucks, but I'm not sure how to help. This is something I feel kind of guilty about and will of course work on it, if you don't find a solution quickly. There's not much I can recommend here, this is just annoying work and trying to understand code. Something I always try first is to use pytest -I to use the interpreter and avoid any subprocesses. Typically that solves the problem. I would then try to run 3.13 against 3.13 and see if that works. If only 3.13 against the other versions fails, I would guess that there's issues with pickling sometimes.

I haven't seen any signs about inference states in sub-processes. Is that a result of your debugging? If so, how did you debug this? Asking for a friend, because I'm pretty sure this is also going to haunt me :D

PeterJCLaw commented 2 weeks ago

On the pathlib stuff: I haven't looked in detail at which functions are called, though through debugging the other issue I feel I have a better understanding of what's going on with the subprocesses (in general) now. Having a very brief look at the functions there which could be causing the issue, get_sys_path feels like it could be one -- I believe sys.path supports pathlib.Path entries (and has for a while now), so that could easily have Paths in the returned value. That doesn't quite match up though as I had thought the issue the tests showed were when Python 3.13 was the parent and the error was in the child process. So that would suggests that the Path was in the arguments. Can't remember though - would need to check the test output to be sure.

On the subprocess stuff, I have observed that InferenceStateSubprocess.__del__ is called more than once for an instance with the same self._inference_state_id. I don't know if the id is a guarantee of the same object (especially since it's not the id of itself); can Python re-use ids within the same process if the previous instance has been garbage collected? Might that have changed in Python 3.13? (I know there's a bunch of work with a JIT happening, so GC changes seem plausible). If this is a change it feels a pretty big one and I don't see anything in the release notes.

In terms of how I've been debugging this, it's a lot of print statements I'm afraid. I've added a small hook to get logs back from the child process, but it's not pretty (and relies on collaboration from the subprocess, so it's not full logging; it also avoids using the stderr stream as I couldn't work out how to read that in a non-blocking manner). I've attached a diff to this message in case it's useful though. debugging.diff.txt

I've then been running a subset of the tests in a loop until they fail:

export JEDI_TEST_ENVIRONMENT=3.10
while pytest --capture=no test/test_integration.py > out.txt 2>&1 ; do echo ; done

edit: added mention of JEDI_TEST_ENVIRONMENT.

I'm going to add some more tracking of the inference state ids to get a sense of where they get created, which will help clarify the GC interaction I hope.

I'm not sure how much more time I'll have to spend on this this weekend and probably won't get back to the pathlib stuff soon. Hopefully I'll make some more progress on the ids today though.

PeterJCLaw commented 2 weeks ago

On the subprocess stuff, I have observed that InferenceStateSubprocess.__del__ is called more than once for an instance with the same self._inference_state_id. I don't know if the id is a guarantee of the same object (especially since it's not the id of itself); can Python re-use ids within the same process if the previous instance has been garbage collected? Might that have changed in Python 3.13?

Looking in the docs for id:

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

So maybe there was always scope for an issue here?

Either way, I think the use of ids to convey identity to the sub-process is the issue as they're not sufficiently unique with the current approach. I can't actually see anything amiss with the handling of the removals though - that seems pretty solid based on how I'd expect __del__ to work and this being essentially single-threaded. The only way I can see this happening is if __del__ for an instance is allowed to happen after the "replacement" instance has been created. Adding logging for that I do actually have evidence for that! This feels like it might actually be a Python bug at this point, since that doesn't seem like a good idea to me (allowing __del__ after another object has been __init__ with the same id and memory address).

Edit: for clarity: that's still looking at InferenceStateSubprocess.__init__ and InferenceStateSubprocess.__del__ (I slightly misread my logs), so not quite the smoking gun I was hoping for just yet.

PeterJCLaw commented 2 weeks ago

Aaah, aha, correcting myself then. I don't think this needs a Python bug.

I believe there's a race of the form:

Edit: have confirmed this from logging.

At this point the order of operations between the two InferenceStateSubprocess instances going away is immaterial -- both will trigger a removal of a state with id 1. As long as B' doesn't try to use the sub-process again after the first removal has happened then the second removal will fail.

I suspect that this could have happened under any Python version. My guess is that something in the GC behaviour on 3.13 is different and has highlighted the issue.

I think the solution is to create our own ids for the InferenceState instances. A global counter is probably fine I suspect, though it feels less neat than using the id. Another alternative might be to use the id of the InferenceStateSubprocess instance as that's the entity which we're actually locking around. Assuming that __del__ is guaranteed to happen before the next __init__ of an object with the same id, I think that'd guarantee correct behaviour (relying on the existing removals logic being otherwise correct, which I believe it is).

davidhalter commented 2 weeks ago

I think the solution is to create our own ids for the InferenceState instances. A global counter is probably fine I suspect, though it feels less neat than using the id. Another alternative might be to use the id of the InferenceStateSubprocess instance as that's the entity which we're actually locking around. Assuming that del is guaranteed to happen before the next init of an object with the same id, I think that'd guarantee correct behaviour (relying on the existing removals logic being otherwise correct, which I believe it is).

I'm probably fine with either solution, as long as it feels good to you. I generally am really happy that you were able to pin this down.

About the Path stuff: I can also try and install Python3.13 and see if I can reproduce it. I feel like once I can reproduce it, the fix will probably be quick. Whereas the whole InferenceState stuff is way way more complicated.

PeterJCLaw commented 2 weeks ago

About the Path stuff: I can also try and install Python3.13 and see if I can reproduce it. I feel like once I can reproduce it, the fix will probably be quick. Whereas the whole InferenceState stuff is way way more complicated.

For the Path stuff we actually hit the issue in both directions:

Either way around, I actually think it's fine that we're passing a Path here. (There are far more interesting types that are being passed here!) I agree there's definitely some unknown with pickling across python versions, though I don't think it's too bad. Typically my concerns with pickle would be security based, though here we're (already) treating both processes as essentially part of the same process/security boundary so I think it's fine on that front.

Do you feel we need to invest in a solution other than the unpickler?

PeterJCLaw commented 2 weeks ago

Another alternative might be to use the id of the InferenceStateSubprocess instance as that's the entity which we're actually locking around. Assuming that __del__ is guaranteed to happen before the next __init__ of an object with the same id, I think that'd guarantee correct behaviour (relying on the existing removals logic being otherwise correct, which I believe it is).

I've implemented this in https://github.com/PeterJCLaw/jedi/pull/3. Not sure how we want to arrange the branches -- it's not really related to the Python 3.13 upgrade, so I'm tempted to pull it out onto its own branch and then base this one (the upgrade) on that one.

davidhalter commented 2 weeks ago

Do you feel we need to invest in a solution other than the unpickler?

Not necessarily. I also don't think it's bad that we are passing Path. I just thought that it's weird but

I've implemented this in PeterJCLaw#3.

Thanks a lot! It looks good.

Not sure how we want to arrange the branches

I'm personally indifferent. :) Just do it however you feel like it works best for you.

PeterJCLaw commented 2 weeks ago

The latest set of failures here look related to CI now using beta 3 (up from beta 2) of Python 3.13. Based on the failures we're seeing, one of https://github.com/python/cpython/issues/121025 or https://github.com/python/cpython/issues/121027 seems possibly related. Don't have time this evening to dig further, but should do later in the week.

Have confirmed by running CI pinned to beta 2 (https://github.com/PeterJCLaw/jedi/commit/2cbe074ccb1c1f3db2d17307ea29a05d03d1995d) that the issue doesn't reproduce there, so this is definitely related to changes in beta 3.

PeterJCLaw commented 2 weeks ago

I've pinned down the test failures around partial under beta-3 to definitely being caused by https://github.com/python/cpython/issues/121027.

The specific issue which Jedi faces is that partial now looks like a method descriptor, so the check for that in DirectObjectAccess.py__name__ has a different result than it did before. This means we end up using the instance rather than its type to look up the __name__, which then raises an AttributeError.

I haven't dug too deeply into why a __get__ method is being added to functools.partial, though I'm not sure the details matter a great deal to us.

What I'm not sure about is what the right fix is here. We could detect that the thing we're looking at is a functools.partial and special case it in DirectObjectAccess.py__name__, but that doesn't really feel the right solution. @davidhalter do we have any examples of handling different names I could look at?

davidhalter commented 1 week ago

I finally managed to get Python3.13 onto my computer and it seems like I got an older beta and therefore cannot reproduce it.

Regardless I would say that the name in that signature is of a low priority anyway. It doesn't provide a lot of value. So I would even be fine with changing the assert and just check that a string is returned.

My assumptions from reading the code is that there is that _is_class_instance returns False and therefore the repr is returned. Can you maybe post the output for the following statements?

cls = partial(str, 1).__class__
cls == type
isinstance(cls, type)

Sorry I don't seem to be able to get a Python 3.13 shell quickly and I'm a bit lazy to set up all the docker stuff. I hope you don't mind.

PeterJCLaw commented 1 week ago

No worries, I had to build beta-3 myself.

>>> from functools import partial
>>> cls = partial(str, 1).__class__
>>> cls
<class 'functools.partial'>
>>> type == cls
False
>>> isinstance(cls, type)

For completeness, _is_class_instance returns True.

Something which did occur to me is whether we should react to ismethoddescriptor returning True by working through the py__get__ logic. That's probably the most technically correct thing here, though I'm not sure what other impacts that'd have nor if it's really what we'd even want to show (e.g: is that going to mess with properties).

In my mind there's a couple of cases to solve here -- the immediate one, where the descriptor actually returns itself, and also the future case where it'll return something else. In the latter I'm not actually sure what it'll return -- whether that's going to be another partial (but with a self argument) or whether we're going to get another type.

Another thing I'm slightly puzzled by is that testing the signature of partials manually with sith.py I'm not actually seeing a signature printed at all. Yet the tests clearly do better than that.

I appreciate the suggestion to make the test check for any string, though I'd really hope we can do that in the long term. I might do that as a short-term workaround though (for 3.13 specifically) and then defer this to being a known bug.

PeterJCLaw commented 1 week ago

Another thing I've just realised is that this is actually only an issue for the interpreter. When running as Script we don't have the issue with partial, since we don't have any of the mixed-mode behaviours and thus rely entirely on PartialObject etc. Ironically the comment in MixedObject.get_signatures implies that it expects the signature via inspect.signature to be better than code inspection. That also seems out of step with the codebase though, since inspect.signature isn't actually used anywhere that I can tell.

Aaanyway, given how limited the impact from this seems likely to be in light of this being interpreter only I've broken it out to https://github.com/davidhalter/jedi/issues/2012 and I'll update the test here to ignore 3.13+ for the name assertion.

davidhalter commented 1 week ago

Thanks for all your work here. I generally agree. It would of course be nicer if this didn't regress. But it doesn't really matter. There are probably things we can improve around this, but partial is not the name a user wants there anyway. A user would actually want foo(..., b, c) as a signature for a call like partial(foo, 1) where def foo(a, b, c): ....

This is why I don't really care about this bug. It wasn't really correct earlier. One could even argue that a signature like <functools.partial object at 0x7f2b26605850>(b, c) is more correct than partial(b, c), because the latter should really be something like partial(callable, *args, **kwargs).

Feel free to merge.

davidhalter commented 1 week ago

That also seems out of step with the codebase though, since inspect.signature isn't actually used anywhere that I can tell.

Did you see def _get_signature in jedi/inference/compiled/access.py?

PeterJCLaw commented 1 week ago

That also seems out of step with the codebase though, since inspect.signature isn't actually used anywhere that I can tell.

Did you see def _get_signature in jedi/inference/compiled/access.py?

Apparently not. I thought I searched for it, but seemingly missed that :shrug: