encukou / abi3

Improvements of Python's stable ABI
BSD 3-Clause "New" or "Revised" License
8 stars 1 forks source link

Isolating modules in the stdlib #21

Open encukou opened 2 years ago

encukou commented 2 years ago

PEP 630 has motivations and technical notes. What needs to be documented better is how both applies to stdlib. Specifically:

Whether that should be a new PEP or added to 630 doesn't matter much, IMO.

Then, any introduction of PEP 630 should be done deliberately, analyzing the pros and cons for each module separately. It should not be a mass change, and it should involve explaining the motivations & specific implications to module maintainers.

erlend-aasland commented 2 years ago

Thanks, Petr!

Relevant bpo's (I'll expand the list later):

Related PEP's:

erlend-aasland commented 2 years ago
  • list the behavior changes when static types are converted to heap types (mutability, pickleability, any more?), and document how to undo them when the new behavior is not better. Same for any other behavior changes related to changing existing code.

Expanding this list a little bit (mostly as a note to self):

encukou commented 2 years ago
erlend-aasland commented 2 years ago

Here's a draft of a ToC:

Isolating modules in the standard library

Subtitle: Applying PEP 630 to the standard library

encukou commented 2 years ago

Applying PEP 630 to the standard library

Please change to something like "Isolating modules in the standard library". Most people don't have PEP numbers memorized; having to look up part of a document name sucks :) Same for "PEP 573"→"module state access"

recap user benefits

Limited API & PySide aren't really relevant for the stdlib. If you're going for a new document, keep the rationale/motivation limited to stdlib itself. The reasons should be e.g. "Py_Finalize() doesn't clear all Python objects at exit" (bpo-1635741).

Modules/xxmodule.c must be ported sooner or later

It's already ported, see Modules/xxlimited.c. xxmodule should stay as example of the original API, as long as that's supported. But as a tutorial that can be followed any time, it could work well.

erlend-aasland commented 2 years ago

Please change to something like "Isolating modules in the standard library". Most people don't have PEP numbers memorized; having to look up part of a document name sucks :) Same for "PEP 573"→"module state access"

Good points :) I've updated the post.

If you're going for a new document [...]

I think it will work better as a separate document. It'll be too comprehensive to add to PEP 630. Two relatively short documents are better than one large document, IMO :)

It's already ported, see Modules/xxlimited.c.

Ah, thanks!

shihai1991 commented 2 years ago

Checked some global static vars:

erlend-aasland commented 2 years ago

I finally managed to allocate some time for this. I've gone back and forth a bit regarding the format and structure. So far, I think a short, informal, standalone document is fine for this. A draft is embedded in this post, but a PR is easier to review and collaborate with, so I figure I open a PR on this repo :)

Draft: Isolating Modules in the Standard Library # Isolating modules in the standard library This document serves as a checklist for how to isolate a module in the standard library. As of Python 3.11, a lot of modules have been adapted to multi-phase initialisation, and a lot of standard library types have been converted from static to heap types. This process has not been straight-forward, and most of the issues relate to how heap types differ from static types. Some performance issues have also been encountered when converting global state to module state. Before undertaking this, you should familiarise yourself with the following PEPs: - [PEP 384](https://www.python.org/dev/peps/pep-0384/) - [PEP 489](https://www.python.org/dev/peps/pep-0489/) - [PEP 573](https://www.python.org/dev/peps/pep-0573/) - [PEP 630](https://www.python.org/dev/peps/pep-0630/) ## Part 1: Preparation 1. Open a discussion, either on the bug tracker or on Discourse. Involve the module maintainer and/or code owner. Explain the reason and rationale for the changes. 2. Identify global state performance bottlenecks, if there are such. Create a proof-of-concept implementation and measure the performance impact. `pyperf` is a good tool for benchmarking. 3. Create an implementation plan. For small modules with few types, a single PR may do the job. For larger modules with lots of types, and possibly also external library callbacks, multiple PR's will be needed. ## Part 2: Implementation Note: this is a suggested implementation plan, based on lessons learned with other modules. 1. Add Argument Clinic where possible; it enables you to easily use the defining class to fetch module state from type methods. 2. Prepare for module state; establish a module state `struct`, add an instance as a static global variable, and create helper stubs for fetching the module state. 3. Add relevant global variables to the module state `struct` and modify code that accesses the global state to use the module state helpers instead. This step may be broken into several PR's. 4. Convert heap types to static types. 5. Convert the global module state struct to true module state. 6. Implement multi-phase initialisation. Preferably, steps 4 through 6 should all land early in a single alpha development phase. ## Got'chas - All heap types **must fully implement the GC protocol**. See [bpo-42972](https://bugs.python.org/issue42972). - All standard library types should remain immutable. Heap types are mutable by default, static types are not. Use `Py_TPFLAGS_IMMUTABLE_TYPE` to retain immutability. See [bpo-43908](https://bugs.python.org/issue43908). - Static type with tp_new = NULL does not have public constructor, but heap type inherits constructor from base class. Make sure types that previously were impossible to instantiate, retain that feature; use `Py_TPFLAGS_DISALLOW_INSTANTIATION`. Add tests using `test.support.check_disallow_instantiation()`. See [bpo-43916](https://bugs.python.org/issue43916). - Use strong "back-refs" to the module object to ensure the module state pointer never outlives objects that access module state. Keep this in mind for external library callbacks that access module state.
encukou commented 2 years ago

Hi! Sorry about the delay, on my part this time. My life should be back to normal now :)

The draft looks OK. Perhaps it the goal should be to put it in the devguide, rather than a stand-alone document.

I think one thing missing from the guide is a more explicit note that not all types need to become heap types. If a type doesn't need the module state, it can be kept static. This might change depending on how/if GIL removal works, but it's too early in that project to know what will be needed.

erlend-aasland commented 2 years ago

The draft looks OK. Perhaps it the goal should be to put it in the devguide, rather than a stand-alone document.

The devguide sounds like a good place for such a document. That is, unless the SC thinks this calls for a stand-alone PEP, of course.

I think one thing missing from the guide is a more explicit note that not all types need to become heap types. [...] This might change depending on how/if GIL removal works, but it's too early in that project to know what will be needed.

So, we should note that as far as we know per now, not all types need become heap types, but that may change in the future.

If a type doesn't need the module state, it can be kept static.

Would that work correctly with multi-phase init?

encukou commented 2 years ago

So, we should note that as far as we know per now, not all types need become heap types, but that may change in the future.

It might change, but it also might not -- and not doing unnecessary changes helps limit the possible issues.

Would that work correctly with multi-phase init?

It should! If it doesn't, that's a pretty serious bug.

erlend-aasland commented 2 years ago

Ok, I'll update this in a couple of days hopefully. I'll update the PR on this repo. If the SC says an update to the devguide is enough, I'll create an issue and PR in that repo. For now, let's keep it here.

encukou commented 2 years ago

IMO, a PEP is in order. We need a PEP-like document, and we probably need SC approval, so let's go all the way. I'll take the existing document from the PR and work on it on it this week.

erlend-aasland commented 2 years ago

Great news!

encukou commented 2 years ago

PEP PR: https://github.com/python/peps/pull/2499