capi-workgroup / decisions

Discussion and voting on specific issues

Keep alias or not when replacing a private API with a public API? Action: add guidance in the devguide #14

Closed vstinner closed 6 months ago

vstinner commented 7 months ago

Spin-off of the "broad" issue: Should we make it hard for 3rd parties to use private functions?.

When a private API is made public, same API but remove "_" prefix (and maybe change the name), should we keep the old name as an alias to the new API, or remove the private API?

Example: Python 3.13 alpha3 upgraded the private _PyCFunctionFast type to a new public PyCFunctionFast type. It was decided to keep the old name.
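For readers unfamiliar with what "keeping the old name" means in practice, here is a minimal sketch. It assumes stand-in definitions of `PyObject` and `Py_ssize_t` so that it compiles without `Python.h`, and it spells the alias as a `typedef`; the helper `first_arg` and the two callback variables are hypothetical and only illustrate that old and new spellings stay interchangeable.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without Python.h
 * (assumptions for illustration, not the real definitions). */
typedef struct example_object { int value; } PyObject;
typedef ptrdiff_t Py_ssize_t;

/* New public name. */
typedef PyObject *(*PyCFunctionFast)(PyObject *self,
                                     PyObject *const *args,
                                     Py_ssize_t nargs);

/* Old private name kept as an alias of the public one. */
typedef PyCFunctionFast _PyCFunctionFast;

/* Hypothetical extension function: returns its first argument, if any. */
static PyObject *first_arg(PyObject *self, PyObject *const *args,
                           Py_ssize_t nargs)
{
    (void)self;
    return nargs > 0 ? args[0] : NULL;
}

/* Code written against the old name keeps compiling... */
static _PyCFunctionFast old_style_callback = first_arg;
/* ...and code written against the new name is type-compatible with it. */
static PyCFunctionFast new_style_callback = first_arg;

static void check_alias(void)
{
    /* Both variables have the same type and hold the same function. */
    assert(old_style_callback == new_style_callback);
}
```

With an alias like this, removing the old name later is a one-line change, which is why the question is mostly about policy rather than implementation cost.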

We discussed this topic a few times, but it was never clearly recorded as a vote, and there is no written decision. This time, I would like to add guidance to the Devguide, so that we no longer have to discuss this specific question: https://devguide.python.org/c-api/

Well, if "IT DEPENDS" wins, we will still have to discuss each API. But it seems like the majority prefers "ALWAYS KEEP" ;-) Let's verify that!

You can vote for 1 or 2 choices, but it will be easier to reach a decision if you don't vote for all 3 choices :-)

Vote: ALWAYS KEEP the private name

Vote: ALWAYS REMOVE the private name

Vote: IT DEPENDS, discuss it on a case by case basis

vstinner commented 7 months ago

Vote: IT DEPENDS, discuss it on a case by case basis

I propose a sub-title: by default, keep the old name, but discuss if it should be removed or not when an API is discussed. Otherwise, this "decision" will not help :-(

iritkatriel commented 7 months ago

I think we should keep it until all supported versions have the public function (but not forever).

gvanrossum commented 7 months ago

I think we should keep it until all supported versions have the public function (but not forever).

That gets my vote.

I voted “it depends” because surely there will be exceptions to the general rule.

encukou commented 7 months ago

There'll always be exceptions to any guideline, so I read “always” as “nearly always”.

“It depends” is technically OK for me -- I'd “always” vote yes. But I'd prefer to only discuss the truly exceptional cases.


I think we should keep it until all supported versions have the public function (but not forever).

Or until the public function itself is removed.

vstinner commented 7 months ago

There'll always be exceptions to any guideline, so I read “always” as “nearly always”.

Correct. It's like PEP 387, which gives generic guidelines, and then exceptions can be requested.

I think we should keep it until all supported versions have the public function (but not forever).

I have a plan for https://github.com/capi-workgroup/api-evolution/issues/24 similar to @encukou's plan.

My draft PEP: https://github.com/vstinner/misc/blob/main/cpython/pep-c-api-compat-version.rst

In short, my plan is to:

Legacy stuff is kept longer: no more eager "clean up all the legacy stuff" trend.

"Early-adopter" devs can opt in to decide WHEN to clean up legacy stuff. The WHEN no longer has to be "when a new Python version is released" but can be earlier or later. It gives more control to Python core devs and 3rd party devs on how to deal with "technical debt". The pace is no longer enforced by Python core devs on 3rd party devs.

Note: I didn't publish this draft since I'm busy with PEP 737 (Unify Type Formatting) waiting for the SC and PEP 741 (stable ABI to configure Python init) which is being discussed. I will try to propose a concrete PEP once a decision is taken on one of these PEPs. I cannot handle too many PEPs in parallel :-)

gvanrossum commented 7 months ago

It looks like Petr, Victor, Irit and myself have voted, and "ALWAYS REMOVE" loses with only one vote out of 4. But "IT DEPENDS" and "ALWAYS KEEP" each have 3 votes out of 4. @zooba could you vote (and/or explain your position further)? I'm fine with you breaking a tie. I'm also fine if you vote for both -- in that case I believe "IT DEPENDS" would be the fair winner though (being the more flexible option, but also the most labor-intensive for the WG).

I like the idea of having additional feature selection macros that users can set to choose which legacy APIs are visible and which produce deprecation warnings if used, but I'd like to make that a separate issue -- I see that as a refinement of the policy we're voting on here, and it can refine either option. I expect that Victor and Petr are pretty close and can come up with a joint proposal quickly. (One nit: I find the 0x030f0000 notation error-prone. Maybe we can create a macro Py_VERSION_TO_HEX(3, 15) which constructs the appropriate value using shifts, which should work in the preprocessor.)
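The macro suggested in the nit above could be sketched as follows. The name `Py_VERSION_TO_HEX` comes from the comment itself and is hypothetical here; the bit layout (major in bits 24-31, minor in bits 16-23) follows the layout of `PY_VERSION_HEX`, with the micro, release level and serial fields left at zero.

```c
#include <assert.h>

/* Hypothetical helper, named after the suggestion in this thread.
 * Packs major/minor the same way PY_VERSION_HEX does; the lower
 * 16 bits (micro, release level, serial) are left as zero. */
#define Py_VERSION_TO_HEX(major, minor) \
    ((((major) & 0xffUL) << 24) | (((minor) & 0xffUL) << 16))

/* The expression only uses shifts and masks, so it is usable in #if. */
#if Py_VERSION_TO_HEX(3, 15) != 0x030f0000
#  error "unexpected version encoding"
#endif

static void check_version_encoding(void)
{
    /* 15 == 0x0f, so 3.15 is easy to mistype as 0x03150000 by hand. */
    assert(Py_VERSION_TO_HEX(3, 15) == 0x030f0000UL);
}
```

Writing `Py_VERSION_TO_HEX(3, 15)` avoids having to convert the minor version to hexadecimal by hand, which is where the error-prone part of `0x030f0000` comes from.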

vstinner commented 7 months ago

According to past discussions, "keep old (private) API by default" wins the majority vote. And we need a process to get rid of legacy private API different than "just remove it and cross fingers". I tried that on multiple Python versions, and it always took way too long to get most C extensions ready for these incompatible changes. Too long means 6 to 10 months, whereas I need C extensions to work at alpha 1 ("0 days"), or alpha 2 (1 month) at least.

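Overall, we need a way to:

* Step 1: Introduce a "new" opt-in API replacing an "old" API.
* Step 2: Make most C extensions compatible with these changes.
* Step 3: When we consider that the ecosystem migrated to the new API, remove the old API.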

Currently, Step 1 and Step 2 happen exactly at the same time: at the first alpha release of a new Python version, such as Python 3.13 alpha 1. It forces all C extensions to migrate at day 1, or even worse, have to test the main development branch which is unstable before alpha 1 release ("Continuous Integration" which can be expensive for a new Python version where APIs change often and quickly).

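What I discussed in my previous comment is to have infrastructure/tooling that gives Step 1 and Step 2 different time spans. Example:

* Step 1: at first alpha of a new Python 3.x.
* Step 2: whenever C extensions maintainers want.
* Step 3: remove old API at Python 3.(x+2).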

It would give 2 years, instead of 0 days, to C extensions maintainers to update their code.

The other problem, often reported by my colleagues updating Python in Fedora: not all incompatible changes land in alpha 1, many come along the way from alpha 1 to beta 1. It means that code must be updated not once, but "many times". It's a pain to update code. Well, here we are already talking about the C API. Python changes are out of the scope of this discussion.

iritkatriel commented 7 months ago

It looks like Petr, Victor, Irit and myself have voted, and "ALWAYS REMOVE" loses with only one vote out of 4. But "IT DEPENDS" and "ALWAYS KEEP" each have 3 votes out of 4.

"It depends" is a non-decision. If we agree that the other two are not right, we need to come up with the actual guidelines.

gvanrossum commented 7 months ago

@vstinner

According to past discussions, "keep old (private) API by default" wins the majority vote.

I don't understand how you came to this conclusion. It has three votes, and so does "it depends". That looks like a tie to me. When did we decide that "keep old" had priority? I thought that was only a temporary measure to stop removing old API while we were debating.

And we need a process to get rid of legacy private API different than "just remove it and cross fingers".

That assumes we need to get rid of legacy private API. :-) At this point in many cases, the tactic "leave sleeping dogs sleep" seems to be less work and have less risk than the proposed processes.

The one exception is legacy API that gets in the way of progress (notably, if it exposes internal struct layout that we actively want to change but can't because of the legacy API; or if it is not thread-safe). In those cases, and those cases only, do I agree that we must do something.

I tried that on multiple Python versions, and it always took way too long to get most C extensions ready for these incompatible changes. Too long means 6 to 10 months, whereas I need C extensions to work at alpha 1 ("0 days"), or alpha 2 (1 month) at least.

Where is it written that all API changes must be done by alpha 1? I've always assumed that the deadline was beta 1. Alpha one gives an unreasonably short deadline for introducing new APIs.

Overall, we need a way to:

* Step 1: Introduce a "new" opt-in API replacing an "old" API.

And by "opt-in" you just mean "you may start using it but you don't have to" right? I'd rather not require feature selection macros here.

* Step 2: Make most C extensions compatible with these changes.

And by this you mean "make them use the new API" right? Because they are still compatible with the old API which isn't removed.

* Step 3: When we consider that the ecosystem migrated to the new API, remove the old API.

Yup.

Currently, Step 1 and Step 2 happen exactly at the same time: at the first alpha release of a new Python version, such as Python 3.13 alpha 1. It forces all C extensions to migrate at day 1, or even worse, have to test the main development branch which is unstable before alpha 1 release ("Continuous Integration" which can be expensive for a new Python version where APIs change often and quickly).

That is a surprising conclusion. IIUC most extensions aren't forced to migrate until the old API is removed. Also, the same thing about alpha 1.

Or are you describing how things turn out when you remove the old API right away? That would be doing steps 1 and 3 at the same time, forcing step 2 also to happen at this time. That's why nobody else liked this. :-)

What I discussed in my previous comment is to have infrastructure/tooling that gives Step 1 and Step 2 different time spans. Example:

* Step 1: at first alpha of a new Python 3.x.

Or before first beta, at least.

* Step 2: whenever C extensions maintainers want.

We might apply some light social pressure, from submitting PRs to a "who has (not) migrated" website like we did for Python 2-to-3.

* Step 3: remove old API at Python 3.(x+2).

Or later. This is subject to careful consideration. Another policy would be to remove it only after no supported version is lacking the new replacement API. That's probably about 5 years (not sure how to interpret the diagram at https://devguide.python.org/versions/).

It would give 2 years, instead of 0 days, to C extensions maintainers to update their code.

Or 5 years, give or take.

The other problem, often reported by my colleagues updating Python in Fedora: not all incompatible changes land in alpha 1, many come along the way from alpha 1 to beta 1. It means that code must be updated not once, but "many times". It's a pain to update code. Well, here we are already talking about the C API. Python changes are out of the scope of this discussion.

This may be due to a misunderstanding about when API changes are allowed, see above.

"It's a pain to update code" -- that's why you are being paid to do this. :-)

gvanrossum commented 7 months ago

"It depends" is a non-decision. If we agree that the other two are not right, we need to come up with the actual guidelines.

The default is "we consider each case as a separate C API WG decision". I'm sure we'll get tired of that quickly, but we will have learned things that can go into the guidance. If we try to write strict guidance now there's a chance we'll get it wrong.

Anyway, I'm also happy with "never remove old API", but I'm really waiting for a peep from @zooba.

vstinner commented 7 months ago

Hum, I'm waiting for @zooba's vote; I didn't draw any conclusion. I mostly discussed how changes were conducted so far, and how things can be done differently.

@gvanrossum:

I don't understand how you came to this conclusion. It has three votes, and so does "it depends". That looks like a tie to me.

Sorry, I was referring to past decisions, not to this vote.

That assumes we need to get rid of legacy private API. :-)

Right, in the comments about how to migrate, I'm making the assumption that there is a willingness to get rid of it. Some people might prefer to use the "new" API even if the "old" API remains supported. I'm discussing a method to help these developers.

Where is it written that all API changes must be done by alpha 1? I've always assumed that the deadline was beta 1. Alpha one gives an unreasonably short deadline for introducing new APIs.

I'm discussing the delay between when an incompatible change is introduced and when C extensions are updated for it, assuming here that the old way is removed immediately. I was discussing how things were done so far.

And by "opt-in" you just mean "you may start using it but you don't have to" right? I'd rather not require feature selection macros here.

Hum, I'm thinking aloud. I'm not sure how it should be exposed to the user.

In fact, I was thinking more about having the new and old API usable by default, but providing an option (macro) to remove the old API. So you build your C extension "in strict mode" and get build errors at every usage of the old API. But by default the macro is not defined and the C extension builds fine.
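A minimal sketch of such an opt-in "strict mode", assuming the old name stays visible unless the extension author asks for it to be hidden. The macro `EXAMPLE_STRICT_API` and the functions `Example_GetAnswer` / `_Example_GetAnswer` are hypothetical names invented for this illustration, not part of the Python C API.

```c
#include <assert.h>

/* Hypothetical opt-in switch: pass -DEXAMPLE_STRICT_API (or define it
 * before this point) to hide the old name. It is off by default. */

/* New public function (hypothetical). */
static int Example_GetAnswer(void)
{
    return 42;
}

#ifndef EXAMPLE_STRICT_API
/* Old private name, kept as an alias unless strict mode is requested. */
#  define _Example_GetAnswer Example_GetAnswer
#endif

/* Code that has already migrated builds in both modes. */
static int migrated_caller(void)
{
    return Example_GetAnswer();
}

/* Code that still uses the old name builds by default, but the build
 * fails in strict mode because _Example_GetAnswer no longer exists. */
static int legacy_caller(void)
{
    return _Example_GetAnswer();
}

static void check_default_mode(void)
{
    /* In the default (non-strict) build both spellings agree. */
    assert(legacy_caller() == migrated_caller());
}
```

Building once with the switch enabled gives a maintainer the full list of old-API usages as build errors, without forcing anyone else to migrate on the same schedule.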

And by this you mean "make them use the new API" right?

Right. Step 2 is about getting most C extensions compatible with the "strict mode", and Step 3 is when the old API is removed.

"It's a pain to update code" -- that's why you are being paid to do this. :-)

There is a "social bottleneck": even if my team proposes changes "upstream", affected projects have a different agenda than "Python C API Agenda": they prefer to only ship a new version when at least Python beta1 is released, and usually they also want to finish their own roadmap unrelated to C API changes before making a new release.

In the meantime, it's common to not be able to use Cython and numpy for months, even though they are key dependencies of a large number of Python projects.

vstinner commented 7 months ago

@iritkatriel:

"It depends" is a non-decision. If we agree that the other two are not right, we need to come up with the actual guidelines.

I tried to propose an alternative definition of "it depends" and discuss the vote. But yeah, we might have to vote again if the result doesn't bring very clear guidance for new APIs replacing old APIs.

vstinner commented 7 months ago

@vstinner:

It would give 2 years, instead of 0 days, to C extensions maintainers to update their code.

@gvanrossum:

Or 5 years, give or take.

It's tricky to make a decision for the general case.

Let's take an example. Python 3.13 adds PyDict_GetItemRef(), PyList_GetItemRef() and PyWeakref_GetRef(). Using these new "replacement" functions reduces the risk of crashes caused by borrowed references. They are important to use with the Free Threading build. Should we "enforce" these new functions right now? I don't think so. Upgrading all C extensions to use them will take time, even if we had tooling to automate the migration.

We have to support PyDict_GetItem(), PyList_GetItem() and PyWeakref_GetObject() for a few more years.

I deprecated PyWeakref_GetObject() with a scheduled removal in Python 3.15. If we have a method to opt-in for an API without borrowed references, there would be "less pressure" to "enforce" a migration to strong references, and we can maybe remove the deprecation to keep the old API "longer" (how long?).

I know how hard it is to advertise "safer", "cleaner" and "better" APIs when the benefits are not obvious to developers who answer "my code works fine, why should I care?".

gvanrossum commented 7 months ago

I think we may be converging. Maybe there are several categories of changes to consider.

I don't know if there are other cases?

I still would like to have a separate issue (in this tracker or elsewhere, maybe even on Discourse) about introducing some kind of "strict" mode (@vstinner and @encukou both have fairly similar proposals already).

encukou commented 7 months ago

Note that this vote is very specifically about “Case 1”:

When a private API is made public, same API but remove "_" prefix (and maybe change the name), should we keep the old name as an alias to the new API, or remove the private API?

(emphasis mine)

But the discussion here is now wider. For Case 2:

Supporting the old name is required according to (some folks' interpretation of) the backward compatibility guidelines

That's not the case: PEP-387 quite explicitly says that names prefixed by _ are not public API, and so PEP-387 does not apply to them. I don't know of a PEP that does apply -- just unwritten rules like “in doubt, status quo wins” or “don't break users”.


If users' code works with PyDict_GetItem or PyWeakref_GET_OBJECT -- perhaps because they are single-threaded, have locks, or ensure a reference exists elsewhere -- there really is no reason for them to change. Forcing them to change would be bullying.

OTOH, there are users that want to follow best practices as soon as possible. Let's make things easy for them: announce that we're not fond of a particular API, ideally in a “machine-readable” way. I don't think we can do much about C linters, but we can add deprecation warnings or that “guaranteed squeaky clean (as of 3.13)” macro.

vstinner commented 7 months ago

If users' code works with PyDict_GetItem or PyWeakref_GET_OBJECT -- perhaps because they are single-threaded, have locks, or ensure a reference exists elsewhere -- there really is no reason for them to change. Forcing them to change would be bullying.

You can get a crash with a single thread and reentrant code when using borrowed references. Example of a crash when using a borrowed reference to a type, such as Py_TYPE(obj): https://peps.python.org/pep-0737/#use-t-format-with-py-type-pass-a-type
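The single-threaded hazard can be illustrated with a toy model. This is a deliberately simplified sketch: `toy_object`, `toy_slot` and every function below are hypothetical and are not CPython's implementation. It only shows why a strong reference survives a container update that would leave a borrowed pointer dangling.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object; NOT CPython's PyObject. */
typedef struct {
    int refcnt;
    int value;
} toy_object;

static toy_object *toy_new(int value)
{
    toy_object *obj = malloc(sizeof(*obj));
    if (obj == NULL) {
        abort();
    }
    obj->refcnt = 1;
    obj->value = value;
    return obj;
}

static void toy_decref(toy_object *obj)
{
    if (--obj->refcnt == 0) {
        free(obj);
    }
}

/* One-slot container that owns one reference to its item. */
typedef struct {
    toy_object *item;
} toy_slot;

/* Borrowed reference: the caller gets a pointer but owns nothing. */
static toy_object *slot_get_borrowed(toy_slot *slot)
{
    return slot->item;
}

/* Strong reference: the caller owns a reference and must decref it. */
static toy_object *slot_get_ref(toy_slot *slot)
{
    slot->item->refcnt++;
    return slot->item;
}

/* Replacing the item drops the slot's reference to the old one. */
static void slot_replace(toy_slot *slot, toy_object *new_item)
{
    toy_object *old = slot->item;
    slot->item = new_item;  /* the slot takes ownership of new_item */
    toy_decref(old);
}

/* A borrowed pointer is only safe while nothing can replace the item. */
static int read_immediately_borrowed(toy_slot *slot)
{
    return slot_get_borrowed(slot)->value;
}

static int read_after_replace_with_strong_ref(void)
{
    toy_slot slot = { toy_new(1) };
    toy_object *held = slot_get_ref(&slot);  /* refcnt is now 2 */

    /* Stands in for reentrant code that mutates the container.
     * With slot_get_borrowed() the old item would be freed here
     * and 'held' would dangle. */
    slot_replace(&slot, toy_new(2));
    assert(held->refcnt == 1);               /* still alive: we own it */

    int value = held->value;
    toy_decref(held);
    toy_decref(slot.item);
    return value;
}
```

No second thread is involved: the replacement happens in the same thread, between obtaining the pointer and using it, which is the reentrancy case described above.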

gvanrossum commented 7 months ago

Looks like this issue is stuck in tit-for-tat land. :-(

zooba commented 7 months ago

Sorry, but I have to vote for "it depends", despite it being a "non-decision". If you really want to force my hand, then I'll vote for "always keep", even though I don't think we should always keep.

My primary motivation is that I think we should almost always change an API (and should always discuss changing it) when taking it from private to public. Private APIs were not thoroughly discussed or designed at the time they were added, and usually weren't added with any intention to become public, so they shouldn't get to bypass that process simply by being private first.

And following the assumption that we're going to make the public API more intentional than the private API, I think we ought to keep the original around in its (near) unmodified form for at least the release that has the new API. It would still be private, and our own uses would still get the benefit of that.

But in the rare case where a private API is perfectly designed already and we really do get the best public API with a simple rename, then there really isn't a need to keep the original around. But without looking at each case, my assumption would be that we should not simply rename an API to make it public, and so the change in behaviour will justify different public and private APIs.

vstinner commented 7 months ago

I'm considering writing a PEP to record the rationale behind the different issues listed here and to propose a macro to opt in to a strict mode. Apparently, the rationale is quite big and there are multiple use cases and constraints to consider. It's hard to summarize and to find a single solution to all issues.

gvanrossum commented 7 months ago

Looks like we have a solid 4/5 majority for "it depends", plus a bunch of at-least-half-baked ideas for helping users navigate this swamp. I still recommend opening a separate issue to refine those ideas before we draft the inevitable PEP.

One fine point: @zooba writes

My primary motivation is that I think we should almost always change an API (and should always discuss changing it) when taking it from private to public.

I totally agree, but I read the requested vote as applying to the narrow case where we've already had that discussion and decided that a simple rename is all that's needed, without any interface changes.

I also have a slightly less pessimistic view on the likelihood that private APIs happen to be designed right, but I agree that we should always consider the right API before deciding to simply rename. I am not worried that we might forget this discussion (witness the long thread about replacing _Py_HashDouble()).

encukou commented 7 months ago

Well, I added a vote for “IT DEPENDS”. We're formally unanimous; I'll close the issue.

Whenever we do decide to expose private/internal API with only a rename, without changing the behaviour, feel free to pre-fill my vote box for “keep the old name” :)

gvanrossum commented 6 months ago

Where is the guidance in the devguide?

vstinner commented 6 months ago

Well, I added a vote for “IT DEPENDS”. We're formally unanimous; I'll close the issue.

I don't understand exactly what it does imply. Does it mean that the C API Working Group must be formally asked to decide when private functions are removed? I suppose, only when a function becomes public, and the old private name is removed. Am I correct?

@iritkatriel:

I think we should keep it until all supported versions have the public function (but not forever).

About scheduling C API changes "in the future", I just wrote PEP 743 "Add Py_COMPAT_API_VERSION to the Python C API" to propose a way for C API consumers to test if their code is ready for future "planned" C API changes. Join the discussion at: https://discuss.python.org/t/pep-743-add-py-compat-api-version-to-the-python-c-api/48243

If such a Py_COMPAT_API_VERSION macro is accepted, it may become easier to decide when and how the legacy (private in this case) API is deprecated and then removed, or kept forever.
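To make the idea of a version-based opt-in concrete, here is a rough sketch of how a header could hide a name scheduled for removal once the consumer targets a high enough version. All names (`EXAMPLE_COMPAT_API_VERSION`, `EXAMPLE_VERSION_HEX`, `Example_NewName`, `_Example_OldName`) are hypothetical stand-ins, and the default of 3.13 is an assumption of this sketch; it is not taken from PEP 743.

```c
#include <assert.h>

/* Hypothetical version packing: major in bits 24-31, minor in bits 16-23. */
#define EXAMPLE_VERSION_HEX(major, minor) (((major) << 24) | ((minor) << 16))

/* A consumer may define this before the header to test against planned
 * removals; the default used here (3.13) is an assumption. */
#ifndef EXAMPLE_COMPAT_API_VERSION
#  define EXAMPLE_COMPAT_API_VERSION EXAMPLE_VERSION_HEX(3, 13)
#endif

/* New public function (hypothetical). */
static int Example_NewName(void)
{
    return 7;
}

/* Old name, planned for removal in 3.15 in this sketch: visible only
 * while the requested compatibility version is lower than 3.15. */
#if EXAMPLE_COMPAT_API_VERSION < EXAMPLE_VERSION_HEX(3, 15)
#  define _Example_OldName Example_NewName
#  define EXAMPLE_OLD_NAME_VISIBLE 1
#else
#  define EXAMPLE_OLD_NAME_VISIBLE 0
#endif

static int call_preferring_old_name(void)
{
#if EXAMPLE_OLD_NAME_VISIBLE
    return _Example_OldName();
#else
    return Example_NewName();
#endif
}

static void check_default_visibility(void)
{
    /* With the assumed default of 3.13 the old name is still visible. */
    assert(EXAMPLE_OLD_NAME_VISIBLE == 1);
}
```

A project that defines `EXAMPLE_COMPAT_API_VERSION` as `EXAMPLE_VERSION_HEX(3, 15)` would see the old name disappear today, letting it migrate on its own schedule instead of at the release where the removal lands.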

encukou commented 6 months ago

I don't understand exactly what it does imply.

I read it as: When a private API is made public (same API but removed "_" prefix) -- which presumably needs a WG decision -- we also need to decide what happens to the old name.

But... you added this option when you started the vote. What do you think it implies?

vstinner commented 6 months ago

My plan was to add a guideline if the decision is "always remove" or "always keep", but I'm not sure that it's useful to document "it depends". Or at least, I don't know how to explain it in the devguide.

zooba commented 6 months ago

Maybe something like:

When considering renaming a private API (_Py_...) to a public one, we do not have an automatic decision on whether to retain the private name for compatibility purposes. The likely impact of removing the old name should be considered when deciding, and the C API Working Group is available to assist with making this decision.

Note that this does not apply when the new API has a different definition or semantics. In this case, the old one should be retained initially and removed when it becomes unsustainable or unused internally, following the same process as if there were not a new public API added.

I don't know exactly where it would go in the devguide, but if you can find a spot, that text at least sums up my position.