capi-workgroup / problems

Discussions about problems with the current C Api
19 stars 6 forks source link

Naming convention for new C API functions #52

Closed vstinner closed 3 months ago

vstinner commented 1 year ago

Are you ok with the naming convention of adding Ref suffix when adding a variant of a function which returns a strong reference instead of a borrowed reference? Do we need to go further in terms of standard?

See also issue capi-workgroup/problems#14: "PyLong and PyUnicode don't match the Python type names".

Point raised by @markshannon in my PR adding PyDict_GetItemRef().

(*) I would like to fix the C API to avoid functions returning borrewed references. For that, I add new variants of existing functions which return strong references, by adding a Ref suffix to their name:

In the past, other naming convention were used (other suffixes were added).

() To add support for Unicode filenames in the import machinery, I add variants of functions taking ``charby addingObjectsuffix: function takingPyObject*`` argument. Examples:

(*) Previously, another trend was simply to add Ex suffix. Windows API has a similar pattern: WaitForMultipleObjects() => WaitForMultipleObjectsEx(). Examples in Python:

(*) A fourth trend (!) is to use WithSomething suffix. Examples:

(*) I also saw another suffix, I don't know if it can be called a convention, a variant of the previous name: add Flags.

So I saw the following conventions to add suffixes (oldest to the most recent):

The funny part is that some added functions get multiple suffixes like PyRun_AnyFileExFlags() (Ex + Flags).

markshannon commented 1 year ago

Given this for new API functions why not have a new prefix to clearly distinguish new API functions from old?

E.g.

PyAPI_DictGetItem()
PyCAPI_DictGetItem()
Python_DictGetItem()
PyStable_DictGetItem() /* To distinguish from the unstable API */
vstinner commented 1 year ago

A side question of this issue is: How to shrink the size of the C API? See the issue capi-workgroup/problems#53 for that.

vstinner commented 1 year ago

Given this for new API functions why not have a new prefix to clearly distinguish new API functions from old?

Over time, some functions accumulated many variants (versions):

Total: 5 flavors of PyRun_AnyFile(). All of them are still available in Python 3.13!

Adding a prefix or a suffix doesn't make a big difference: at the end, we will always need to add a new version in future to fix yet another issue. Someone will come up with a new use case which requires to change an existing API which means adding yet another flavor.

So do we prefer accumulating suffixes, prefixes, or both? So far, the trend was to always keep the name short Py prefix (PyXXX or Py_XXX names) and only add suffixes.

markshannon commented 1 year ago

PyRun_XXX is one case where we do want a struct in the API. Structs can be used as input to general, extensible functions, allowing named arguments in C. E.g.

struct PythonRunConfig {
    int version; /* Necessary, so the API function doesn't read off the end */
    FILE *fp;
    const char *filename; 
    int closeit,
    PyCompilerFlags *flags; 
    /* Extra fields go here */
};

PyAPI_FUNC(int) Python_Run(const PythonRunConfig *config);

Used as follows:

PythonRunConfig config =
(PythonRunConfig) {
    .fp = the_file,
    .closeit = 1
};

int err = Python_Run(&config);
vstinner commented 1 year ago

IMO changing the suffix should be a one time event and the whole legacy API would still be supported forever anyway: https://github.com/capi-workgroup/problems/issues/54 Changing the suffix is an opportunity to reuse existing names for the "sane" behavior.

Even if the suffix is changed once, we will still have to add new variants tomorrow. Do you have to agree on a suffix?

vstinner commented 1 year ago

@markshannon @erlend-aasland: are you ok with the Ref suffix convention for new functions returning a strong reference rather than a borrowed reference?

cc @serhiy-storchaka

vstinner commented 1 year ago

PyRun_XXX is one case where we do want a struct in the API.

The struct API sounds appealing. But that would be a 6th version of the same function and so a new name should be found: completely different or add yet another suffix.

erlend-aasland commented 1 year ago

Data point: SQLite simply use a version suffix, increasing the version number for each API revision. Example:

erlend-aasland commented 1 year ago

@markshannon @erlend-aasland: are you ok with the Ref suffix convention for new functions returning a strong reference rather than a borrowed reference?

cc @serhiy-storchaka

What if we have to amend one of the *Ref() functions say in five years time from now. What do we call the new, amended API? *RefEx()? :)

I kind of like the version convention of SQLite; you never know how many variants of an API you'll need, and naming them similar only creates more confusion (in my opinion). For example PySomething_Fetch, PySomething_Get vs. PySomething_GetRef. It is not obvious which one is the newest? If they are named PySomething_Get, PySomething_Get_v2, and PySomething_Get_v3 it would be obvious which one is newest.¨

If _GetItem is the perfect name for an API that need amending, why not simply attach _v2 to it for the amended variant?

erlend-aasland commented 1 year ago

I think this issue is drifting to a discussion about possible solutions; suggesting to create an issue in the devguide, and continue discussing solutions (that is: guidelines) over there.

erlend-aasland commented 1 year ago

I created python/devguide#1126 for discussion solutions.

vstinner commented 1 year ago

What if we have to amend one of the Ref() functions say in five years time from now. What do we call the new, amended API? RefEx()? :)

My experience says that yes, for sure, we will have to add a variant of at least one Ref function.

In that case, I suggest coming with a new suffix and replace Ref with that suffix.

vstinner commented 1 year ago

naming them similar only creates more confusion (in my opinion). For example PySomething_Fetch, PySomething_Get vs. PySomething_GetRef. It is not obvious which one is the newest?

In think that Windows not only has Ex suffix but also Ex2. Python naming convention has more diversity in suffixes so it's harder to know which one is the "latest". One issue is that all variants are kept. Usually it's because the new API has a new feature only useful to a minority of users, so most users are fine with the existing API and there is no strong willingness to deprecate/remove old APIs and force a migration to the new ones.

The Ref variant is different since it fix the API which was "error prone". The Ref is safer. But. Borrow references are fine in most cases when you use CPython. Even PyPy is fine to return a borrow reference to an item of a container: the pointer remains valid until the container is destroyed. Still, I'm not very motivated to enforce a major migration from PyDict_GetItem() to PyDict_GetItemRef() just to reduce the risk of an unlikely issue.

vstinner commented 1 year ago

If they are named PySomething_Get, PySomething_Get_v2, and PySomething_Get_v3 it would be obvious which one is newest.

An alternative is to just add a number: PyDict_GetItem(), PyDict_GetItem2(), PyDict_GetItem3(), ...

I dislike mixing SnakeCase with lowercase "v2".

vstinner commented 1 year ago

The Linux kernel just adds a number suffix to syscalls:

A competitor calling convention exists for Linux syscalls: number of arguments:

There are also 64 bits variants like getdents64().

Linux syscalls: https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md

erlend-aasland commented 1 year ago

Yeah, just sticking a number to it is not clear enough, IMO. As you say, there are precedent for using number to imply the number of params, and stuff like 64-bit variants. If you add a v or V, there is a stronger indicator this is a new version of a specific API :)

(Perhaps we should take this discussion to the "solutions" repo (python/devguide)

vstinner commented 1 year ago

By the way in my list of Python C API suffix, I forgot the number suffix. Some examples:

vstinner commented 1 year ago

Ok, can we now agree on the versioning of C API calling conventions? 😂

erlend-aasland commented 1 year ago

Given this for new API functions why not have a new prefix to clearly distinguish new API functions from old?

It ends up being a variant of the same dilemma; if you need to amend the new API, you'll have to introduce yet another prefix.

markshannon commented 1 year ago

I get the need for version numbers, but given we are overhauling the whole API, wouldn't it make sense to distinguish the new, more principled API from the old, ad-hoc one? Hopefully we won't need too many _v2 functions.

vstinner commented 1 year ago

I think that we are talking past each others here. I see two questions:

I prefer to keep Py prefix for now. In short, I propose adding PyDict_GetItem2() which replaces PyDict_GetItem().

When tomorrow, a new API will be designed, PyDict_GetItem2() will become PyCAPI_PyDict_GetItem() (no suffix): replacing removed PyDict_GetItem() function.

In such new API, we will still have to add new functions, we can use the same naming convention (being discussed here).


Let's say that all new C API functions must now use a new PyCAPI_ prefix. Slowly, we will have a mixture of Py_ and PyCAPI_ functions in C extensions. For me, changing the prefix sounds "let's write a new API from scratch and make it correct this time" (such as HPy). But that's not what being discussed here. The question here is how to name new functions for an incremental change of the existing C API.

For a brand new C API, IMO it's more the domain of the issue capi-workgroup/problems#54 and I consider that they are unrelated.

If the prefix is changed, I would prefer that the whole C API is provided with this prefix. For example, I would feel bad to have to call PyLong_AsLong() after PyAPI_Dict_GetItem(). I would prefer to have PyCAPI_Long_AsLong() included.

The problem is that someone has to go through the 1500+ functions of the C API, check if API follows the new guideline, and if yes, introduce an alias:

#define PyCAPI_Long_AsLong PyLong_AsLong

hint: PyLong_AsLong() doesn't respect the guideline, -1 is ambigious :-)

vstinner commented 1 year ago

Introduce a new prefix to have a brand new API which follows the new guidelines and so only prefer "sane" APIs (unambigious return value, correct error reporting, no borrowed references, etc.).

To clarify the discussion, I created issue capi-workgroup/api-revolution#9 to discuss this idea.

gpshead commented 1 year ago

fyi - pep-703 currently spells the _GetItem replacement as _FetchItem for a non-borrowing API.

The Linux kernel just adds a number suffix to syscalls:

ISTR that in many cases the trailing number was also the number of expected arguments or a similar calling convention notation. But that may have been more by chance (each new API naturally added one?) rather than design. This OS pattern started long before Linux. dup -> dup2 comes from BSD.

Regardless, with no clear thing to pick here and nobody wanting to think their "this time it's perfect" API is going to wind up revised in the coming decades... I don't think it matters which naming scheme we pick or even if it winds up wholly consistent. The more important aspect is documentation making it clear which APIs are ideal, old-ideal / reluctantly-supported, and please-never-use-this-legacy status. With the second most important aspect the ability for someone to request only a certain modern version of the API be available via their #include. (similar to how Py_LIMITED_API works?) so their code can't give into wrong API temptations.

vstinner commented 1 year ago

@gpshead:

I don't think it matters which naming scheme we pick or even if it winds up wholly consistent.

Ok, but so far, nobody approved my PR https://github.com/python/cpython/pull/106005 and @markshannon asked to get a consensus on naming convention before adding the function. So, are you ok with PyDict_GetItemRef() name? What about the others?

vstinner commented 1 year ago

The more important aspect is documentation making it clear which APIs are ideal, old-ideal / reluctantly-supported, and please-never-use-this-legacy status.

I created https://github.com/python/peps/pull/3182 proposing to formalize the concept of "soft deprecation". It may help to make old functions kept for backward compatibility without the willingness to schedule their removals, but they should not be used for new code.

With the second most important aspect the ability for someone to request only a certain modern version of the API be available via their #include. (similar to how Py_LIMITED_API works?) so their code can't give into wrong API temptations.

I proposed issue capi-workgroup/problems#54 for that. See also the more radical issue capi-workgroup/api-revolution#9 idea.

markshannon commented 1 year ago

@gpshead I agree that an API we come up with isn't going to perfect, but it will be a lot better than what we have now. A new API that learns from our experience and is designed carefully will be a lot more consistent, usable and stable than the current ad-hoc API.

Sure, we need to allow for further changes, but version 2s should be rare. The C API should reflect the semantics of the language and features of the VM. While these get added to, they change very rarely.

For example, the semantic of getting an item from a dictionary haven't changed, it is just that we are fixing flaws in the current API which doesn't handle errors properly. We have learned that just because a function can't currently fail, we should allow for failure in the API, with a few exceptions: (e.g. PyLong_Check() cannot fail).

Having a clear marker (I prefer a new prefix) in the API that differentiates the old from the new, makes it easy to see which API functions may need updating.

vstinner commented 1 year ago

@iritkatriel added _PyErr_ChainExceptions1(exc) to Python 3.12: function replacing _PyErr_ChainExceptions(exc_type, exc_value, exc_tb). Is the name related to the number of parameters: 3 => 1? Should it be renamed to `_PyErr_ChainExceptions2() before 3.12 final is released?

vstinner commented 1 year ago

fyi - pep-703 currently spells the _GetItem replacement as _FetchItem for a non-borrowing API.

I'm not a fan of changing a function name when fixing an issue to avoid adding a suffix. IMO it makes the migration from existing API to the new one more complicated. For me, it's more obvious that PyModule_AddModuleRef() is a replacement for PyModule_AddModule(), and the fact that it has a suffix means that it's the newer API, and there should be there for a reason. If it would be called differently, like PyModuleAppendModule() or PyModuleModuleAdd(), I would be more confused on what is the old API and what is the new one, which one should I pick?

The Ref suffix is a hint that: if you care about avoiding borrowed references, you should check for the list of all new Ref variants and use them in your code.

Obviously, if we change the suffix instead, the hint would be even stronger! If a function with new suffix exists, obviously it's a new API and it should be used. But I dislike the idea of changing the suffix since it smells too much like "designing a branch new API" and it's whole new can of worms: see issue capi-workgroup/api-revolution#9. Just one example, in my PR https://github.com/python/cpython/pull/106005 I was asked to change PyObject* parameter type to PyDictObject*. It would make sense in a brand new API. It belongs less an "an incremental fix" of the existing C API.

Moreover, in a "brand new API", soon we will have for sure to deprecate API and add new ones. Again, the question of adding a suffix / using a different function name arises. IMO it would feel weird to have to change the suffix each time that we add a variant of a function.

iritkatriel commented 1 year ago

@iritkatriel added _PyErr_ChainExceptions1(exc) to Python 3.12: function replacing _PyErr_ChainExceptions(exc_type, exc_value, exc_tb). Is the name related to the number of parameters: 3 => 1? Should it be renamed to `_PyErr_ChainExceptions2() before 3.12 final is released?

This is an internal function, so I didn't put too much thought into it. I think we should create a non-internal version at some point.

vstinner commented 1 year ago

At the end, sorry, but I failed to get the preferences of all people involved in this discussion. Do you prefer to switch to the number suffix for new APIs? Do you prefer to keep the status quo: add a name suffix (Ex, Flags, Ref, Object, etc.: the chosen suffix depends on the change)?

Currently, the vast majority of new functions use a "name" suffix: only 3 functions use a number suffix (2 suffix for the 2nd version): https://github.com/capi-workgroup/problems/issues/52#issuecomment-1607209512 The status quo is on the name suffix approach.

serhiy-storchaka commented 1 year ago

Plese note that adding a suffix means that function has almost the same signature (with possible changing the type of few parameters or returning type or adding new parameters). I think that using names which are only different by a suffix but have completely different signatures would be confusing. Especially if other functions with the same suffix have only only difference (or even only semantic difference) from the original functions. See python/cpython#106005. For functions with radically different signature we should use different naming scheme.

vstinner commented 1 year ago

I created PR #108870 to rename the private _PyThreadState_UncheckedGet() to a new PyThreadState_GetUnsafe() function. Before, the name had Unchecked but not as a suffix. I propose to add Unsafe suffix instead.

Python 3.13 has two functions with an Unchecked suffix:

And one function with Unsafe suffix:

The following functions are also documented as being Unsafe in a comment:

Note: I chose the Fast marker in these PyUnicode function names, and it was a bad idea :-) Nothing proves that it's faster than the public function. It's like METH_FASTCALL, how long before another faster method will happen? :-)

serhiy-storchaka commented 1 year ago

Note: I chose the Fast marker in these PyUnicode function names, and it was a bad idea :-) Nothing provides that it's faster than the public function. It's like METH_FASTCALL, how long before another faster method will happen? :-)

At least it is better than METH_NEWCALL.

vstinner commented 1 year ago

@serhiy-storchaka @erlend-aasland @encukou @markshannon: Do you have a preference between Unsafe and Unchecked suffix? Question related to https://github.com/python/cpython/pull/108870 addition.

gvanrossum commented 1 year ago

The words "unchecked" and "unsafe" have different meanings and connotations. In the issue you link to, which is about getting the current thread state or returning NULL if there is no current threadstate (instead of calling Py_FatalError), I think there's nothing unsafe about the function -- it is just a different API. So I don't think it should be renamed to "Unsafe".

vstinner commented 1 year ago

PyThreadState_Get() and PyInterpreterState_Get() are safe to use, since they are garanteed to be non-NULL. That's what I mean by "safe" vs "unsafe". Well, not sure if calling Py_FatalError() makes it really "safe" since this function aborts the process 😁

Do you have a better suggestion for the name?

gvanrossum commented 1 year ago

Stick with Unchecked. Just remove the underscore and be done.

davidhewitt commented 1 year ago

In Rust an unsafe function usually means the caller needs to uphold invariants which are documented rather than checked by the compiler. (Failing to do so typically leads to memory corruption.) If I understand correctly there aren't specific invariants for a PyThreadState_Get() alternative which can return NULL, so GetUnsafe also doesn't seem right to me (as someone normally reading Rust code).

PyThreadState_GetUnchecked seems to describe the difference between the two functions, so Unchecked would get my vote.

Well, not sure if calling Py_FatalError() makes it really "safe" since this function aborts the process 😁

With Rust's interpretation of unsafe, calling Py_FatalError() could definitely be a way to make an API safe by aborting when invariants are not correctly upheld. Aborting is safe :)

EDIT: Guido beat me to the reply while I was drafting this comment 😂

vstinner commented 1 year ago

I would prefer the PyThreadStateGetUnchecked() name proposed by @davidhewitt, instead of the PyThreadStateUncheckedGet() name proposed by @gvanrossum.

I like adding suffix: when functions are sorted in their documentation, it's easier to spot the variants. Otherwise, they may be sorted in a "random" order in the doc (unless the alphabetical order is not respected on purpose). Also, IMO it's easier to discover function variants when a suffix is *added** to the end of the existing name.

I think that I now get the different between "unsafe" and "unchecked". For this function, I agree that "unchecked" is a good fit.

encukou commented 1 year ago

Aborting is safe

As you mentioned, Rust's unsafe has a very specific meaning -- invariants normally checked by the Rust compiler aren't checked. And so, anything that compiler doesn't normally check for you is considered “safe”: aborting the process, memory leaks, deadlocks, infinite loops, and so on. IMO, this definition of safe/unsafe doesn't make any sense outside Rust. It's fine if Python uses it for something else.


As for PyThreadState_*Get*: if we want Unsafe/Unchecked to mean “danger, read the docs!”, IMO it's fine to use it here. If we want a term specifically for “returns NULL without setting an exception”, I don't think either Unsafe or Unchecked are it.

vstinner commented 1 year ago

FYI I updated my PR https://github.com/python/cpython/pull/108870 to rename the function to PyThreadState_GetUnchecked(). So far, nobody proposed a better name. I'm fine with this name.

encukou commented 11 months ago

Proposed guideline issues:

vstinner commented 3 months ago

Let's continue the discussion in the api-evolution project, in issues listed in the previous comment.