TokTok / c-toxcore

The future of online communications.
https://tox.chat
GNU General Public License v3.0
2.21k stars 280 forks source link

c-toxcore releases should include generated Python bindings #2702

Open emdee-is opened 4 months ago

emdee-is commented 4 months ago

c-toxcore releases should include generated Python bindings.

I wrote: https://github.com/TokTok/c-toxcore/pull/2694#issuecomment-1950368138

This should be labeled with api-break as changes it the fields in toxOptions.

iphy replied: https://github.com/TokTok/c-toxcore/pull/2694#issuecomment-1950502175

The memory layout of this struct (size, alignment, and field order) is not part of the ABI.

It takes more than a comment in a header file to determine a library's ABI: it's determined by the uses made of the library, for example complete wrappers into other languages is a good example. If they need access to the structure, then it should be considered in the ABI. Even if you want to make the structure private, you can't ignore the fact that the project has been adding getters and setters to the ABI with no discussion or ABIbreak tagging or version bump, and has turned long supported functionality into an optional feature without warning or version bump. That's doesn't conform to semantic versioning.

If the project followed semantic versioning and pushed minor releases for all these changes, the project would have more releases as they get merged in, which is a Good Thing. I doubt the project eill do the planning required for semantic versioning as things are "or not even not being discussed at all but being in developers' heads and then dropping as a surprise fully implemented PR, [which] is not very good."

The only way of doing something like a good wrapper into Python is if c-toxcore keeps up-to-date one of the existing ctypes wrappers, or uses the standard wrapper generator ctypesgen. I think the project should include a generated wrapper as a part of c-toxcore releases so that the code is always matched to the Ccode. (ctypesgen has no dependencies relying only on Python, and wraps c-toxcore in about a minute.) Many good projects ship the Python bindings with the C code (e.g. zmq).

All development in Python has been frozen for more than 2 years, and yet there's been no need for any tools other than ctypesgen, which is widely used and continually developed by some of the core Python developers, including the BDfL. The use of home-brewed tools written in a language not available, or explicitly banned, in corporate enviornments, that generates debugger-unfriendly code, is counter-productive.

iphydf commented 4 months ago

Should it also include go, JavaScript, kotlin, Haskell, rust, and zig bindings?

Green-Sky commented 4 months ago

@emdee-is said struct was deprecated and declared non-abi 8 years ago.

nurupo commented 4 months ago

It takes more than a comment in a header file to determine a library's ABI: it's determined by the uses made of the library

No, that's not how API and ABI are determined. They are just the interfaces provided by the library, one concerns the interface on the source code level, another on the binary level.

you can't ignore the fact that the project has been adding getters and setters to the ABI with no discussion or ABIbreak tagging or version bump

That's because adding getters and setters doesn't break API nor ABI.

You are complaining about ABI breakage, yet you don't seem to know what it is exactly yourself.

That's doesn't conform to semantic versioning.

*The* semantic versioning? The one that says:

4. Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.

?

You are complaining about semantic versioning, yet you don't seem to know what it is exactly.

It takes more than a comment in a header file to determine a library's ABI

The comment has been there since forever, I went as far back as v0.0.1 and it's still there.

https://github.com/TokTok/c-toxcore/blob/51139a080c982a6460aa25975b3a1e0866f518ed/toxcore/tox.h#L431-L433

Between Semantic Versioning claiming that anything may change at any time and this ABI contract being documented since forever, I don't really see an issue here.

It takes more than a comment in a header file to determine a library's ABI: it's determined by the uses made of the library, for example complete wrappers into other languages is a good example. If they need access to the structure, then it should be considered in the ABI.

The only way of doing something like a good wrapper into Python is if c-toxcore keeps up-to-date one of the existing ctypes wrappers, or uses the standard wrapper generator ctypesgen. I think the project should include a generated wrapper as a part of c-toxcore releases so that the code is always matched to the Ccode.

So... how exactly would generating Python bindings off the C code catch any API or ABI incompatibilities?

ctypesgen doesn't perform any such checks, in fact it doesn't even take the previous version of the API as an input, so it can't compare anything. It just takes the current API and spits out ctypes bindings of that API.

If you want to check for API/ABI breakage, there are specific tools that do just that, there is no need to generate Python bindings for this.

Now, as far as including Python bindings into the c-toxcore repository in general -- we have a separate repository for python bindings https://github.com/TokTok/py-toxcore-c

emdee-is commented 4 months ago

@nurupo I agree with you that the definition of semantic versioning is flexible, but I'm coming at it from the point of view of wrapping the core, and I have no versioning that lets me know what the API of a version is for the Options/setters, and no idea when the default behaviours change.

So... how exactly would generating Python bindings off the C code catch any API or ABI incompatibilities?

My idea here is that the Python bindings would ship with the Ccode and always be current and up-to-date. Wrapping and testing the wrapping would be a part of the release process.

Now, as far as including Python bindings into the c-toxcore repository in general -- we have a separate repository for python bindings https://github.com/TokTok/py-toxcore-c

A repository that was a rewrite from scratch started 2 years ago and has been of no use during all that time. That generates code that you can't use for python-enabled gdb, so I think uses the wrong approach. And to modify it one would have to install Haskell, a language practically nobody uses or has installed. ctypesgen (or one of the handmade ctypes) would be a better approach, using a tool centrally supported by the python community, and would get the Project supporting Python right now.

emdee-is commented 4 months ago

Should it also include go, JavaScript, kotlin, Haskell, rust, and zig bindings?

That's a good question and the answer may be more than a one-size-fits-all lowest-common-denominator response.

Clearly the most important languages to wrap into are Python and Java. C-toxcore has an independent java wrapping which I assume everyone is happy with and that there are no plans to move these bindings to a one-size-fits-all approach.

The Python bindings were working in C and the bindings stopped working in anticipation of a home-brewed scheme that had almost no coverage, which ended all development in Python on the project. That's bad because there were excellent and easily maintainable Ctypes bindings that could have been adopted by the project instead, and Python had one of the best, and only NGC, desktop clients. That situation persists to this day.

The homebrew bindings generate Cython, which is not a good technology to use for this compared to ctypes, and I can find no trace of a discussion where the reasons for using Cython were ever justified. I can see no reason not to end that approach now and adopt the mature ctypes wrapping that exist today. There's more to doing a wrapping than just the bindings, and Python is a moving target because of typing so ctypesgen would help as the language evolves. You can do a homebrew solution that gives a poor result noone wants to use: for example if it does not easily support debugging across the Python/C interface - think SWIG. So I would argue against a one-size-fits-all approach at least for Python (and I assume Java).

After Java and Python there may be merit in a one-size-fits-all but that there maybe better approaches that writing you own tool in a language almost nobody knows or has or uses. Look at the way ctypesgen covers generating Lua bindings as a example.

emdee-is commented 4 months ago

@emdee-is said struct was deprecated and declared non-abi 8 years ago.

I know but a comment in the header doth not make an ABI make: every time new fields are added, new setters and getters are added, which is a change in the ABI and must be a minor version bump. The project does no semantic versioning to allow the users of the ABI to know what's in there; just big releases years apart.

In addition new fields/setters/getters are added with no warning and no semantic versioning on changes that strongly impact existing features, sometimes negatively.

I doubt this will change which is why I argue for including the Python wrappers in the release and automatically keeping them up to date with the code.

nurupo commented 4 months ago

Per semantic versioning, as I quoted above: with the major version being 0 -- anything goes. So we do adhere to the semantic versioning. We actually do better than that and have minor version stand for breaking changes (major-like) and patch version stand for feature+patch changes (minor-like), like iphy pointed out in https://github.com/TokTok/c-toxcore/pull/2694#issuecomment-1960530303.

As for releases being done years apart -- that's factually false, check the release history. Some years we had 3-5 releases, some years less, in 2023 we had none. We release when we are ready to release and when there is something to release. For example, there were barely any commits between January 2023 and August 2023, or January 2019 and December 2019, etc. What would we release if there is nothing to release? We are not as big as the Python project, there are only a few developers working on this in their free time, we are not being paid for this and we have our lives which often take priority over Tox. So you have to tamper your expectations and not compare us to such juggernauts as the Python project.

emdee-is commented 4 months ago

@iphydf corrected me as I had missed the comment in tox.h: https://github.com/TokTok/c-toxcore/pull/2694#issuecomment-1960530303 So much of what I wrote on semantics does not apply as it was a project decision not to observe the .patch versions until 1.0.0 - fair enough.

But back to the issue at hand, that I've had some futher thoughts on:

Should it also include go, JavaScript, kotlin, Haskell, rust, and zig bindings?

I note that the list doesn't include Java, which is probably the most important because of the Android clients - I'm assuming everything is fine there and there are no plans to include it in the "to be generated" list.

It seems the length of the "to be generated" has been used as a rationale to stop all development in Python for >2 years, even though good alternatives have existed all that time. And when the hs-genapi works for python the Cython layer need to be wrapped in a friendlier Python layer (already done for ctypes). And when the friendlier Python layer is done, there'll still need to be a comprehesive testsuite written with good coverage (already well underway for ctypes) https://github.com/TokTok/py-toxcore-c/issues/79. So at this rate the hs-apigen Python route, with tests, means years more of dead development.

And I suspect, having used SWIG, C, Cython and ctypes wrappings in Python, the Cython wrapping may not be well received. Look at the evolution of ctypesgen with Python to see what the issues are, and the lack of distro provided gdb-python support that ctypes has is a significant drawback of a Cython route. (I know nothing about Rust but suspect a Rust community-supported generator may be the way to go there too.)

So you have to tamper your expectations and not compare us to such juggernauts as the Python project.

I know, but that makes planning and prioritization all the more important: generating the bindings with a home-brewed program written in a language almost nobody uses, for which there is almost no depth of expertise the project, that is non-existent in corporate settings, in a completely undocumented side-project (not even a comment), while all other development is stopped - I can't find where rationale for that is made.

Let's get Python bindings working in the core as a priority now, and get the integration testsuite finished so it can be ported to start comprehensively testing the existing wrappings: node, js, kotlin, go, etc. There's lots of changes in the upcoming release that there is no testing of those existing bindings.

PS: FWIW ctypesgen can also generate a language neutral json file which has been used for generating Lua bindings. It might be a better starting point than a homebrew program as it will be maintained and kept up to date by the Python community.

Green-Sky commented 4 months ago

I note that the list doesn't include Java, which is probably the most important because of the Android clients - I'm assuming everything is fine there and there are no plans to include it in the "to be generated" list.

Kotlin

It seems the length of the "to be generated" has been used as a rationale to stop all development in Python for >2 years,

nah, there was just no interest from any developer.

there'll still need to be a comprehesive testsuite written with good coverage

iphydf has made good progress on that (:

working in the core

there will never be binding "in the core", but as separate repositories. Again, they all need to be generated, to keep the cost of maintenance as low as possible and keep releases in lockstup with c-toxcore (if need be).

emdee-is commented 4 months ago

nah, there was just no interest from any developer.

That's not so: 3 of the best developers were in Python. They wrote over 50000 lines of quality Python code. They did the best desktop client tox has ever had, which did NGC groups >6 years ago https://github.com/toxygen-project/toxygen/tree/next_gen . They had the best integration suite testing, which can be used for Tor testing, 6 years ago https://github.com/TokTok/py-toxcore-c/issues/79#issue-1384792614 and AFAIK no other binding has that level of testing. One of their C binding was a branch of py-toxcore-c: https://github.com/TokTok/py-toxcore-c/tree/v0.2.0 One of them igave me the feeling that the project management was a part of why he dropped further work.

You don't have to be a python developer to recognize that taking an existing working binding for a language with a test suite, and replacing it with a homebrew in another language that has no coverage for over 2 years is not the way to have or support Python developers. None of them would have requested that.

there'll still need to be a comprehesive testsuite written with good coverage

iphydf has made good progress on that (:

Not that I know of in Cython or any other of the bindings I looked at. Nothing like what was in Python 6 years ago (see above).

there will never be binding "in the core", but as separate repositories. Again, they all need to be generated, to keep the cost of maintenance as low as possible and keep releases in lockstup with c-toxcore (if need be).

The cost of maintenance of py-tox-c is as low as possible as it has been static with no coverage for over 2 years. The cost of hs-apigen greatly exceeds the cost of adopting an existing ctypes binding, and that's not addressing the other parts I mentioned above that still need doing. And right now, it looks like it is only targeted on Cython, where it's needed least, and presumably each new targeting will also take years at this rate. But I admit the cost of maintenance might be low because no one else programs in Haskell. But the cost of doing homebrew in Haskell has been no support for Python, no funtional testing of the bindings, and will mean divorcing the project from the steady improvements that go into cypesgen.

iphydf commented 4 months ago

https://github.com/TokTok/py-toxcore-c/blob/v0.2.0/tests/tests.py

Is this the integration test suite you're talking about? And the manual C API is the high quality one you're talking about? I'm a bit confused because you also seem to be talking about ctypes, which I don't remember being used in pytox. Can you point me at the things you're saying have been done 6 years ago and should have been done?

emdee-is commented 4 months ago

https://github.com/TokTok/py-toxcore-c/blob/v0.2.0/tests/tests.py

Is this the integration test suite you're talking about? And the manual C API is the high quality one you're talking about? I'm a bit confused because you also seem to be talking about ctypes, which I don't remember being used in pytox. Can you point me at the things you're saying have been done 6 years ago and should have been done?

Toxygen was always ctypes. The ctypes wrapping is in the next_gen branch of toxygen I linked to above https://raw.githubusercontent.com/toxygen-project/toxygen/next_gen/toxygen/wrapper/tox.py https://github.com/toxygen-project/toxygen/tree/next_gen/toxygen/wrapper is from ~6 years ago.

I broke the ctypes wrapper out and I've updated it with new bugs and features at: https://git.plastiras.org/emdee/toxygen_wrapper/src/branch/main/src/toxygen_wrapper/ (WIP)

The testsuite from the pytox /v0.2.0/tests/tests.py has been ported to https://git.plastiras.org/emdee/toxygen_wrapper/src/branch/main/src/toxygen_wrapper/tests/tests_wrapper.py

IMHO you should port that testsuite to each of your existing bindings as they are now before doing anything else. https://github.com/TokTok/py-toxcore-c/issues/79 Then run them with and without SOCKS and see what happens.

iphydf commented 4 months ago

Understood, thanks. Yes, writing tests is the plan right now, making sure we have 100% (or close to it) coverage. There's nothing we can do in toxcore right now before the release except writing tests.