Open jimkring opened 1 year ago
PS: @colesbury - thanks for all your work on nogil and I'm very excited to see PEP 703!
Hi @jimkring, I'll look into build a PyQT6 wheel for macOS.
I noticed that I get similar issues when I try to install numpy:
$ python -m pip install NumPy
Collecting NumPy
Using cached numpy-1.24.1.tar.gz (10.9 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: NumPy
Building wheel for NumPy (pyproject.toml) ... -
Update: I think that numpy was just taking a while. Also, I was able to install/build it from source using this command:
pip install git+https://github.com/colesbury/numpy.git@v1.24.0-nogil
Update: Using the --verbose
flag showed that it was hanging up while prompting for agreement to the license terms.
I followed the notes here and used the following command to automatically accept the license during the pip install:
python -m pip install --config-settings --confirm-license= --verbose pyqt6
(note the "=" after "--confirm-license=" is required)]
Hi @jimkring, I think something strange is going on. You should be able to just run "pip install numpy" and get the pre-built binary wheel for NumPy 1.24.0 on macOS. You should not need to install it from source.
Can you provide details about your installation:
pip list
?Hi @colesbury.
I'm on an Intel mac.
I installed nogil via pyenv install nogil-3.9.10
.
I then created a virtual environment via /Users/jim/.pyenv/versions/nogil-3.9.10/bin/python -m venv .venv-no-gil
.
Then, after activating the venv, I think I probably updated pip (out of habit):
python -m pip install --upgrade pip
Here's the output of pip list
(.venv-no-gil) MacBook-Pro:pyside-setup jim$ pip list
Package Version
----------------- ----------------------
async-generator 1.10
attrs 22.2.0
exceptiongroup 1.1.0
idna 3.4
iniconfig 2.0.0
numpy 0.3.0+30167.g63723dfcb
outcome 1.2.0
packaging 23.0
pip 22.3.1
pluggy 1.0.0
pydantic 1.10.4
PyQt6 6.4.0
PyQt6-sip 13.4.0
pytest 7.2.1
pytest-asyncio 0.20.3
pytest-trio 0.8.0
setuptools 65.5.1
sniffio 1.3.0
sortedcontainers 2.4.0
tomli 2.0.1
trio 0.22.0
typing_extensions 4.4.0
wheel 0.38.4
I should also mention:
I've been able to successfull build+install pyqt6. That's good news.
However, I then realized that PyQt6 does not include the Signal
and Slot
modules that I need (from PySide6). So, I tried building PySide6 from source and I've run into new error, when it tries to build libshiboken.
Here's how I did that (and let me know if you'd like me to open a different issue for the pyside/libshiboken build issue).
Pull a copy of the pyside sources
git clone https://code.qt.io/cgit/pyside/pyside-setup.git
Switch to the version/branch we want to build
git checkout 6.4.2
Start the build
python setup.py install --qtpaths=/usr/local/opt/qt/bin/qtpaths6 --parallel=8 --build-tests
[57/503] Building CXX object libshiboken/CMakeFiles/libshiboken.dir/helper.cpp.o
FAILED: libshiboken/CMakeFiles/libshiboken.dir/helper.cpp.o
/Library/Developer/CommandLineTools/usr/bin/c++ -DBUILD_LIBSHIBOKEN -DHAVE_NUMPY -DNDEBUG -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -DPy_LIMITED_API=0x03060000 -I/Users/jimkring/JKI/GitHub/FlowRailsIDE/pyside-setup/sources/shiboken6/libshiboken -I/Users/jimkring/JKI/GitHub/FlowRailsIDE/pyside-setup/build/.venv-no-gil/build/shiboken6/libshiboken -I/Users/jimkring/JKI/GitHub/FlowRailsIDE/.venv-no-gil/lib/python3.9/site-packages/numpy/core/include -I/Users/jimkring/.pyenv/versions/nogil-3.9.10/include/python3.9 -Wall -Wextra -Wno-strict-aliasing -fvisibility=hidden -D QT_NO_CAST_FROM_ASCII -D QT_NO_CAST_TO_ASCII -O3 -DNDEBUG -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -mmacosx-version-min=12.6 -fPIC -fPIC -std=gnu++17 -MD -MT libshiboken/CMakeFiles/libshiboken.dir/helper.cpp.o -MF libshiboken/CMakeFiles/libshiboken.dir/helper.cpp.o.d -o libshiboken/CMakeFiles/libshiboken.dir/helper.cpp.o -c /Users/jimkring/JKI/GitHub/FlowRailsIDE/pyside-setup/sources/shiboken6/libshiboken/helper.cpp
/Users/jimkring/JKI/GitHub/FlowRailsIDE/pyside-setup/sources/shiboken6/libshiboken/helper.cpp:211:30: error: no member named 'ob_refcnt' in '_object'
str << ", refs=" << obj->ob_refcnt << ", ";
~~~ ^
1 error generated.
We can see this error error: no member named 'ob_refcnt' in '_object'
-- it's worth noting that I see a similar error with other tools that try to work with the cpython sources like this error when I try to build an exe using nuitka within python nogil.
Thanks for looking into this. I'm excited to start testing out nogil (and encouraging others to do so).
I've been able to successfull build+install pyqt6. That's good news.
Great! I got it to build locally too after some challenges with homebrew. I'll see if it's not too much work to build binary wheels once we figure out the other issues.
So, I tried building PySide6 from source and I've run into new error, when it tries to build libshiboken.
This looks like it needs to use Py_REFCNT(obj)
instead of obj->ob_refcnt
. The reference count field changed (it's now two fields in "nogil Python), so you can't access them directly. The Python docs now say to use Py_REFCNT
instead of ->ob_refcnt
so maybe we can get this change upstreamed.
It also sounds like there is a similar issue in nuitka. Is nuitka a dependency of pyside6?
let me know if you'd like me to open a different issue for the pyside/libshiboken build issue
Let's wait until we figure out any issues on the nogil side.
Hi @colesbury some answers and updates...
It also sounds like there is a similar issue in nuitka. Is nuitka a dependency of pyside6?
No, nuitka is not related to pyside6 -- nuitka is a python compiler for building stand-alone executables.
Regarding the pyside/libshiboken build issue, I did a search in the code for use of obj->ob_refcnt
and tried replacing with Py_REFCNT(obj)
. This is a bit tricky since there's a lot of dynamic C++ code generation happening, and many fixes end up looking like the following:
before:
for (int index : std::as_const(invalidateArgs)) {
s << "bool invalidateArg" << index << " = PyTuple_GET_ITEM(" << PYTHON_ARGS
<< ", " << index - 1 << ")->ob_refcnt == 1;\n";
}
after:
for (int index : std::as_const(invalidateArgs)) {
s << "bool invalidateArg" << index << " = Py_REFCNT(PyTuple_GET_ITEM(" << PYTHON_ARGS
<< ", " << index - 1 << ")) == 1;\n";
}
Also, I looked at what it would take to upstream such fixes to PySide6 and the process described on the Qt Contribution Guidelines is a little daunting to me.
I have a pyside6 build in progress and it takes a long time... we'll see how it goes.
I've gotten farther -- I can now build/install pyside6. This runs to completion:
python setup.py install --qtpaths=/usr/local/opt/qt/bin/qtpaths6 --parallel=8 --build-type=pyside6 --reuse-build --skip-modules=WebEngineCore,WebEngineWidgets --no-qt-tools
However, I get a segfault, even with a simple hello world example :(
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
@jimkring I will have a deeper look into nogil on PySide.
Thanks @ctismer!
@ctismer -- I just noticed you're on the Qt team. I'm sure you can figure this all out much faster than I could ever, and... I figured I'd share with you the changes I made to the pyside6 code to get it to build. Here is a PR with the changes in a fork I made of pyside-setup -> https://github.com/jimkring/pyside-nogil/pull/1 (I'm working in the nogil branch)
Yes folks, I've been hacking away at PySide for 7 years now and even managed to get a PyPy build working. So I thought it would be a good thing to jump on the NoGil wagon as the next big thing for PySide. PyPy really pushed me to my limit and beyond in that regard, although the response to it was pretty unsatisfactory. I think the effort-to-benefit ratio should be much better with NoGil.
Thanks for your patch @jimkring , I'll get right on it :-)
However, I then realized that PyQt6 does not include the
Signal
andSlot
modules that I need (from PySide6)
FYI: The equivalent to Signal and Slot in PyQt are pyqtSignal and pyqtSlot.
Python 3.10.9 (main, Dec 15 2022, 18:18:30) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
But now warming up the VS-Code debugger, quite helpful to look at a debug build. BTW: Instead of NoGIL, I would call this the Gillespie branch...
@ctismer itβs nice to learn more of the history of pyside. Thanks for that info (and tips). I am certain NoGil is going to be a huge benefit to all, so thanks in advance for supporting that. Itβs a fun place to explore, for sure. Cheers.
@colesbury I see quite different opcodes - even the convention of 2 byte per opcode was not used. This will become a quite special hack :) Do you intend to clean this all up, or will the final Py 3.12 (?) nogil be written from scratch? Another, real question: Is the sys.flags.nogil the only way to know who I am, or is there maybe a compile time info?
@ctismer Does PyQt depend on the byte code format? The byte code changes will probably not be ported to 3.12.
In C code, you can use the Py_NOGIL
define to check for the nogil build.
@colesbury Ah, fine! I don't think PyQt depends on byte code. I use a special trick to support the old enum syntax of Qt, therefore the hack. But it's just a little introspection, and the feature can vanish, soon.
Thanks for your help!
Hi @colesbury and @ctismer,
Here's a nice update -> I've been able to get my application to run on NoGIL with PyQt6!
(Christian, your the tip about where to find pyqtSignal and pyqtSlot was very helpful, so thanks again. I'm still VERY interested in getting PyQt6 running in NoGIL, so I'm still very eager to help test PySide6 on NoGIL and don't want my my "good news" to slow us down :)
For what it's worth, Sam, my application seems to be much snappier running in NoGIL. It uses trio for concurrency -- I used the asyncio examples on doc.qt.io/qtforpython as a starting point, which has worked well (Christian, perhaps you helped write those? Thanks!).
I'm now going to look into how best to move from asyncio over to threads for concurrency+parallelism and see how that goes.
Your app is running? But there must be lots of bugs in the test suite, at least I get quite a lot in PySide. I guess when you really use parallelism, the problems will begin.
@colesbury We have some unit tests which check the refcounts. Some of them fail, they are off by one. The tests use no threads. Is that to be expected?
Also, about 1/5th of unit tests segfault at the end. How can that be without threading?
I still have to test with setting the environment flag, tomorrow.
@ctismer my unit tests probably aren't as extensive as yours and perhaps I have bugs in the form of race conditions that are yet to be observed :)
I'm a parallel programmer by trade, and I'm excited to help Python become more parallel.
We have some unit tests which check the refcounts. Some of them fail, they are off by one. The tests use no threads. Is that to be expected?
Yes. Some objects are "immortal", including small integers. sys.getrefcount
lies and says the refcount is 999 (chosen because it stand out more than 2^32 or something). Depending on the arithmetic performed in the test, you can end up with off-by-one errors. (I've seen similar issues in some NumPy refcounting tests.)
For what it's worth, "immortal objects" is probably going to land in Python 3.12 completely independently of the nogil changes (and with a different implementation.)
Also, about 1/5th of unit tests segfault at the end. How can that be without threading?
I don't know without more information. If you can get a stack trace from gdb, that would help.
@colesbury Hi Sam, just tried it: Setting PYTHONGIL=1 does not change it, I still have weird behavior on shutdown. Maybe it is something that we are doing wrong since a long time - no idea. We don't have a problem on CPython and not on PyPy, but who knows :) I will send you dump output later (after our weekly meeting). Cheers -- Chris
python3.9-2023-02-09-113333.txt This was sources//pyside6/tests/signals/lambda_test.py lambda_test.py.txt
Hi @ctismer, I think the problem is that signalmanager.cpp is calling into Python without trying to acquire the GIL:
Although, the whole point of the project is to remove the GIL, it's still important to call those same functions to attach/detach the thread from the VM. The functions are still important, they just don't prevent multiple threads from running concurrently. (Here's the section of the PEP that explains it a bit more.)
I don't think calling the PyDict
functions without first acquiring the GIL is intended to be supported upstream either.
@colesbury Whow, thank you! How could you find that with that dump? For some reason, it was quite unreadable for me. Repeated this now and got a better readable one, which in fact tells me what you said. Did you try this, yourself? python3.9-2023-02-09-143507.ips.txt Checking signal manager protection now. Makes sense.
Yeah, it works for this example. Running all tests now. I need to find a finer way to protect this signal mess, because we get into the "5ms-carousel" by using the GIL often.
Yes, only 7 out of 616 tests fail - that is great!
@ctismer Great! A lot of the same issues crop up repeatedly both during development of the "nogil" fork and in extensions. This one, where _Py_current_tstate
is NULL
, I've seen mostly during development of the fork itself. (The _Py_current_tstate
and PyErr_Occurred
in the dump were helpful clues.)
I'll start maintaining a list of common issues and how they appear. Maybe that will help with future extensions.
EDIT: Started documenting common issues on the wiki: https://github.com/colesbury/nogil/wiki/Common-issues-with-extensions
BTW.: Do you know the GIL-carousal effect? The reason to use no GIL with signals? We struggled with that very much in 2020. Releasing the GIL often makes your app slower. But I think/hope that problem gets away by the use of NoGIL. https://bugreports.qt.io/browse/PYSIDE-803
@ctismer I hadn't heard of the GIL carousel effect, but the linked bug report explains it well. I've encountered performance issues due to releasing the GIL, mostly due to the overhead of PyEval_Save/RestoreThread
being large compared to the actual work performed.
Both those issues should be greatly reduced with the nogil fork: the overhead of PyEval_SaveThread
is a lot smaller than in upstream CPython and it doesn't have the 5ms handoff (unless you run with the PYTHONGIL=1
compatibility mode).
@colesbury Your hint was great. The remaining bugs were introduced by the change to the PyCFunctionObject, which grew a new field with an unfortunate offset. After fixing that, there are only 2 bugs left:
The python_thread test is flaky. It seems that something needs to be locked, now. The cyclic test is unclear, yet. It fiddles with garbage collection.
I might stop this now, since the basic functionality is there. Making things really thread-safe is maybe too time-consuming with a prototype.
@ctismer Do you have a fork with changes? Can you point me at it? I'd like to take a look at the threading issues.
@colesbury Yes, I will create a branch on github, tomorrow.
@colesbury Hi Sam,
you find a clone of our dev repo at https://github.com/ctismer/pyside6-dev. .
There is a branch dev-nogil-patches
.
This repo needs the Qt 6.4.2 branch. I am building with the following setting:
export PATH=/Users/tismer/Qt/6.4.2/macos/bin:$PATH
/Users/tismer/.pyenv/versions/nogil-3.9.10-1-debug/bin/python3 setup.py install --debug --limited-api=no \
--reuse-build --build-tests --module-subset Core,Gui,Widgets --skip-docs && \
python3 testrunner.py test > log3.log 2>&1
This should work with all but the two mentioned tests:
Oh, and I saw the nogil-3.12
branch. Should I already switch to it?
@ctismer - thanks! I'm able to reproduce the issues. The nogil-3.12
branch isn't ready yet. Here's what I've found so far:
sources/shiboken6/tests/samplebinding/python_thread_test.py
The bucket list was previously protected from concurrent access by the GIL. Without the GIL, it needs protection by a mutex (e.g., https://github.com/colesbury/pyside6-dev/commit/d71fb16537c57273199df042e5d4565eb635bbc1)
sources/shiboken6/tests/samplebinding/cyclic_test.py
This isn't related to the GIL, at least not directly (there aren't any threads in this test). It has to do with the order of destruction of cyclic trash in the garbage collector. The immediate cause is that the child C++ ObjectType
gets deleted twice. First it's deleted directly, then the parent deletes it when destroying m_children
. I'm still investigating this. I'm not sure why this happens -- there seems to be a bunch of code in basewrapper.cpp
to prevent this situation.
For context, the order of destruction of cyclic trash is sometimes different than in upstream. In upstream CPython, it's mostly deterministic (usually in older objects are destroyed first). In nogil Python, objects are found scanning the GC heap, and a lot of things can affect the order. (I think this is why the test only crashes sometimes.)
Mostly for my own future reference, here's the steps I took to build pyside6 (on macOS 13.1, ARM64):
@ctismer - here's a possible patch to address the cyclic_test.py issue:
https://github.com/colesbury/pyside6-dev/commit/090912ae2124ff5475d31b011e18dc67860e64a2
For context, you can trigger a similar crash in upstream CPython if you change the order in which CyclicChildObject
and CyclicObject
are constructed so that CyclicChildObject
is finalized first. Here's an example of what I mean: https://github.com/colesbury/pyside6-dev/commit/8a8e88dfe5df9d7241e101296e4464b46b4e47f6.
@colesbury Great, I will continue tomorrw. Are you aiming at inclusion into 3.12, or is it more realistic to expect multiple interpreters and GILs in 3.12?
@ctismer - I forget exactly when the 3.12 feature freeze is, but at this point I don't expect any major GIL-related changes to make it in to 3.12.
@ctismer - I forget exactly when the 3.12 feature freeze is, but at this point I don't expect any major GIL-related changes to make it in to 3.12.
Freeze is May 8th
@colesbury Hi Sam, I have created a pull request and merged the changes back into my Gerrit branch. I'm currently testing it with normal Python, too.
The fix for the cyclic
test is great, thank you!
But: why is the fix necessary? Because something in nogil
is deleted in different order.
It would be nicer to avoid this. Not even PyPy has problems with this, so it respects the order.
Is it possible to avoid this? Or is it principled that there is a non-deterministic order of destructors?
@colesbury Ok, I checked the patches also with normal Python, all great.
Because of the different handling of pull-requests in gerrit, I had to recreate the
branch dev-nogil-patches
. This is now the version that I will try to put into the main
repo, tomorrow.
In a sense, we can claim now
modulo the many missing NoGIL tests π
@ctismer It'll be great to get these changes merged into the main repo.
In general, I'd like to preserve the behavior of CPython, but in this case, I don't how to, at least not without substantial performance penalties. It might be possible to do something narrow that would avoid the issue in cyclic
test, but I'm not sure if that's worthwhile. What I mean is that there is the order in which tp_finalize
, tp_clear
and tp_dealloc
are each called and objects may be finalized/cleared/deallocated in different orders. The cyclic
test is only sensitive to the tp_dealloc
order.
PyPy has a pretty different GC implementation than CPython, and at least in some simple tests, can finalize objects in a different order than CPython. I'm not sure why the cyclic
test doesn't have issues with PyPy. Maybe the tp_dealloc
order is the same in this case as in CPython. I tried investigating this, but wasn't able to build pyside6 for PyPy successfully. (My --limited-api=yes
build had compile errors due to duplicate definitions and the --limited-api=no
build segfaulted during builds.)
@colesbury Ok, nice that I may merge this into the main repo (hope the'll take it).
Ok, I take it that it was a bug in our code that just worked, but needed to be made more stable against changes in the calling sequence.
On PyPy
: Well, I forgot to mention that PyPy
can only be built with --limited-api=no
.
With the current status of PEP 384ff
, you cannot build PyPy
because the incompatibilities go much deeper. There is a better API in the works that would work for PyPy, possibly. (Forgot which one π ) HPy https://hpyproject.org/ but I don't know the status.
That it segfaulted during build is probably an overlap since you built two versions of Python 3.9 (maybe). You should try a clean build, then things should work. You can run the tests also with a build, only. The overlap comes from a site-packages entry that is not unique.
Hi @colesbury and @ctismer. I'm still getting a Segmentation fault: 11
when running PySide6 (from your branch here).
it appears to be happening in the constructor of an object which is a subclass of QGraphicsRectItem
-- the crash happens right before the __init__
constructor returns when I call this function self.indicator_text = QGraphicsTextItem(parent=self)
-- self
in this case is an instance of QGraphicsRectItem
.
Note: I'm testing with nogil 3.9.10-1 and qt 6.4.2.
Also: I'm able to run with PyQt6 (installed with python -m pip install pyqt6
) just fine.
I was able to pinpoint the crash location using a combination of python -X faulthandler myapp.py
to output the trace and then putting print statements to isolate the crash. I'm not extremely well versed in using debuggers in Python (pycharm complained about opcodes when I tried to run with a debugger in nogil) so I'm not sure how best to help.
Hi @jimkring can you please give me some code? Simply creating an QGraphicsTextItem
object works for me:
$ /Users/tismer/.pyenv/versions/nogil-3.9.10-1-debug/bin/python3
Python 3.9.10 (main, Feb 6 2023, 15:50:18)
[nogil, Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from PySide6.QtWidgets import *
>>> self = QGraphicsRectItem()
>>> crash = QGraphicsTextItem(parent=self)
>>>
Quite apart from that, please do not get your hopes up too high: I would like to point out that this NoGIL PySide simply works, without a net and double bottom. There is no other protection against threading than the one Sam has built into Python. PySide needs to go a long way through thorough testing and replace the GIL protection of PySide objects accordingly. This has not yet been addressed. (I would like to work on this though... π ).
However, I will try to craft a NoGIL version of the PySide-Mandelbrot example shortly.
Hi @jimkring @colesbury As promised, I have adapted a PySide Mandelbrot program for use with NoGil. The implementation is a converted Qt example and not in the best style for Python. I tried to keep the changes to a minimum for comparable results.
My version takes the display QImage
and breaks the computation into 1-pixel stripes. Then a ThreadPoolExecutor
operates on that. The result is not as great as I expected, probably because the thread pool causes quite a lot of overhead.
I tried different thread pool sizes. Funnily, maxworkers=4
works best (fastest). The first generations are accelerated by a factor of >3. I was not able to do further analysis because of threading and opcode issues.
So, this result is partly nice, partially disappointing, because of the worse performance with increased core number. If there is interest, I can upload the scripts, of course. But in order to see maximum performance, I will try another implementation where I split the display matrix into not much more tiles tiles than cores.
Hi again, I have modified the example so that I also can choose a stripe size, in order to make the computed chunks larger. This works fine, but seems to have little effect. The unclear fact stays still true:
@colesbury Can you imagine what that behavior means? Do I have a glaring bug, or is the function concurrent.futures.as_completed
a problem?
I am uploading the source files now. mandelbrot.py.txt mandelbrot_nogil.py.txt
The machine used was a Mac Studio Max
with 10 cores
and 64 GB
main memory, btw.
@ctismer I'm seeing speedups with 8 threads for larger number of iterations:
EDIT: Oops I swapped the results from 4 and 8 threads.
Nevermind - I swapped the results from 4 and 8 threads. Really strange that higher number of iterations scale worse with more threads...
Hi. I'm trying to install PyQt6 on nogil. I have no idea if it should work or not.
It gets stuck at
Preparing metadata (pyproject.toml) ...
I installed qt6 with homebrew and it appears to accessible in the PATH -- I can run
qmake
just fine.Thanks for any ideas. Looking forward to exploring nogil!
Update