Closed sonotley closed 2 years ago
Thanks for the bug report. I can reproduce both issues: cProfile not working properly and the slow performance in your poker hand evaluator.
cProfile is fixed in the "nogil-3.9" branch, but that branch isn't ready yet. I should have a release based on the branch in a week or so.
The poor performance looks like its due to poor dictionary performance when the key hashes aren't well distributed (such as with integer keys). I'll have a fix for this soon.
Thanks for your reply, and thanks for the enormous amount of work you've put into this.
I got the feeling there was a performance issue with dictionaries since my code leans extremely heavily on the highly optimised dictionary structure for speed, the support for integer keys is also pretty essential for the prime number shenanigans I adapted from leading poker algorithms.
I'll give the new release a try when it's ready.
On Wed, 29 Dec 2021, 19:06 Sam Gross, @.***> wrote:
Thanks for the bug report. I can reproduce both issues: cProfile not working properly and the slow performance in your poker hand evaluator.
cProfile is fixed in the "nogil-3.9" branch, but that branch isn't ready yet. I should have a release based on the branch in a week or so.
The poor performance looks like its due to poor dictionary performance when the key hashes aren't well distributed (such as with integer keys). I'll have a fix for this soon.
— Reply to this email directly, view it on GitHub https://github.com/colesbury/nogil/issues/29#issuecomment-1002738348, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKI2KCEP6SZDAGWGPQEDGL3UTNL2ZANCNFSM5K56A5UA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
You are receiving this because you authored the thread.Message ID: @.***>
cProfile and the poor performance in build_json.py is fixed, but you're not going to get good scaling with threads in your main_prime.py. There is a critical bottleneck on the reference counts in the data structures shared between threads, in particular "ranked_hands_dict".
Thanks for the update. Feel free to close this issue. I'll change the code to remove shared data structures as much as possible to see if I can get some mileage out of nogil.
On Thu, 6 Jan 2022, 17:17 Sam Gross, @.***> wrote:
cProfile and the poor performance in build_json.py is fixed, but you're not going to get good scaling with threads in your main_prime.py. There is a critical bottleneck on the reference counts in the data structures shared between threads, in particular "ranked_hands_dict".
— Reply to this email directly, view it on GitHub https://github.com/colesbury/nogil/issues/29#issuecomment-1006762905, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKI2KCFWH2F6U4EIW72XQ6TUUXFDPANCNFSM5K56A5UA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
You are receiving this because you authored the thread.Message ID: @.***>
I've retested and can confirm the latest release is much faster. Still not matching standard CPython, but much improved. I'm also seeing some gains from threading. I've been looking at how to remove the reference counting bottleneck. The obvious change is to pass a copy of all the data structures to each thread. This definitely helps, but adds overhead to starting the thread. I read up on your changes to reference counting and had a couple of questions (based on my code, but I'm more interested in the general principles):
ranked_hands_dict
for each thread (i.e. args=ranked_hands_dict.copy()
). Does that fact that the copy was created in the main thread mean the spawned thread will be forced to use slower atomic reference counting? In which case it could feasibly be faster for each thread to create its own copy internally.ranked_hands_dict
is effectively an immutable singleton is there any way to have it treated as immortal and disable reference counting altogether?Sorry if these are stupid questions, I've never gone anywhere near the CPython code so this is all new to me.
I posted a thread on Reddit summarising some testing I performed where I found nogil to be very slow. I tried to use cProfile to understand the performance but the output from cProfile doesn't seem to make any sense - it reports a runtime far less than the actual observed runtime and the profile itself doesn't parse properly (I'm using Snakeviz). I've observed similar issues with the Docker image and with my own build of nogil.
I'm aware this is not a well-formed issue report, but just looking to gain some understanding. Should I expect cProfile to work on this fork?