GPUOpen-LibrariesAndSDKs / cpu-core-counts

A sample demonstrating how to correctly detect physical core and logical processor counts on AMD processors.
MIT License
55 stars 13 forks source link

Fix for incorrect thread count reporting on zen based processors #3

Closed Edward-Bakker closed 3 years ago

Edward-Bakker commented 3 years ago

This fixes issue #2

What this does:

pagefault commented 3 years ago

Some reference material from AMD on this matter:

https://gpuopen.com/learn/cpu-core-count-detection-windows/

https://gpuopen.com/wp-content/uploads/2018/05/gdc_2018_sponsored_optimizing_for_ryzen.pdf (see page 25)

Edward-Bakker commented 3 years ago

Some reference material from AMD on this matter:

https://gpuopen.com/learn/cpu-core-count-detection-windows/

https://gpuopen.com/wp-content/uploads/2018/05/gdc_2018_sponsored_optimizing_for_ryzen.pdf (see page 25)

Not sure if this was directed at me, but a function that's called getDefaultThreadCount() should not return the value of the number of cores in my opinion. If a developer notices they get worse performance with SMT enabled they should adapt the code to fit their product.

But generally speaking SMT improves the performance of a product, thus should not be disabled by default.

upiter commented 3 years ago

@samklop you just removed code that checks result of getCpuidFamily. Your code always returns count = logical which is totally not correct for many legacy CPU Families.

kasper93 commented 3 years ago

Your code always returns count = logical which is totally not correct for many legacy CPU Families.

And how possibly it may be "not correct"? Also this sample is not ready to be used "as is", see below.

Some reference material from AMD on this matter:

vast majority of multithreaded games and applications work and scale really well when managing an active thread pool up to the number of logical cores that the processor supports.

a small number of games is that driving a hardware thread pool with more than the number of physical cores can reduce performance

I fully understand why this guidance and sample was created. But as stated "Remember to profile!", this sample is just an example code that should never be used "as is" in production.

It doesn't help that this sample was poorly written, in 2017 with Zen(1) in mind. But the check is only for "Bulldozer" family, and everything else is implicitly treated like Zen(1) with the guidance linked above. Now if you run this code 10 years from now on brand new architecture it will still apply guidance from Zen1 era (and won't use 4-way SMT that we may have). Or even now, does this still holds in Zen3, one may ask?

One cannot expect anything from the future architectures and such optimizations like affinity/thread count need to be done per application and per specific cpu family in mind. Generally using logical core count is universal way and any fine-tuning need to be profiled and not blindly applied.

That said this PR is obviously wrong, because main point of this sample code is to show how to select different thread count based on CPU family and you removed it. It is developer job to apply the code to his product properly to maximize performance.

tannisroot commented 3 years ago

I think GPUOpen overestimates the mental capabilities of game developers if they truly expect all of them to profile this stuff, since even a AAA game developer like CDPR didn't bother to do it. Plus it honestly makes no sense to default this to the behavior that will affect a very small number of applications (which even this article states https://gpuopen.com/learn/cpu-core-count-detection-windows/).

baryluk commented 3 years ago

Please just remove this repository. It is utterly bad.

1) It is windows specific. 2) It is slow. It should instead use atomics to initialize value once. (using cpuid + strcmp in potentially hot code is really bad). 3) There is no way to override it, i.e. using environment variables. (overriding using environment variable would be beneficial for quick profiling using same binary without recompiling, as well give end-users ability to bypass mistakes or wrong assumptions of the game dev). 4) There is no documentation. 5) API semantics is undefined. 6) It is unclear if the function in question is supposed to return number of threads on CPU, or number of threads that application should use by default. These are two different questions. 7) Library can't make decisions on number of threads to use, as this is purely dependent on workload. I.e. highly vectorized code that does a lot of memory streaming, will easily saturate execution units and memory bandwidth, and nicely utilize caches, and using SMT would actually be harmful due to data and instruction cache, TLB, prefetcher and branch predictor trashing. While highly irregular scalar code with a lot of random loads would benefit from SMT, to hide memory latencies and other stalls. 8) It disables game developer thinking. 9) It is short sighted and not updated in years, thus unmaintained. 10) It is not a library, just a toy program.

Thank you.

creker commented 3 years ago

@kasper93

That said this PR is obviously wrong, because main point of this sample code is to show how to select different thread count based on CPU family and you removed it. It is developer job to apply the code to his product properly to maximize performance.

The main point of this code, according to readme and its name, is to detect logical and physical core counts. It does just that. The problem is that it then makes an arbitrary decision to treat some AMD CPUs differently. What this sample code should do is just return the number of cores and that's it. Or at the very least always return logical core count as default thread count because that's the only safe default. Everything else is harmful and misleading. It's naive to think that developers would test code before copying it. Especially when it requires complex testing on different hardware. CDPR is just one recent example of that.

The PR is definitely correct in that it removes the arbitrary logic and just makes the code do what it supposed to be doing - return core counts.

baryluk commented 3 years ago

@kasper93

That said this PR is obviously wrong, because main point of this sample code is to show how to select different thread count based on CPU family and you removed it. It is developer job to apply the code to his product properly to maximize performance.

The main point of this code, according to readme and its name, is to detect logical and physical core counts. It does just that. The problem is that it then makes an arbitrary decision to treat some AMD CPUs differently. What this sample code should do is just return the number of cores and that's it. Or at the very least always return logical core count as default thread count because that's the only safe default. Everything else is harmful and misleading. It's naive to think that developers would test code before copying it. Especially when it requires complex testing on different hardware. CDPR is just one recent example of that.

The PR is definitely correct in that it removes the arbitrary logic and just makes the code do what it supposed to be doing - return core counts.

While I understand the sentiment, what you are saying is not exactly correct.

The code has separate value for logical ("threads"), physical ("cores"), and default threads count (which it does one or another depending what it thinks). the "default thread count" is not number of SMT "threads" available on CPU (or CPUs), but rather a number of operating system threads to use on this CPU by default, i.e. for tasks pool queues, etc. There definitively is a place for such functions, but as I mentioned it before it is highly dependent on CPU, workloads, scalability / locking issues, and many more, plus it must have a way of overriding by user from outside of the program.

The sample does already return just cores and just threads. The function in question does a 3rd thing, which is something else.

This repo should be just removed. It doesn't serve any useful purpose to anybody.

A way better approach is to make a blog post, with some examples and a discussion.

SekiBetu commented 3 years ago

can't imagine how many applications and games be limited on AMD's cpu only because of this mistakeđź‘€

bscout9956 commented 3 years ago

can't imagine how many applications and games be limited on AMD's cpu only because of this mistakeđź‘€

Simple, fire up any application that makes use of GPUOpen libraries and check whether it's not accounting for "SMT threads" aka the second thread for every core.

kasper93 commented 3 years ago

This repo should be just removed. It doesn't serve any useful purpose to anybody.

I totally agree. In my response I was kind of trying to understand (and explain) why this repo even exists, but it should never be used in production as copy-paste snippet.

As a appendix to blog post it is fine, shouldn't be a github repo, but an attachment to said blog post at best.

But guys, it all doesn't matter. There are loads of bad code over the net. Just don't copy it to your application. Think before. It is unfortunate that it's branded under GPUOpen libraries, but what can you do...

SekiBetu commented 3 years ago

@amdmarco @rys hey, sorry to bother you guys, but it's been 3 months, can someone come here to solve this problem? it's just a simple change to the code.

rys commented 3 years ago

The issue with Cyberpunk caused us to revisit what this sample code recommends to developers and we have been addressing it, both with work with developers who have already used the code in production, and in a future update to this code where we want to make it clearer how to work with it and understand what advice it gives.

So we've been working on it, and I'll chase up where we are with a public update. That means we won't merge in this pull request, but we are working on it.