Closed oysteinkrog closed 1 year ago
Screenshot of VS2022 showing the dump at the deadlock site:
Hey, thank you for the report.
I've been slowly rewriting this library to be more flexible but I'm probably going to make a simple update if I get ADL to work. Do you have any test systems of ADL? I don't have a hybrid system myself. I'm a bit curious if Lakefield just simply never hit many customers so it didn't expose bugs like this or if it's just ADL that does some fuckery.
Hey, can you test with v5.0.1? I think it should work, and I've added a guard to stop after iterating for int.MaxValue
so at the very least the freeze should be gone.
I'm still not quite sure what was wrong with the code though, in theory at some point the Intel CPU should return 0
as a sign that there's no more Cores to iterate through, and yet it somehow didn't I guess. I doubt it's an errata mind you, but it's weird nonetheless.
@L3tum Thanks! We will test on the affected customers system as soon as possible (we need to arrange for remote support session etc). I will update you on the results :)
@L3tum Our support team was able to test 5.0.1 on the affected customers system, and now it no longer deadlocks, but it takes a long time for our application to start. We have not had a chance to grab another dump yet, but I suspect the reason for the slow start is probably that it iterates through the entire int.MaxValue range (maxIterations variable). Did you consider a lower value for maxIterations?
Thanks for the reply. I did consider a lower limit, but was hesitant as it essentially introduces a Y2K (etc.) dependency into it. That said, I don't think that there's probably going to be single-socket systems with that many cores. I think 1024 or something would be a good limit as well.
It's somewhat weird still that the code as is doesn't quit beforehand. I've read through the Intel Manual again and they specifically say this is the way (checking for type
to be 0
). It's also weird that it takes a long time, considering it's 2 instructions and an if
-check (plus a bit of boilerplate). Can you give me a timeframe of how long the startup took?
Also, I'll be doing a 5.0.2 release later today or this week and limit it to a more reasonable 1024 or some such.
@oysteinkrog I've released 5.0.2 now, limiting the iteration count to 256. I'm probably going to be reworking the library a bit, because the current model just simply can't really work with heterogeneous architectures. Hopefully this would fix the underlying issues
We use your library in our commercial application. Recently we received a bug report that the application was freezing on startup. In our investigation we found that the process is stuck in the IntelInformationProvider.GetNumberOfPhysicalCores() method: https://github.com/L3tum/HardwareInformation/blob/c3fbe44d09bd52b219db2dd4efe6c181205e2ff5/HardwareInformation/Providers/IntelInformationProvider.cs#L143 The customer is using a very recent CPU (Intel Alder Lake), so I am wondering if it is possible that there is a bug in this method that is now only exposed due to the new CPU? Please see the attached CPU-Z report: DESKTOP-63MBB0H.txt I can provide a dump of the process if needed :) (Please contact me at oystein@initialforce.com)