OpenHFT / Java-Thread-Affinity

Bind a java thread to a given core
http://chronicle.software/products/thread-affinity/
Apache License 2.0

More than 64 CPUs #28

Closed: janmoller closed this issue 8 years ago.

janmoller commented 8 years ago

I have a 4-socket system (12 cores per socket) running Linux, 48 cores / 96 CPUs in total with hyper-threading, and I hit:

java.nio.BufferOverflowException
    at java.nio.Buffer.nextPutIndex(Buffer.java:519)
    at java.nio.HeapByteBuffer.putLong(HeapByteBuffer.java:417)
    at net.openhft.affinity.impl.LinuxJNAAffinity.getAffinity(LinuxJNAAffinity.java:80)
    at net.openhft.affinity.impl.LinuxJNAAffinity.<clinit>(LinuxJNAAffinity.java:48)

I found the cause in LinuxJNAAffinity.java:

// TODO: FIXME!!! CHANGE IAffinity TO SUPPORT PLATFORMS WITH 64+ CORES FIXME!!!

I suspect 64+ CPUs will be the norm before long. A fix would be highly appreciated.
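The stack trace is consistent with a mask buffer sized for a single long: one 64-bit word can describe CPUs 0 to 63 only, so a second putLong for CPUs 64 and up overflows. The sketch below is not the library's code; CpuMaskSketch, fixedMask and sizedMask are hypothetical names, and it only contrasts a one-word buffer with one sized from a java.util.BitSet of CPU ids.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.BitSet;

public class CpuMaskSketch {
    // Hypothetical: a mask buffer with room for exactly one long (CPUs 0..63).
    static ByteBuffer fixedMask(BitSet cpus) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES).order(ByteOrder.nativeOrder());
        for (long word : cpus.toLongArray()) {
            buf.putLong(word); // throws BufferOverflowException on the second word (any CPU >= 64)
        }
        return buf;
    }

    // Hypothetical: size the buffer from the number of 64-bit words the CPU set needs.
    static ByteBuffer sizedMask(BitSet cpus) {
        long[] words = cpus.toLongArray();
        ByteBuffer buf = ByteBuffer.allocate(Math.max(1, words.length) * Long.BYTES)
                .order(ByteOrder.nativeOrder());
        for (long word : words) {
            buf.putLong(word);
        }
        return buf;
    }

    public static void main(String[] args) {
        BitSet cpus = new BitSet();
        cpus.set(0, 96); // 96 logical CPUs, as on the reporter's machine
        System.out.println("sized mask bytes: " + sizedMask(cpus).capacity()); // 16
        try {
            fixedMask(cpus);
        } catch (BufferOverflowException e) {
            System.out.println("fixed 64-bit mask overflows");
        }
    }
}
```

With 96 CPUs the BitSet needs two 64-bit words, so the sized buffer holds 16 bytes while the one-word buffer fails on the second putLong.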

plusterkopp commented 8 years ago

Actually, I have a fix for this on Windows, which uses hardware introspection through Windows API calls to determine the CPU layout. Some interfaces need to be extended, though.

peter-lawrey commented 8 years ago

The code should work in 3.0.4-SNAPSHOT. Are you able to check that it does? We don't have a >64 CPU machine to test it on.

(Replying by email to Jan Moller's comment of 23 March 2016 at 12:13.)

plusterkopp commented 8 years ago

I can only test on Windows, with 80 CPUs; it works there.

janmoller commented 8 years ago

Sounds great. I will check right after Easter.

janmoller commented 8 years ago

I cloned the master branch to a Linux box, ran 'mvn package' and got:

-----
[INFO] Building jar: /tmp/Java-Thread-Affinity/affinity/target/affinity-3.0.4-SNAPSHOT-javadoc.jar
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building OpenHFT/Java-Thread-Affinity/affinity-test 3.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for net.openhft:affinity:jar:3.0.1-SNAPSHOT is missing, no dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] OpenHFT/Java-Thread-Affinity/affinity .............. SUCCESS [  5.861 s]
[INFO] OpenHFT/Java-Thread-Affinity/affinity-test ......... FAILURE [  0.061 s]
[INFO] Java Affinity Parent ............................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.526 s
[INFO] Finished at: 2016-03-27T08:21:33+02:00
[INFO] Final Memory: 28M/945M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project affinity-test: Could not resolve dependencies for project net.openhft:affinity-test:bundle:3.0.1-SNAPSHOT: Failure to find net.openhft:affinity:jar:3.0.1-SNAPSHOT in https://oss.sonatype.org/content/repositories/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of Snapshot Repository has elapsed or updates are forced -> [Help 1]
-----

So it appears that affinity built fine, but affinity-test failed. I wonder why the test refers to "3.0.1-SNAPSHOT" when building "3.0.4-SNAPSHOT". I am afraid I do not know much about Maven.

peter-lawrey commented 8 years ago

Hello Jan, you can build the affinity sub-module standalone, but you are right that this should have been the same version. I have fixed it now.

Regards, Peter.

(Replying by email to Jan Moller's comment of 27 March 2016 at 07:30.)

janmoller commented 8 years ago

Since I am using Gradle, I rely on the OpenHFT package from Maven Central. I am afraid I don't know Maven well enough to get the plumbing right with a Maven project embedded directly inside a Gradle project...

However, if I can build the complete OpenHFT package with Maven (including tests) on my box, then I guess everything is fine, and I can wait until 3.0.4 is released and include it in my Gradle project.

So, when I clone the GitHub project and run 'mvn package', I get this:

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
[main] INFO org.ops4j.pax.exam.spi.DefaultExamSystem - Pax Exam System (Version: 4.8.0) created.
[main] INFO org.ops4j.pax.exam.junit.impl.ProbeRunner - creating PaxExam runner for class net.openhft.affinity.osgi.OSGiBundleTest
Running net.openhft.affinity.osgi.OSGiBundleTest
[main] INFO org.ops4j.pax.exam.junit.impl.ProbeRunner - creating PaxExam runner for class net.openhft.affinity.osgi.OSGiBundleTest
[main] INFO org.ops4j.pax.exam.junit.impl.ProbeRunner - running test class net.openhft.affinity.osgi.OSGiBundleTest
[main] WARN org.ops4j.pax.url.mvn.internal.AetherBasedResolver - Error resolving artifactnet.openhft:affinity:jar:3.0.4-SNAPSHOT:Could not find artifact net.openhft:affinity:jar:3.0.4-SNAPSHOT
shaded.org.eclipse.aether.resolution.ArtifactResolutionException: Could not find artifact net.openhft:affinity:jar:3.0.4-SNAPSHOT
    at shaded.org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:444)
    at shaded.org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:246)

It appears that the tests can't run because the 'net.openhft:affinity:jar:3.0.4-SNAPSHOT' artifact can't be found.

peter-lawrey commented 8 years ago

Ok, I will do a release today.

(Replying by email to Jan Moller's comment of 27 March 2016 at 09:55.)

janmoller commented 8 years ago

Affinity.setAffinity(int cpu) and Affinity.getThreadId() verified to work on a 96 CPU system running CentOS Linux & Java-Thread-Affinity 3.0.4. Thanks.

peter-lawrey commented 8 years ago

Excellent news.

To get the best results, I suggest using isolcpus= at boot, configuring irqbalance not to use those CPUs, and busy waiting.

Regards, Peter.
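A minimal sketch of the busy-wait part of that advice, assuming Java 9+ for Thread.onSpinWait(): the consumer spins on an AtomicLong instead of blocking, so it never gives up its CPU. BusyWaitSketch and awaitNext are hypothetical names; isolcpus= and irqbalance are boot and system settings outside Java, and pinning the thread (for example with Affinity.setAffinity(int cpu), mentioned in this thread) is only indicated in a comment.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BusyWaitSketch {
    // Spins until the shared sequence moves past 'last', without parking or yielding the thread.
    static long awaitNext(AtomicLong sequence, long last) {
        long current;
        while ((current = sequence.get()) <= last) {
            Thread.onSpinWait(); // spin-loop hint to the CPU (Java 9+)
        }
        return current;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong sequence = new AtomicLong(0);
        Thread consumer = new Thread(() -> {
            // On an isolated CPU the thread would be pinned here first,
            // e.g. with Affinity.setAffinity(cpu) from Java-Thread-Affinity.
            long seen = awaitNext(sequence, 0);
            System.out.println("consumer saw " + seen);
        });
        consumer.start();
        sequence.set(42); // the producer publishes a new value
        consumer.join();
    }
}
```

Spinning burns a whole CPU, which is why it pairs with isolcpus=: the isolated core has nothing else to run, so the spin costs no other work while avoiding wake-up latency.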

(Replying by email to Jan Moller's comment of 27 March 2016 at 18:25.)

paxel commented 5 years ago

Hmm. I'm using 3.1.11 and run into problems with AffinityThreadFactory.

FileLockBasedLockChecker has an array of locks sized to VanillaCpuLayout.MAX_CPUS_SUPPORTED.

What am I doing wrong? I have 2 NUMA nodes and 72 CPUs.
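A plausible failure mode, assuming MAX_CPUS_SUPPORTED is smaller than 72 (its value in 3.1.11 is not given in this thread): looking up a high CPU id such as 71 falls off the end of a fixed-size array. The sketch below, using the hypothetical class PerCpuLockTable, sizes the table from the CPU count at runtime instead; in-process ReentrantLocks stand in for the file locks that FileLockBasedLockChecker uses across processes.

```java
import java.util.concurrent.locks.ReentrantLock;

public class PerCpuLockTable {
    private final ReentrantLock[] locks;

    // One slot per CPU id, sized at runtime rather than from a fixed constant.
    public PerCpuLockTable(int cpuCount) {
        if (cpuCount <= 0) throw new IllegalArgumentException("cpuCount must be positive");
        locks = new ReentrantLock[cpuCount];
        for (int i = 0; i < cpuCount; i++) locks[i] = new ReentrantLock();
    }

    // Tries to reserve a CPU; rejects unknown ids with a clear message
    // instead of an ArrayIndexOutOfBoundsException.
    public boolean tryAcquire(int cpuId) {
        if (cpuId < 0 || cpuId >= locks.length)
            throw new IllegalArgumentException("no such CPU: " + cpuId);
        return locks[cpuId].tryLock();
    }

    public void release(int cpuId) {
        locks[cpuId].unlock();
    }

    public int size() {
        return locks.length;
    }

    public static void main(String[] args) {
        PerCpuLockTable table = new PerCpuLockTable(72); // 2 NUMA nodes, 72 CPUs as above
        System.out.println(table.tryAcquire(71)); // true
        table.release(71);
    }
}
```

In practice the count could come from Runtime.getRuntime().availableProcessors(), bearing in mind that it reports the CPUs available to the JVM, which may be fewer than the machine has.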

plusterkopp commented 5 years ago

(Replying by email to paxel's comment of 3 April 2019 at 15:22.)

I started a fork of this project in 2015 when I found that it lacked introspection support on Windows. I then used the Windows API to get the CPU layout. This API also provides information about CPU groups (needed for more than 64 logical CPUs, or lCPUs), NUMA nodes and all cache levels.
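For background on CPU groups: Windows puts logical processors into processor groups of at most 64, and on 64-bit Windows each group's affinity mask is a single 64-bit word, so a flat CPU index has to be translated into a (group, bit) pair. A sketch of that translation, assuming the per-group CPU counts are already known (on Windows they could come from GetActiveProcessorCount); ProcessorGroupSketch and the 40/40 split are hypothetical, not taken from plusterkopp's fork.

```java
import java.util.Arrays;

public class ProcessorGroupSketch {
    static final int MAX_CPUS_PER_GROUP = 64; // one 64-bit mask word per group

    // Maps a flat logical CPU index to {group, bit}, given how many CPUs each group holds.
    static int[] toGroupBit(int cpu, int[] cpusPerGroup) {
        if (cpu < 0) throw new IllegalArgumentException("negative CPU index: " + cpu);
        int remaining = cpu;
        for (int group = 0; group < cpusPerGroup.length; group++) {
            if (cpusPerGroup[group] > MAX_CPUS_PER_GROUP)
                throw new IllegalArgumentException("group " + group + " exceeds 64 CPUs");
            if (remaining < cpusPerGroup[group]) return new int[] {group, remaining};
            remaining -= cpusPerGroup[group];
        }
        throw new IllegalArgumentException("CPU " + cpu + " out of range");
    }

    // The 64-bit mask selecting a single CPU within its group.
    static long groupMask(int bit) {
        return 1L << bit;
    }

    public static void main(String[] args) {
        int[] groups = {40, 40}; // assumed layout for an 80-CPU machine
        int[] gb = toGroupBit(57, groups);
        System.out.println(Arrays.toString(gb)); // [1, 17]
        System.out.println(Long.toHexString(groupMask(gb[1]))); // 20000
    }
}
```

Walking the actual group sizes, rather than dividing by 64, matters because Windows does not necessarily fill one group to 64 before starting the next.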

Since I actually wanted to reduce cache misses more than pin threads to cores, this was what I needed. I also wrote an AffinityManager class to facilitate binding threads not just to cores, but also to sockets, NUMA nodes and now caches.
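To illustrate binding at coarser levels than a single core, here is a sketch that derives CPU sets per socket and per physical core from a layout table. CpuInfo, cpusOnSocket, cpusOnCore and toyLayout are hypothetical and not the AffinityManager API; the 2 x 2 x 2 layout with socket-major numbering is a toy assumption, and real machines often number hyper-thread siblings far apart. It assumes Java 9+ for List.of.

```java
import java.util.BitSet;
import java.util.List;

public class LayoutGroupingSketch {
    // One logical CPU's position in the topology (simplified: no NUMA or cache levels).
    static final class CpuInfo {
        final int cpu, socket, core;
        CpuInfo(int cpu, int socket, int core) { this.cpu = cpu; this.socket = socket; this.core = core; }
    }

    // All logical CPUs on the given socket: the candidate set for a socket-level binding.
    static BitSet cpusOnSocket(List<CpuInfo> layout, int socket) {
        BitSet set = new BitSet();
        for (CpuInfo c : layout) if (c.socket == socket) set.set(c.cpu);
        return set;
    }

    // All logical CPUs sharing one physical core (hyper-thread siblings).
    static BitSet cpusOnCore(List<CpuInfo> layout, int socket, int core) {
        BitSet set = new BitSet();
        for (CpuInfo c : layout) if (c.socket == socket && c.core == core) set.set(c.cpu);
        return set;
    }

    // Toy layout: 2 sockets x 2 cores x 2 threads, numbered socket-major.
    static List<CpuInfo> toyLayout() {
        return List.of(
            new CpuInfo(0, 0, 0), new CpuInfo(1, 0, 0), new CpuInfo(2, 0, 1), new CpuInfo(3, 0, 1),
            new CpuInfo(4, 1, 0), new CpuInfo(5, 1, 0), new CpuInfo(6, 1, 1), new CpuInfo(7, 1, 1));
    }

    public static void main(String[] args) {
        List<CpuInfo> layout = toyLayout();
        System.out.println(cpusOnSocket(layout, 1)); // {4, 5, 6, 7}
        System.out.println(cpusOnCore(layout, 0, 1)); // {2, 3}
    }
}
```

A thread bound to such a set may run on any CPU in it, which keeps it near its cached data without tying it to one core; the same filter extends to NUMA node or shared-cache ids if the layout records them.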

I think it was in 2015 that I wrote to Peter Lawrey about it, but I got no reply. It would have been nice to merge my code back and bring it up to their code conventions, but I guess there was no need due to differing design objectives.

Unfortunately, I also never tried to merge any changes from the original project and am now stuck with Java 8 because of BootClassPath.

So in early 2017 I got my hands on a 2x18x2 machine, just like yours, which I can dual-boot into Windows and CentOS. I then made some additions to LinuxJNAAffinity (and other places), and I can say it works on all lCPUs, under both Windows and Linux.

If anyone is interested, go to https://github.com/plusterkopp/Java-Thread-Affinity. But beware: there is no documentation (since I am still the only user), and I could use some help with POMs and such. It does build with Maven, though.