PlummersSoftwareLLC / Primes

Prime Number Projects in C#/C++/Python
https://plummerssoftwarellc.github.io/PrimeView/

Ruby solution takes > 4 hours to complete. #907

Closed EDToaster closed 1 year ago

EDToaster commented 1 year ago

I haven't debugged this myself, but Ruby solution 2 (darnellbrawner-MultiThreaded-Numo_2core) seems to take more than 4 hours to complete on systems with high core counts (I have not confirmed that the core count is the root cause).

See this run on 16 physical cores: link

Same thing happens on my system with 48 physical and 96 logical cores.

Here's the lscpu output from my system:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  96
  On-line CPU(s) list:   0-95
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  24
    Socket(s):           2
    Stepping:            7
    BogoMIPS:            4999.99
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke

rbergen commented 1 year ago

@EDToaster Good catch, I never noticed this! That's probably because the issue does not occur on my own machines with a lower core count.

@darnellbrawner As the contributor of the solution in question, could you maybe lend your perspective on this?

EDToaster commented 1 year ago

Something strange -- if I comment out the other MT implementation (with Ractor), the program exits properly after 5 seconds. Maybe it's an implementation issue with Ractor?
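One way to probe that hypothesis in isolation is to time identical work across a growing number of Ractors and watch how the wall time scales. The following is only a rough sketch, not code from the solution: `count_primes` and `timed_ractor_run` are hypothetical names, the sieve is a plain-Ruby stand-in rather than the Numo-based one, and the Ractor counts are kept deliberately small. Since Ractor is still experimental, it picks `value` or `take` depending on what the installed Ruby provides.

```ruby
require "benchmark"

# Hypothetical stand-in for the real sieve: counts the primes up to `limit`.
def count_primes(limit)
  sieve = Array.new(limit + 1, true)
  sieve[0] = sieve[1] = false
  (2..Integer.sqrt(limit)).each do |i|
    next unless sieve[i]
    # Cross off multiples of i, starting at i squared.
    (i * i).step(limit, i) { |j| sieve[j] = false }
  end
  sieve.count(true)
end

# Start `workers` Ractors that each sieve independently, wait for all of
# them, and return [per-Ractor results, elapsed wall-clock seconds].
def timed_ractor_run(workers, limit)
  results = nil
  seconds = Benchmark.realtime do
    ractors = Array.new(workers) { Ractor.new(limit) { |n| count_primes(n) } }
    # Newer Rubies expose Ractor#value; older 3.x releases use Ractor#take.
    results = ractors.map { |r| r.respond_to?(:value) ? r.value : r.take }
  end
  [results, seconds]
end

[1, 2, 4].each do |workers|
  _results, seconds = timed_ractor_run(workers, 100_000)
  puts format("%2d ractors: %.3f s", workers, seconds)
end
```

If the elapsed time grows much faster than the Ractor count on a given machine, that would point at the Ractor side rather than at the sieve itself.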

rbergen commented 1 year ago

Could be. I know for sure I'm not able to address the issue in either case. I also know that a 4-hour run for a single implementation is not sustainable.

I'll allow @darnellbrawner some time to respond and address this. If that doesn't lead to resolution of this issue, I'll take the Ractor implementation out of the automated benchmark runs, for now.

darnellbrawner commented 1 year ago

4 hours, oof. I'll take a look. I don't have access to a machine with a large core count, but let me see what I can find.

rbergen commented 1 year ago

@EDToaster Would it be possible for you to verify if the change proposed in #915 addresses the issue on your high core count machine?

EDToaster commented 1 year ago

Confirmed that the program does terminate in under a minute, but the MT performance is not the best:

❯ ruby --jit -W0 prime.rb
darnellbrawner-Numo;829;5.005;1;algorithm=base,faithful=no
darnellbrawner-MultiThreaded;1;19.638;96;algorithm=base,faithful=yes
darnellbrawner-MultiThreaded-Numo_2core;1;26.375;2;algorithm=base,faithful=no

rbergen commented 1 year ago

but the MT performance is not the best

That's a good finding that someone may want to investigate and improve on, but my conclusion is that the problem that is the subject of this issue will be fixed with the merge of #915.
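For anyone who does pick up the MT performance question, the three result lines above can be put on a common scale. The sketch below assumes the semicolon-separated fields are label, pass count, duration in seconds, thread count, and tags. That reading of the output, as well as the `Result` struct and `parse_result` helper, are assumptions made for illustration.

```ruby
# One benchmark result line, assumed to be: label;passes;duration;threads;tags
Result = Struct.new(:label, :passes, :duration, :threads, :tags) do
  # Throughput normalized by thread count, so runs with different
  # thread counts can be compared directly.
  def passes_per_second_per_thread
    passes / duration / threads
  end
end

# Split a result line into its five fields and convert the numeric ones.
def parse_result(line)
  label, passes, duration, threads, tags = line.strip.split(";", 5)
  Result.new(label, Integer(passes), Float(duration), Integer(threads), tags)
end

lines = [
  "darnellbrawner-Numo;829;5.005;1;algorithm=base,faithful=no",
  "darnellbrawner-MultiThreaded;1;19.638;96;algorithm=base,faithful=yes",
  "darnellbrawner-MultiThreaded-Numo_2core;1;26.375;2;algorithm=base,faithful=no"
]

lines.each do |line|
  r = parse_result(line)
  puts format("%-42s %12.5f passes/s/thread", r.label, r.passes_per_second_per_thread)
end
```

Read this way, both multithreaded entries completed a single pass in roughly 20 to 26 seconds, while the single-threaded Numo entry completed 829 passes in about 5 seconds.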

EDToaster commented 1 year ago

but my conclusion is that the problem that is the subject of this issue will be fixed

yup.

rbergen commented 1 year ago

Closing after merging of #915.