SakalisC / Splash-3

The Splash-3 benchmark suite

The program does not execute. #6

Closed hckuo2 closed 6 years ago

hckuo2 commented 6 years ago

I followed the instructions and ran the benchmarks with the recommended parameters. The programs seem to execute and exit normally, but the output is shown below: the compute time is always zero. Is this normal?

gcc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010
m4 (GNU M4) 1.4.17


        Hack code: Plummer model

     nbody     dtime       eps       tol     dtout     tstop    fcells     NPROC
     16384   0.02500    0.0500      1.00     0.250     0.075      2.00         4

COMPUTESTART  =   1535169466
COMPUTEEND    =   1535169466
COMPUTETIME   =            0
TRACKTIME     =            0
PARTITIONTIME =            0     -nan
TREEBUILDTIME =            0     -nan
FORCECALCTIME =            0     -nan
RESTTIME      =            0     -nan
Creating a two cluster, non uniform distribution for 16384 particles
Starting FMM with 4 processors
Finished FMM
                                   PROCESS STATISTICS
             Track        Tree        List        Part        Pass       Inter        Bar        Intra       Other
 Proc        Time         Time        Time        Time        Time       Time         Time       Time        Time
    0            0           0           0           0           0           0           0           0           0

                                   TIMING INFORMATION
Start time                        :       1535169466
Initialization finish time        :       1535169466
Overall finish time               :       1535169466
Total time with initialization    :                0
Total time without initialization :                0

Total time for steps 3 to 5 :            0

Ocean simulation with W-cycle multigrid solver
    Processors                         : 4
    Grid size                          : 258 x 258
    Grid resolution (meters)           : 20000.00
    Time between relaxations (seconds) : 28800
    Error tolerance                    : 1e-07

                       PROCESS STATISTICS
                  Total          Multigrid         Multigrid
 Proc             Time             Time            Fraction
    0                 0                  0              -nan

                       TIMING INFORMATION
Start time                        :       1535169466
Initialization finish time        :       1535169466
Overall finish time               :       1535169466
Total time with initialization    :                0
Total time without initialization :                0
    (excludes first timestep)

Ocean simulation with W-cycle multigrid solver
    Processors                         : 4
    Grid size                          : 258 x 258
    Grid resolution (meters)           : 20000.00
    Time between relaxations (seconds) : 28800
    Error tolerance                    : 1e-07

                       PROCESS STATISTICS
                  Total          Multigrid         Multigrid
 Proc             Time             Time            Fraction
    0                 1                  1             1.000

                       TIMING INFORMATION
Start time                        :       1535169466
Initialization finish time        :       1535169466
Overall finish time               :       1535169467
Total time with initialization    :                1
Total time without initialization :                1
    (excludes first timestep)

TIMING STATISTICS MEASURED BY MAIN PROCESS:
    Overall start time            1535169467
    Overall end time              1535169467
    Total time with initialization                     0
    Total time without initialization                      0
Rasiosity Statistics

    Histogram of interactions/elem
     Interactions  Occurrence
     -------------------------------
     (Over 100)      168 (126291.523438)
        100          3 (4874.575684)
        98          6 (1521.356079)
        97          5 (1734.110107)
        96          5 (1352.056885)
        94          2 (2449.900391)
        93          1 (9905.465820)
        92          2 (50253.605469)
        91          1 (22364.552734)
        90          7 (20320.214844)
        89          1 (1195.967529)
        88          2 (9905.674805)
        87          1 (1311.672607)
        86          4 (3626.220459)
        85          4 (1450.736816)
        84          5 (4989.483398)
        83          1 (1512.704102)
        82          4 (81670.343750)
        81          4 (6313.362793)
        80          1 (13567.184570)
        79          1 (1191.440430)
        78          1 (72000.250000)
        77          2 (1345.461182)
        76          1 (28252.785156)
        75          4 (23670.160156)
        72          1 (2930.986328)
        71          2 (8949.666992)
        70          3 (11491.526367)
        69          1 (28252.785156)
        67          1 (29830.343750)
        66          1 (22372.632812)
        65          3 (8065.339844)
        64          4 (17389.933594)
        63          2 (7623.680664)
        62          3 (23875.185547)
        61          1 (22680.052734)
        60          1 (36381.175781)
        59          4 (8072.653320)
        57          4 (3139.145508)
        54          1 (21323.609375)
        52          2 (8099.991699)
        49          1 (4661.109863)
        46          1 (4661.109863)
        45          2 (71999.750000)
        44          4 (67961.914062)
        41          3 (4641.196777)
        39          1 (42452.933594)
        38          2 (3360.855957)
        37          3 (15079.329102)
        36          1 (5919.844727)
        35          4 (2469.837891)
        34          1 (42452.933594)
        33          2 (19543.990234)
        32          1 (3889.380859)
        30          1 (14858.612305)
        29          1 (3889.380859)
        28          6 (16926.972656)
        27          3 (7069.240723)
        26          3 (9905.535156)
        25          1 (3241.141113)
        24          2 (14858.602539)
        22          2 (1904.186157)
        21          1 (9905.674805)
        20          3 (42897.675781)
        13          1 (360186.750000)
        12          18 (5481.369141)
        11          1 (127359.734375)
        10          24 (3665.947998)
        8          1 (360186.750000)
        7          1 (360186.750000)
        6          12 (68938.164062)
        5          4 (45358.750000)
        4          266 (6870.366211)
        3          154 (18073.607422)
        2          926 (10214.654297)
        1          558 (12964.450195)
        0          403 (38763.160156)
    Configurations
    Patch assignment: Static equal number
    Always inserting at top of list for visibility testing (not sorted)
    Recursive pruning enabled for BSP tree traversal
    Patch cache:      Enabled
    Always check all other queues when task stealing (not neighbor scheme)
    Parameters
    Number of processors:    4
    Number of task queues:   4
    Number of tasks / queue: 200
    Area epsilon:            5000.000000
    #inter parallel refine:  5
    #visibility comp / task: 4
    BF epsilon:              0.100000
    Energy convergence:      0.050000
    Iterations to converge:   3 times
    Resource Usage
    Number of patches:            364
    Total number of elements:     2688
    Total number of interactions: 39973
              completely visible: 6656
            completely invisible: 13166
               partially visible: 20151
    Interaction coherence (root interaction not counted)
           Common for 4 siblings: 3212
           Common for 3 siblings: 396
           Common for 2 siblings: 246
           Common for no sibling: 192
    Avg. elements per patch:      7.4
    Avg. interactions per patch:  109.8
    Avg. interactions per element:14.9
    Number of elements in equivalent uniform mesh: 7783
    Elem(hierarchical)/Elem(uniform): 34.54%

Number of processors:       4
Global shared memory size:  64 MB
Samples per pixel:          1

Number of primitive objects:    7629
Number of primitive elements:   46423

****** Hierarchial uniform grid memory allocation summary ******* 

     < struct >:            < current >   < maximum >    < sizeof > 
     <  bytes >:             <  bytes >   <   bytes >    <  bytes > 

     grid:                      59760         59760           144 
     hashtable entries:        678968        678968             8 
     emptycell entries:          6632          6632             8 
     voxel:                   1251480       1251480            40 
     bintree_node:           12370320      12370320           120 

     Totals:                 14367160      14367160      

TIMING STATISTICS MEASURED BY MAIN PROCESS:
        Overall start time               1535169467
        Overall end time             1535169467
        Total time with initialization                     0
        Total time without initialization                     0
usage:  VOLREND num_processes input_file ROTATE_STEPS
Using 4 procs on 3 steps of 512 mols
Other parameters:
    TSTEP = 1.50e-16
    NORDER = 6
    NSAVE = -1
    NRST = 3000
    NPRINT = 3
    NFMC = 0
    CUTOFF = 6.212752

TEMPERATURE                =   298.00 K
DENSITY                    =  0.99800 G/C.C.
NUMBER OF MOLECULES        =      512
NUMBER OF PROCESSORS       =        4
TIME STEP                  = 1.50e-01 SEC
ORDER USED TO SOLVE F=MA   =        6 
NO. OF TIME STEPS          =        3 
FREQUENCY OF DATA SAVING   =       -1 
FREQUENCY TO WRITE RST FILE=     3000 
SPHERICAL CUTOFF RADIUS    =   6.2128 ANGSTROM

NS = 7.9999899999999995
BOXL =  24.851010
CUTOFF =   6.212752
XS =   3.106380
ZERO = 1.55319
WCOS = 0.585882
WSIN = 0.756950
***** NEW RUN STARTING FROM REGULAR LATTICE *****
         3        1.57495      0.05127     10.55761                      -2.15831
           10.026        305.74022        -19.57198
COMPUTESTART (after initialization) = 1535169467
COMPUTEEND = 1535169467
COMPUTETIME (after initialization) = 0
Measured Time (2nd timestep onward) = 0
Intramolecular time only (2nd timestep onward) = 0
Intermolecular time only (2nd timestep onward) = 0
Other time (2nd timestep onward) = 0

Exited Happily with XTT = 10.0255 (note: XTT value is garbage if NPRINT > NSTEP)
Using 4 procs on 3 steps of 512 mols
Other parameters:
    TSTEP = 1.50e-16
    NORDER = 6
    NSAVE = -1
    NRST = 3000
    NPRINT = 3
    NFMC = 0
    CUTOFF = 6.212752

64 boxes with 4 processors

TEMPERATURE                =   298.00 K
DENSITY                    =  0.99800 G/C.C.
NUMBER OF MOLECULES        =      512
NUMBER OF PROCESSORS       =        4
TIME STEP                  = 1.50e-01 SEC
ORDER USED TO SOLVE F=MA   =        6 
NO. OF TIME STEPS          =        3 
FREQUENCY OF DATA SAVING   =       -1 
FREQUENCY TO WRITE RST FILE=     3000 
xprocs = 1  yprocs = 2  zprocs = 2
x_inc = 4    y_inc = 2   z_inc = 2
x_left = 0   y_left = 0  z_left = 0
SPHERICAL CUTOFF RADIUS    =   6.2128 ANGSTROM

NS = 7.9999999999999893
BOXL =  24.851010
CUTOFF =   6.212752
BOX_LENGTH =   6.212752
BOX_PER_SIDE = 4
XS =   3.106376
ZERO = 1.55319
WCOS = 0.585882
WSIN = 0.756950
***** NEW RUN STARTING FROM REGULAR LATTICE *****
         3     4711.30613   1586.02005      9.16081     -1.85845 
         6304.629    1430180.83630        414.12462
COMPUTESTART (after initialization) = 1535169467
COMPUTEEND = 1535169467
COMPUTETIME (after initialization) = 0
Measured Time (2nd timestep onward) = 0
Intramolecular time only (2nd timestep onward) = 0
Intermolecular time only (2nd timestep onward) = 0
Other time (2nd timestep onward) = 0

Exited Happily with XTT = 6304.63 (note: XTT value is garbage if NPRINT > NSTEP)

Sparse Cholesky Factorization
     Problem:         
     4 Processors
     Postpass partition size: 32
     16384 byte cache

true partitions
Fan-out, no block copy-across
LB domain, embedded distribution
No ordering
1295 supers, 3.05 nodes/super, 211 max super
1295/531 supers before/after
165039042/170264150 (1.03) ops before/after amalgamation
before partition
Divide for 4 P, 17 domains, 0.43 of work static, 0.95 eff, (inf overall)
284946 total domain updates
970 max height, 170264150 ops, 58510.02 conc, 120.94 bl for 4 P
Target partition size 0, postpass size 32
Processor array is 2 by 2
No redistribution
Supers: 69: 1  85: 1  104: 1  111: 1  137: 1  142: 1  396: 1  
Blocks: 27: 1  28: 5  29: 1  33: 12  34: 5  35: 6  36: 2  
32 partitions
32 partitions, 493 blocks
170264150 operations for factorization

                            PROCESS STATISTICS
              Total
 Proc         Time 
    0              
                            TIMING INFORMATION
Start time                        :       1535169467
Initialization finish time        :       1535169467
Overall finish time               :       1535169467
Total time with initialization    :                0
Total time without initialization :                0

FFT with Blocking Transpose
   65536 Complex Doubles
   4 Processors
   65536 Cache lines
   16 Byte line size
   4096 Bytes per page

iter_num = 64
iter_num = 64
iter_num = 64
iter_num = 64
Transpose: iter_num = 0
Transpose: iter_num = 4096
Transpose: iter_num = 8192
FFt1DOnce: iter_num = 1024
Transpose: iter_num = 12288
Step 1:        0
Step 2:        0
Transpose: iter_num = 4096
Transpose: iter_num = 0
Transpose: iter_num = 8192
Transpose: iter_num = 12288
Step 3:        0
Transpose: iter_num = 0
Transpose: iter_num = 4096
Step 4:        0
Transpose: iter_num = 8192
Transpose: iter_num = 12288
Step 5:        0

                 PROCESS STATISTICS
            Computation      Transpose     Transpose
 Proc          Time            Time        Fraction
    0                 0              0          -nan

                 TIMING INFORMATION
Start time                        :       1535169467
Initialization finish time        :       1535169467
Overall finish time               :       1535169467
Total time with initialization    :                0
Total time without initialization :                0
Overall transpose time            :                0
Overall transpose fraction        :             -nan

Blocked Dense LU Factorization
     512 by 512 Matrix
     4 Processors
     16 by 16 Element Blocks

                            PROCESS STATISTICS
              Total      Diagonal     Perimeter      Interior       Barrier
 Proc         Time         Time         Time           Time          Time
    0             0             0             0             0             0

                            TIMING INFORMATION
Start time                        :       1535169467
Initialization finish time        :       1535169467
Overall finish time               :       1535169467
Total time with initialization    :                0
Total time without initialization :                0

Blocked Dense LU Factorization
     512 by 512 Matrix
     4 Processors
     16 by 16 Element Blocks

                            PROCESS STATISTICS
              Total      Diagonal     Perimeter      Interior       Barrier
 Proc         Time         Time         Time           Time          Time
    0             0             0             0             0             0

                            TIMING INFORMATION
Start time                        :       1535169467
Initialization finish time        :       1535169467
Overall finish time               :       1535169467
Total time with initialization    :                0
Total time without initialization :                0

Integer Radix Sort
     1048576 Keys
     4 Processors
     Radix = 1024
     Max key = 67108864

                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0              0               0               0

                 TIMING INFORMATION
Start time                        :       1535169467
Initialization finish time        :       1535169467
Overall finish time               :       1535169467
Total time with initialization    :                0
Total time without initialization :                0
SakalisC commented 6 years ago

Hi,

Most (if not all) of the Splash benchmarks measure time in seconds. What seems to be happening here is that each benchmark finishes in less than a second, so the reported execution time is 0. This, in turn, happens because the inputs are small. Some benchmarks have bigger inputs that can run for more than one second, but not all of them. Since Splash-3 is used a lot with simulators, the recommended inputs are aimed at simulation, not native execution, and are generally not very big.
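To illustrate the effect, here is a minimal C sketch, assuming a timer that only reports whole seconds. It is not Splash-3's own timing macro, and the helper names (`elapsed_whole_seconds`, `fraction`, `now_ns`) are made up for this example. It shows why identical start and end timestamps give a time of 0, why a fraction computed from two zero times comes out as NaN (which glibc may print as `-nan`), and how a nanosecond-resolution clock could be read instead for native runs.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime under strict -std= modes */
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <time.h>

/* Difference between two timestamps that only have 1-second resolution.
 * Anything that finishes within the same second yields 0. */
static long elapsed_whole_seconds(time_t start, time_t end) {
    return (long)(end - start);
}

/* Share of the total time spent in one phase.
 * With part == 0.0 and total == 0.0 this is 0.0/0.0, which is NaN
 * under IEEE 754 arithmetic. */
static double fraction(double part, double total) {
    return part / total;
}

/* Hypothetical finer-grained alternative: monotonic clock in nanoseconds. */
static long long now_ns(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* silence unused-variable warning when NDEBUG is defined */
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

With the timestamps from the log above, `elapsed_whole_seconds(1535169466, 1535169466)` is 0 and `fraction(0.0, 0.0)` is NaN, whereas the one Ocean run that crossed a second boundary gives `elapsed_whole_seconds(1535169466, 1535169467) == 1` and `fraction(1.0, 1.0) == 1.0`. Taking two `now_ns()` readings around a region would resolve sub-second runs, at the cost of changing the benchmark's reporting code.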

Does this solve your problem?

Regards,

Chris

hckuo2 commented 6 years ago

YES thanks