Open subedika opened 3 years ago
I would like to note that I too have received this sort of issue. My lab is attempting to run the program to generate global simulations with NEX_XI and NEX_ETA params at a value of 256. I am compiling the program to run with a set of NVidia RTX 2080s. While I am aware there is a parameter called MEMORY_INSTALLED_PER_CORE_IN_GB and the related PERCENT_OF_MEM_TO_USE_PER_CORE, assigning different values to these doesn't actually affect the performance of the program. Each GPU thread only seems to use about 150MB of memory on the GPUs; we are running this over 24 threads. The xcreate_header_file binary suggests that we need about 160GB of memory to run the solver. We only have around 128GB of RAM available on the system.
Aside from lowering the value of the NEX_XI and NEX_ETA parameters or increasing the amount of memory available to the system, is there anything we can do to fix this memory issue, especially given that the GPUs' memory is mostly going unused?
@subedika: maybe you can add more details, e.g., attach the output files here? also, try first the most recent devel branch version to check.
@amkearns-usgs: the numbers above don't seem to match: 150MB on the GPU for 24 threads would only amount to 3.6 GB memory needed, not the 160GB mentioned to run the solver. there should be more details in the output_solver.txt files for example. what is the setup in your case, 24 GPU cards spread over 24 compute nodes? or only a single compute node with 1 GPU card and you use CUDA MPS to run all processes on the same card and node?
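To make the mismatch concrete, here is a quick back-of-the-envelope check of the figures quoted above. It is only a sketch: the helper name is made up for illustration, and decimal units (1 GB = 1000 MB) are assumed.

```python
def total_gpu_memory_gb(per_process_mb, n_processes):
    """Sum the GPU memory seen per process and convert MB to GB (decimal)."""
    return per_process_mb * n_processes / 1000.0


# Figures quoted in the thread: ~150 MB per process, 24 processes.
gpu_total_gb = total_gpu_memory_gb(150, 24)
solver_estimate_gb = 160  # value reported by xcreate_header_file

print(f"GPU memory in use across all processes: {gpu_total_gb:.1f} GB")
print(f"Solver estimate: {solver_estimate_gb} GB")
print(f"Ratio: roughly {solver_estimate_gb / gpu_total_gb:.0f}x")
```

The two numbers differ by more than an order of magnitude, which is why the output files are needed to see where the memory is actually going.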
to use more GPU memory per process, you would lower the NPROC_XI / NPROC_ETA values such that the partition size gets closer to your GPU memory size. the parameter MEMORY_INSTALLED_PER_CORE_IN_GB has no effect, it is only used for UNDO_ATTENUATION simulations to estimate the time steps in between wavefield snapshots when SAVE_FORWARD is set or kernel simulations with SIMULATION_TYPE = 3 are run.
anyway, add more outputs if you want to get more specific answers... :)
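As a rough illustration of how NPROC_XI / NPROC_ETA change the partitioning, the sketch below assumes a 6-chunk global mesh, so the number of MPI processes is 6 * NPROC_XI * NPROC_ETA, and that each slice spans NEX / NPROC elements per side. The function names are hypothetical and the real code applies further divisibility constraints that are not checked here.

```python
def mpi_processes(nproc_xi, nproc_eta, nchunks=6):
    """Number of MPI processes (slices) for a mesh with nchunks chunks."""
    return nchunks * nproc_xi * nproc_eta


def elements_per_slice_side(nex, nproc):
    """Spectral elements along one side of a slice, at the surface."""
    if nex % nproc != 0:
        raise ValueError("NEX must be divisible by NPROC")
    return nex // nproc


for nproc in (2, 1):
    n = mpi_processes(nproc, nproc)
    side = elements_per_slice_side(256, nproc)
    print(f"NPROC_XI = NPROC_ETA = {nproc}: {n} processes, "
          f"{side} x {side} surface elements per slice")
```

Fewer, larger slices mean more memory per process, so lowering NPROC only helps if each GPU can hold the larger partition.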
To be more precise, the GPU memory usage is only ~150 MB per thread according to nvidia-smi. Main memory usage (according to htop) is multiple GB per thread. Once the program gets past ~5 GB per thread, it crashes with an out-of-memory error.
The system we run on has 4 RTX 2080 GPUs, and I believe it has 10 dual-thread CPU cores (exact hardware according to /proc/cpuinfo is Intel core i9-9820X, 3.30GHz).
Here is the contents of output_solver from the last attempted run of the program:
Specfem3D MPI Solver
Version: v7.0.2-421-gc4c30a79
Planet: Earth
There are 24 MPI processes
Processes are numbered from 0 to 23
There are 256 elements along xi in each chunk
There are 256 elements along eta in each chunk
There are 2 slices along xi in each chunk
There are 2 slices along eta in each chunk
There is a total of 4 slices in each chunk
There are 6 chunks
There is a total of 24 slices in all the chunks
NDIM = 3
NGLLX = 5 NGLLY = 5 NGLLZ = 5
using single precision for the calculations
smallest and largest possible floating-point numbers are: 1.17549435E-38 3.40282347E+38
model: s362ani
incorporating the oceans using equivalent load
incorporating ellipticity
incorporating surface topography
incorporating self-gravitation (Cowling approximation)
incorporating rotation
incorporating attenuation using 3 standard linear solids
incorporating 3-D lateral variations in the mantle
no heterogeneities in the mantle
incorporating crustal variations
using one layer only in crust
incorporating transverse isotropy
no inner-core anisotropy
no general mantle anisotropy
GPU_MODE Active. runtime : 1 platform: NVIDIA device : * GPU number of devices per node: min = 4 max = 4
creating global slice addressing
Spatial distribution of the slices 3 1 2 0
11 9 7 5 19 17
10 8 6 4 18 16
23 21
22 20
15 13
14 12
mesh databases:
reading in crust/mantle databases...
reading in outer core databases...
reading in inner core databases...
reading in coupling surface databases...
reading in MPI databases...
for overlapping of communications with calculations:
percentage of edge elements in crust/mantle 5.73075438 %
percentage of volume elements in crust/mantle 94.2692490 %
percentage of edge elements in outer core 14.9479170 %
percentage of volume elements in outer core 85.0520859 %
percentage of edge elements in inner core 14.9107141 %
percentage of volume elements in inner core 85.0892868 %
Elapsed time for reading mesh in seconds = 357.150177
topography: topography/bathymetry: min/max = -7747 5507
Elapsed time for reading topo/bathy in seconds = 0.508016586
adjacency: total number of elements in this slice = 167936
using kd-tree search radius = 234.55179839086608 (km)
maximum search elements = 656
maximum of actual search elements (after distance criterion) = 655
estimated typical element size at surface = 39.091966398477680 (km)
maximum distance between neighbor centers = 202.34023456793579 (km)
maximum neighbors found per element = 37
(should be 37 for globe meshes)
total number of neighbors = 4256864
Elapsed time for detection of neighbors in seconds = 16.861965878168121
kd-tree:
total data points: 167936
theoretical number of nodes: 335869
tree memory size: 10.2499084 MB
actual number of nodes: 335871
tree memory size: 10.2499695 MB
maximum depth : 22
creation timing : 6.68773651E-02 (s)
sources: 1
locating sources
source # 1
source located in slice 2 in element 157113
using moment tensor source:
xi coordinate of source in that element: -0.22106384566793172
eta coordinate of source in that element: 0.63624502320449539
gamma coordinate of source in that element: -0.52283578294249500
source time function: using (quasi) Heaviside source time function
half duration: 32.399999999999999 seconds
time shift: 0.0000000000000000 seconds
magnitude of the source:
scalar moment M0 = 2.9586466500749968E+028 dyne-cm
moment magnitude Mw = 8.2807287337668960
original (requested) position of the source:
latitude: 55.420000000000002
longitude: -157.31999999999999
depth: 30.199999999999999 km
position of the source that will be used:
latitude: 55.419999999999995
longitude: -157.32000000000002
depth: 30.200000000000891 km
Error in location of the source: 1.42565528E-12 km
maximum error in location of the sources: 1.42565528E-12 km
Elapsed time for detection of sources in seconds = 3.6215900378301740
End of source detection - done
receivers:
Total number of receivers = 378
locating receivers
reading receiver information...
Stations sorted by epicentral distance: Station # 120: II.KDAK epicentral distance: 3.530290 degrees Station # 29: IU.COLA epicentral distance: 10.573919 degrees Station # 20: IU.ADK epicentral distance: 12.014195 degrees Station # 162: US.EGAK epicentral distance: 12.325594 degrees Station # 202: US.WRAK epicentral distance: 14.016299 degrees Station # 25: IU.BILL epicentral distance: 20.907255 degrees Station # 191: US.NLWA epicentral distance: 22.179230 degrees Station # 30: IU.COR epicentral distance: 24.157248 degrees Station # 169: US.HAWA epicentral distance: 25.199800 degrees Station # 66: IU.PET epicentral distance: 25.532650 degrees Station # 189: US.NEW epicentral distance: 25.548803 degrees Station # 152: US.BMO epicentral distance: 27.369146 degrees Station # 56: IU.MA2 epicentral distance: 27.600475 degrees Station # 357: IW.PLID epicentral distance: 27.969683 degrees Station # 186: US.MSO epicentral distance: 28.138155 degrees Station # 204: US.WVOR epicentral distance: 28.141666 degrees Station # 355: IW.MFID epicentral distance: 29.093248 degrees Station # 350: IW.DLMT epicentral distance: 29.745358 degrees Station # 171: US.HLID epicentral distance: 29.802465 degrees Station # 163: US.EGMT epicentral distance: 29.867979 degrees Station # 153: US.BOZ epicentral distance: 30.159069 degrees Station # 61: IU.MIDW epicentral distance: 30.790148 degrees Station # 115: II.FFC epicentral distance: 30.987230 degrees Station # 164: US.ELK epicentral distance: 31.167278 degrees Station # 180: US.LKWY epicentral distance: 31.472580 degrees Station # 353: IW.IMW epicentral distance: 31.572300 degrees Station # 351: IW.FLWY epicentral distance: 31.595047 degrees Station # 352: IW.FXWY epicentral distance: 31.682159 degrees Station # 356: IW.MOOW epicentral distance: 31.774481 degrees Station # 195: US.RLMT epicentral distance: 31.785198 degrees Station # 359: IW.TPAW epicentral distance: 31.814299 degrees Station # 354: IW.LOHW epicentral distance: 31.939100 
degrees Station # 358: IW.SNOW epicentral distance: 31.948198 degrees Station # 148: US.AHID epicentral distance: 32.187500 degrees Station # 178: US.LAO epicentral distance: 32.607452 degrees Station # 172: US.HWUT epicentral distance: 32.669357 degrees Station # 159: US.DGMT epicentral distance: 32.765194 degrees Station # 160: US.DUG epicentral distance: 32.902267 degrees Station # 155: US.BW06 epicentral distance: 33.064751 degrees Station # 198: US.TPNV epicentral distance: 33.273872 degrees Station # 88: IU.TIXI epicentral distance: 33.945564 degrees Station # 48: IU.KIP epicentral distance: 33.954693 degrees Station # 249: N4.K22A epicentral distance: 34.867802 degrees Station # 76: IU.RSSD epicentral distance: 35.367790 degrees Station # 281: N4.O20A epicentral distance: 35.531281 degrees Station # 69: IU.POHA epicentral distance: 35.630775 degrees Station # 132: II.PFO epicentral distance: 35.661385 degrees Station # 144: II.XPFO epicentral distance: 35.661385 degrees Station # 270: N4.MDND epicentral distance: 35.674683 degrees Station # 225: N4.E28B epicentral distance: 35.793369 degrees Station # 102: II.ALE epicentral distance: 36.103497 degrees Station # 98: IU.YAK epicentral distance: 36.571423 degrees Station # 203: US.WUAZ epicentral distance: 37.007076 degrees Station # 173: US.ISCO epicentral distance: 37.251591 degrees Station # 99: IU.YSS epicentral distance: 37.410580 degrees Station # 187: US.MVCO epicentral distance: 37.411461 degrees Station # 147: US.AGMN epicentral distance: 37.489815 degrees Station # 314: N4.SUSD epicentral distance: 38.048679 degrees Station # 192: US.OGNE epicentral distance: 38.530491 degrees Station # 229: N4.F33B epicentral distance: 38.691559 degrees Station # 197: US.SDCO epicentral distance: 38.722054 degrees Station # 250: N4.K30B epicentral distance: 38.786301 degrees Station # 256: N4.KSCO epicentral distance: 39.486588 degrees Station # 92: IU.TUC epicentral distance: 39.724506 degrees Station # 43: IU.JOHN 
epicentral distance: 39.734013 degrees Station # 161: US.ECSD epicentral distance: 39.810059 degrees Station # 166: US.EYMN epicentral distance: 40.079155 degrees Station # 22: IU.ANMO epicentral distance: 40.173866 degrees Station # 113: II.ERM epicentral distance: 40.240379 degrees Station # 223: N4.BGNE epicentral distance: 40.435734 degrees Station # 226: N4.E38A epicentral distance: 40.849354 degrees Station # 313: N4.SPMN epicentral distance: 41.008095 degrees Station # 257: N4.L34B epicentral distance: 41.076458 degrees Station # 156: US.CBKS epicentral distance: 41.268528 degrees Station # 237: N4.I37B epicentral distance: 41.434464 degrees Station # 299: N4.R32B epicentral distance: 42.121662 degrees Station # 272: N4.N35B epicentral distance: 42.219021 degrees Station # 232: N4.G40A epicentral distance: 42.418022 degrees Station # 158: US.COWI epicentral distance: 42.481190 degrees Station # 176: US.KSU1 epicentral distance: 42.854454 degrees Station # 196: US.SCIA epicentral distance: 42.898548 degrees Station # 149: US.AMTX epicentral distance: 42.921909 degrees Station # 271: N4.MSTX epicentral distance: 42.965900 degrees Station # 238: N4.I40B epicentral distance: 43.061317 degrees Station # 185: US.MNTX epicentral distance: 43.180031 degrees Station # 230: N4.F42A epicentral distance: 43.240807 degrees Station # 273: N4.N38B epicentral distance: 43.625595 degrees Station # 175: US.JFWS epicentral distance: 43.899185 degrees Station # 258: N4.L40A epicentral distance: 43.914387 degrees Station # 239: N4.I42A epicentral distance: 44.004452 degrees Station # 286: N4.P38B epicentral distance: 44.244911 degrees Station # 235: N4.H43A epicentral distance: 44.259701 degrees Station # 315: N4.T35B epicentral distance: 44.421242 degrees Station # 201: US.WMOK epicentral distance: 44.677055 degrees Station # 259: N4.L42A epicentral distance: 44.821751 degrees Station # 227: N4.E46A epicentral distance: 44.923138 degrees Station # 274: N4.N41A epicentral 
distance: 45.022800 degrees Station # 251: N4.K43A epicentral distance: 45.100460 degrees Station # 287: N4.P40B epicentral distance: 45.148758 degrees Station # 94: IU.WAKE epicentral distance: 45.152176 degrees Station # 240: N4.I45A epicentral distance: 45.381012 degrees Station # 323: N4.TUL3 epicentral distance: 45.541893 degrees Station # 306: N4.S39B epicentral distance: 45.694679 degrees Station # 222: N4.ABTX epicentral distance: 45.728127 degrees Station # 167: US.GLMI epicentral distance: 45.746983 degrees Station # 45: IU.KBS epicentral distance: 45.761795 degrees Station # 300: N4.R40B epicentral distance: 45.875923 degrees Station # 324: N4.U38B epicentral distance: 45.968262 degrees Station # 170: US.HDIL epicentral distance: 46.000931 degrees Station # 15: IC.MDJ epicentral distance: 46.089497 degrees Station # 265: N4.M44A epicentral distance: 46.137505 degrees Station # 80: IU.SFJD epicentral distance: 46.285595 degrees Station # 288: N4.P43A epicentral distance: 46.496716 degrees Station # 347: N4.Z35B epicentral distance: 46.568455 degrees Station # 27: IU.CCM epicentral distance: 46.619053 degrees Station # 244: N4.J47A epicentral distance: 46.678482 degrees Station # 260: N4.L46A epicentral distance: 46.686901 degrees Station # 58: IU.MAJO epicentral distance: 46.770386 degrees Station # 82: IU.SLBS epicentral distance: 46.989681 degrees Station # 174: US.JCT epicentral distance: 47.236687 degrees Station # 294: N4.Q44B epicentral distance: 47.288155 degrees Station # 241: N4.I49A epicentral distance: 47.297077 degrees Station # 312: N4.SFIN epicentral distance: 47.348766 degrees Station # 316: N4.T42B epicentral distance: 47.423203 degrees Station # 337: N4.WHTX epicentral distance: 47.438595 degrees Station # 184: US.MIAR epicentral distance: 47.782181 degrees Station # 275: N4.N47A epicentral distance: 47.785889 degrees Station # 289: N4.P46A epicentral distance: 47.809814 degrees Station # 348: N4.Z38B epicentral distance: 47.912746 
degrees Station # 145: US.AAM epicentral distance: 47.942612 degrees Station # 307: N4.S44A epicentral distance: 47.996696 degrees Station # 252: N4.K50A epicentral distance: 48.186100 degrees Station # 12: IC.HIA epicentral distance: 48.186790 degrees Station # 282: N4.O48B epicentral distance: 48.511524 degrees Station # 276: N4.N49A epicentral distance: 48.592594 degrees Station # 317: N4.T45B epicentral distance: 48.818001 degrees Station # 266: N4.M50A epicentral distance: 48.889153 degrees Station # 290: N4.P48A epicentral distance: 48.911953 degrees Station # 283: N4.O49A epicentral distance: 49.013348 degrees Station # 188: US.NATX epicentral distance: 49.201656 degrees Station # 95: IU.WCI epicentral distance: 49.286777 degrees Station # 277: N4.N51A epicentral distance: 49.588684 degrees Station # 221: N4.735B epicentral distance: 49.594658 degrees Station # 318: N4.T47A epicentral distance: 49.706474 degrees Station # 267: N4.M52A epicentral distance: 49.715790 degrees Station # 146: US.ACSO epicentral distance: 49.730022 degrees Station # 301: N4.R49A epicentral distance: 49.870598 degrees Station # 39: IU.HKT epicentral distance: 49.872437 degrees Station # 96: IU.WVT epicentral distance: 49.908997 degrees Station # 165: US.ERPA epicentral distance: 50.060329 degrees Station # 193: US.OXF epicentral distance: 50.178013 degrees Station # 302: N4.R50A epicentral distance: 50.348492 degrees Station # 207: N4.143B epicentral distance: 50.362427 degrees Station # 295: N4.Q51A epicentral distance: 50.370556 degrees Station # 245: N4.J55A epicentral distance: 50.435070 degrees Station # 284: N4.O52A epicentral distance: 50.440907 degrees Station # 278: N4.N53A epicentral distance: 50.506321 degrees Station # 177: US.KVTX epicentral distance: 50.548698 degrees Station # 341: N4.Y45B epicentral distance: 50.571201 degrees Station # 328: N4.V48A epicentral distance: 50.757614 degrees Station # 325: N4.U49A epicentral distance: 50.790997 degrees Station # 215: 
N4.441B epicentral distance: 50.791645 degrees Station # 319: N4.T50A epicentral distance: 50.957043 degrees Station # 296: N4.Q52A epicentral distance: 51.012302 degrees Station # 291: N4.P53A epicentral distance: 51.126614 degrees Station # 285: N4.O54A epicentral distance: 51.193005 degrees Station # 181: US.LONY epicentral distance: 51.202148 degrees Station # 308: N4.S51A epicentral distance: 51.212288 degrees Station # 200: US.VBMS epicentral distance: 51.224247 degrees Station # 246: N4.J57A epicentral distance: 51.298031 degrees Station # 261: N4.L56A epicentral distance: 51.352261 degrees Station # 253: N4.K57A epicentral distance: 51.493896 degrees Station # 338: N4.X48A epicentral distance: 51.562305 degrees Station # 208: N4.146B epicentral distance: 51.747425 degrees Station # 297: N4.Q54A epicentral distance: 51.787106 degrees Station # 183: US.MCWV epicentral distance: 51.852131 degrees Station # 349: N4.Z47B epicentral distance: 51.895912 degrees Station # 333: N4.W50A epicentral distance: 51.998638 degrees Station # 199: US.TZTN epicentral distance: 52.019722 degrees Station # 247: N4.J59A epicentral distance: 52.042664 degrees Station # 268: N4.M57A epicentral distance: 52.141338 degrees Station # 150: US.BINY epicentral distance: 52.145870 degrees Station # 84: IU.SSPA epicentral distance: 52.226120 degrees Station # 224: N4.D62A epicentral distance: 52.298801 degrees Station # 342: N4.Y49A epicentral distance: 52.353786 degrees Station # 375: NE.VT1 epicentral distance: 52.355080 degrees Station # 309: N4.S54A epicentral distance: 52.365425 degrees Station # 228: N4.E62A epicentral distance: 52.386093 degrees Station # 213: N4.346B epicentral distance: 52.439674 degrees Station # 182: US.LRAL epicentral distance: 52.621212 degrees Station # 42: IU.INCN epicentral distance: 52.621807 degrees Station # 298: N4.Q56A epicentral distance: 52.649811 degrees Station # 262: N4.L59A epicentral distance: 52.651623 degrees Station # 303: N4.R55A epicentral 
distance: 52.678341 degrees Station # 279: N4.N58A epicentral distance: 52.712772 degrees Station # 339: N4.X51A epicentral distance: 52.715542 degrees Station # 179: US.LBNH epicentral distance: 52.835220 degrees Station # 334: N4.W52A epicentral distance: 52.855061 degrees Station # 233: N4.G62A epicentral distance: 52.858883 degrees Station # 218: N4.545B epicentral distance: 52.859642 degrees Station # 292: N4.P57A epicentral distance: 52.974285 degrees Station # 374: NE.TRY epicentral distance: 52.994068 degrees Station # 236: N4.H62A epicentral distance: 52.994175 degrees Station # 326: N4.U54A epicentral distance: 53.000866 degrees Station # 364: NE.HNH epicentral distance: 53.025387 degrees Station # 329: N4.V53A epicentral distance: 53.062744 degrees Station # 372: NE.PQI epicentral distance: 53.092178 degrees Station # 248: N4.J61A epicentral distance: 53.139992 degrees Station # 97: IU.XMAS epicentral distance: 53.209785 degrees Station # 151: US.BLA epicentral distance: 53.280235 degrees Station # 242: N4.I62A epicentral distance: 53.394806 degrees Station # 194: US.PKME epicentral distance: 53.450283 degrees Station # 231: N4.F64A epicentral distance: 53.497402 degrees Station # 370: NE.NHFNK epicentral distance: 53.499088 degrees Station # 243: N4.I63A epicentral distance: 53.656094 degrees Station # 343: N4.Y52A epicentral distance: 53.676704 degrees Station # 310: N4.S57A epicentral distance: 53.693417 degrees Station # 263: N4.L61B epicentral distance: 53.707870 degrees Station # 123: II.KWJN epicentral distance: 53.708191 degrees Station # 378: NE.WVL epicentral distance: 53.758553 degrees Station # 254: N4.K62A epicentral distance: 53.786964 degrees Station # 211: N4.250A epicentral distance: 53.820126 degrees Station # 330: N4.V55A epicentral distance: 53.822395 degrees Station # 327: N4.U56A epicentral distance: 53.925365 degrees Station # 154: US.BRAL epicentral distance: 53.977077 degrees Station # 373: NE.QUA2 epicentral distance: 54.000267 
degrees Station # 371: NE.ORNO epicentral distance: 54.020081 degrees Station # 140: II.TLY epicentral distance: 54.044498 degrees Station # 320: N4.T57A epicentral distance: 54.078285 degrees Station # 362: NE.DUNH epicentral distance: 54.123108 degrees Station # 304: N4.R58B epicentral distance: 54.136063 degrees Station # 209: N4.152A epicentral distance: 54.186455 degrees Station # 157: US.CBN epicentral distance: 54.238213 degrees Station # 41: IU.HRV epicentral distance: 54.250591 degrees Station # 255: N4.KMSC epicentral distance: 54.273335 degrees Station # 377: NE.WSPT epicentral distance: 54.279114 degrees Station # 365: NE.MAACT epicentral distance: 54.305531 degrees Station # 168: US.GOGA epicentral distance: 54.345608 degrees Station # 234: N4.G65A epicentral distance: 54.357368 degrees Station # 280: N4.N62A epicentral distance: 54.376938 degrees Station # 376: NE.WES epicentral distance: 54.459896 degrees Station # 369: NE.MATOP epicentral distance: 54.474796 degrees Station # 368: NE.MANTK epicentral distance: 54.506935 degrees Station # 361: NE.BCX epicentral distance: 54.573841 degrees Station # 360: NE.BCDNQ epicentral distance: 54.573841 degrees Station # 293: N4.P61A epicentral distance: 54.575333 degrees Station # 367: NE.MAGLO epicentral distance: 54.612709 degrees Station # 366: NE.MAFXB epicentral distance: 54.719723 degrees Station # 363: NE.EMMW epicentral distance: 54.761940 degrees Station # 269: N4.M63A epicentral distance: 54.785454 degrees Station # 214: N4.352A epicentral distance: 54.951435 degrees Station # 335: N4.W57A epicentral distance: 55.012547 degrees Station # 331: N4.V58A epicentral distance: 55.030720 degrees Station # 264: N4.L64A epicentral distance: 55.031734 degrees Station # 321: N4.T59A epicentral distance: 55.032310 degrees Station # 46: IU.KEV epicentral distance: 55.087784 degrees Station # 216: N4.451A epicentral distance: 55.131645 degrees Station # 210: N4.154A epicentral distance: 55.138588 degrees Station # 
305: N4.R61A epicentral distance: 55.251312 degrees Station # 93: IU.ULN epicentral distance: 55.266289 degrees Station # 107: II.BORG epicentral distance: 55.511566 degrees Station # 311: N4.S61A epicentral distance: 55.542931 degrees Station # 344: N4.Y57A epicentral distance: 55.625370 degrees Station # 322: N4.TIGA epicentral distance: 55.739445 degrees Station # 340: N4.X58A epicentral distance: 55.815834 degrees Station # 336: N4.W59A epicentral distance: 55.862198 degrees Station # 345: N4.Y58A epicentral distance: 56.112461 degrees Station # 219: N4.553A epicentral distance: 56.199299 degrees Station # 190: US.NHSC epicentral distance: 56.418922 degrees Station # 332: N4.V61A epicentral distance: 56.431896 degrees Station # 10: IC.BJT epicentral distance: 56.437138 degrees Station # 124: II.LVZ epicentral distance: 56.673843 degrees Station # 212: N4.257A epicentral distance: 56.786213 degrees Station # 346: N4.Y60A epicentral distance: 56.846622 degrees Station # 217: N4.456A epicentral distance: 57.161289 degrees Station # 220: N4.656A epicentral distance: 57.894760 degrees Station # 85: IU.TARA epicentral distance: 59.062077 degrees Station # 50: IU.KNTN epicentral distance: 59.197605 degrees Station # 33: IU.DWPF epicentral distance: 59.477257 degrees Station # 17: IC.SSE epicentral distance: 60.356575 degrees Station # 38: IU.GUMO epicentral distance: 60.847641 degrees Station # 205: N4.060A epicentral distance: 60.907791 degrees Station # 206: N4.061Z epicentral distance: 61.500675 degrees Station # 87: IU.TEIG epicentral distance: 61.715145 degrees Station # 122: II.KURK epicentral distance: 64.456177 degrees Station # 19: IC.XAN epicentral distance: 64.744698 degrees Station # 51: IU.KONO epicentral distance: 64.802391 degrees Station # 103: II.ARTI epicentral distance: 64.829056 degrees Station # 86: IU.TATO epicentral distance: 64.895447 degrees Station # 108: II.BRVK epicentral distance: 64.975082 degrees Station # 104: II.ARU epicentral 
distance: 65.179810 degrees Station # 24: IU.BBSR epicentral distance: 65.437599 degrees Station # 59: IU.MAKZ epicentral distance: 66.600517 degrees Station # 34: IU.FUNA epicentral distance: 66.659393 degrees Station # 18: IC.WMQ epicentral distance: 66.887558 degrees Station # 9: CU.TGUH epicentral distance: 67.225983 degrees Station # 11: IC.ENH epicentral distance: 67.372818 degrees Station # 114: II.ESK epicentral distance: 67.609833 degrees Station # 130: II.OBN epicentral distance: 69.238815 degrees Station # 7: CU.MTDJ epicentral distance: 69.420685 degrees Station # 6: CU.GTBY epicentral distance: 69.447495 degrees Station # 21: IU.AFI epicentral distance: 70.132820 degrees Station # 5: CU.GRTK epicentral distance: 70.438568 degrees Station # 118: II.JTS epicentral distance: 71.592377 degrees Station # 8: CU.SDDR epicentral distance: 72.364037 degrees Station # 100: II.AAK epicentral distance: 72.879036 degrees Station # 40: IU.HNR epicentral distance: 73.780113 degrees Station # 47: IU.KIEV epicentral distance: 74.105019 degrees Station # 37: IU.GRFO epicentral distance: 74.819443 degrees Station # 13: IC.KMI epicentral distance: 75.067055 degrees Station # 3: CU.BCIP epicentral distance: 75.401962 degrees Station # 127: II.MSVF epicentral distance: 75.815979 degrees Station # 117: II.IBFO epicentral distance: 75.917427 degrees Station # 143: II.XBFO epicentral distance: 75.919067 degrees Station # 106: II.BFO epicentral distance: 75.919067 degrees Station # 81: IU.SJG epicentral distance: 75.950951 degrees Station # 16: IC.QIZ epicentral distance: 76.137657 degrees Station # 74: IU.RAR epicentral distance: 76.353302 degrees Station # 32: IU.DAV epicentral distance: 76.893105 degrees Station # 14: IC.LSA epicentral distance: 77.125587 degrees Station # 65: IU.PAYG epicentral distance: 77.711273 degrees Station # 137: II.SIMI epicentral distance: 78.293884 degrees Station # 109: II.CMLA epicentral distance: 78.469727 degrees Station # 1: CU.ANWB 
epicentral distance: 78.730888 degrees Station # 67: IU.PMG epicentral distance: 79.333153 degrees Station # 121: II.KIV epicentral distance: 79.550690 degrees Station # 79: IU.SDV epicentral distance: 80.873322 degrees Station # 128: II.NIL epicentral distance: 81.302849 degrees Station # 44: IU.KBL epicentral distance: 82.014137 degrees Station # 28: IU.CHTO epicentral distance: 82.233315 degrees Station # 64: IU.PAB epicentral distance: 82.628075 degrees Station # 36: IU.GNI epicentral distance: 82.955910 degrees Station # 4: CU.GRGR epicentral distance: 83.308647 degrees Station # 70: IU.PTCN epicentral distance: 83.481422 degrees Station # 63: IU.OTAV epicentral distance: 83.485992 degrees Station # 2: CU.BBGH epicentral distance: 83.648315 degrees Station # 23: IU.ANTO epicentral distance: 84.687180 degrees Station # 73: IU.RAO epicentral distance: 86.156281 degrees Station # 31: IU.CTAO epicentral distance: 89.094749 degrees Station # 119: II.KAPI epicentral distance: 90.080643 degrees Station # 57: IU.MACI epicentral distance: 90.171333 degrees Station # 134: II.RPN epicentral distance: 91.863922 degrees Station # 129: II.NNA epicentral distance: 94.431213 degrees Station # 142: II.WRAB epicentral distance: 94.605293 degrees Station # 71: IU.PTGA epicentral distance: 94.782471 degrees Station # 141: II.UOSS epicentral distance: 94.983322 degrees Station # 133: II.RAYN epicentral distance: 98.991165 degrees Station # 83: IU.SNZO epicentral distance: 99.262672 degrees Station # 77: IU.SAML epicentral distance: 99.648308 degrees Station # 135: II.SACV epicentral distance: 99.780228 degrees Station # 131: II.PALK epicentral distance: 101.315697 degrees Station # 60: IU.MBWA epicentral distance: 103.267693 degrees Station # 52: IU.KOWA epicentral distance: 106.796890 degrees Station # 55: IU.LVC epicentral distance: 107.425476 degrees Station # 139: II.TAU epicentral distance: 108.617744 degrees Station # 110: II.COCO epicentral distance: 108.939735 degrees 
Station # 53: IU.LCO epicentral distance: 111.527992 degrees Station # 75: IU.RCBR epicentral distance: 112.243813 degrees Station # 62: IU.NWAO epicentral distance: 113.976898 degrees Station # 35: IU.FURI epicentral distance: 114.539948 degrees Station # 111: II.DGAR epicentral distance: 118.063927 degrees Station # 126: II.MSEY epicentral distance: 122.967308 degrees Station # 90: IU.TRQA epicentral distance: 123.109291 degrees Station # 105: II.ASCN epicentral distance: 124.293205 degrees Station # 49: IU.KMBO epicentral distance: 124.592354 degrees Station # 125: II.MBAR epicentral distance: 124.962517 degrees Station # 112: II.EFI epicentral distance: 134.423752 degrees Station # 78: IU.SBA epicentral distance: 134.838379 degrees Station # 136: II.SHEL epicentral distance: 134.975815 degrees Station # 68: IU.PMSA epicentral distance: 139.134171 degrees Station # 101: II.ABPO epicentral distance: 139.178696 degrees Station # 26: IU.CASY epicentral distance: 139.460815 degrees Station # 54: IU.LSZ epicentral distance: 139.714325 degrees Station # 91: IU.TSUM epicentral distance: 143.636383 degrees Station # 72: IU.QSPA epicentral distance: 145.203156 degrees Station # 116: II.HOPE epicentral distance: 146.808273 degrees Station # 89: IU.TRIS epicentral distance: 150.073502 degrees Station # 138: II.SUR epicentral distance: 156.928391 degrees
maximum error in location of all the receivers: 5.45888942E-12 km
Elapsed time for receiver detection in seconds = 0.24783961405046284
End of receiver detection - done
found a total of 378 receivers in all slices
this total is okay
source arrays:
number of sources is 1
size of source array = 1.43051147E-03 MB
= 1.39698386E-06 GB
seismograms:
seismograms written by all processes
writing out seismograms at every NTSTEP_BETWEEN_OUTPUT_SEISMOS = 135500
maximum number of local receivers is 147 in slice 5
size of maximum seismogram array = 227.949142 MB
= 0.222606584 GB
Total number of samples for seismograms = 135500
Reference radius of the globe used is 6371.0000000000000 km
incorporating the oceans using equivalent load
incorporating ellipticity
incorporating surface topography
incorporating self-gravitation (Cowling approximation)
incorporating rotation
incorporating attenuation using 3 standard linear solids
preparing mass matrices
preparing constants
preparing gravity arrays
preparing attenuation
The code uses a constant Q quality factor, but approximated based on a series of Zener standard linear solids (SLS). Approximation is performed in the following frequency band:
number of SLS bodies: 3
partial attenuation, physical dispersion only: F
Reference frequency of anelastic model (Hz): 1.00000000
period (s): 1.00000000
Attenuation frequency band min/max (Hz): 1.02351687E-03 / 5.75565845E-02
period band min/max (s) : 17.3742065 / 977.023499
Logarithmic center frequency (Hz): 7.67529383E-03
period (s): 130.288177
using shear attenuation Q_mu
ATTENUATION_1D_WITH_3D_STORAGE : T
ATTENUATION_3D : F
preparing elastic element arrays
using attenuation: shifting to unrelaxed moduli
crust/mantle transverse isotropic and isotropic elements
tiso elements = 98304
iso elements = 69632
inner core isotropic elements
iso elements = 4864
preparing wavefields
allocating wavefields
initializing wavefields
This is where the file ends, because that's where the program stops running.
right, the node or workstation does not have enough memory to fit and run this simulation setup.
the code stops when assigning values to the wavefields. after allocation, the arrays have not yet been mapped to memory; that only happens with the first wavefield initialization here. for this setup of 24 MPI processes on a single node with the NEX 256 setting, the estimated memory requirement is ~160GB.
you will have to run on multiple nodes or workstations (given they can communicate by an MPI installation), or run it on a fat node with more memory.
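A quick sanity check of the host-memory numbers from this thread, as a sketch only. It assumes the ~160 GB estimate is split evenly over the 24 processes, that any additional nodes would also have 128 GB each, and it ignores memory used by the OS; the function names are illustrative.

```python
import math


def per_process_gb(total_gb, n_processes):
    """Even share of a memory total per MPI process."""
    return total_gb / n_processes


def nodes_needed(required_gb, ram_per_node_gb):
    """Minimum number of identical nodes whose combined RAM covers the estimate."""
    return math.ceil(required_gb / ram_per_node_gb)


required_gb, available_gb, n_processes = 160, 128, 24

print(f"needed per process:    {per_process_gb(required_gb, n_processes):.2f} GB")
print(f"available per process: {per_process_gb(available_gb, n_processes):.2f} GB")
print(f"nodes needed at 128 GB each: {nodes_needed(required_gb, available_gb)}")
```

The available share of about 5.3 GB per process is consistent with the reported crash once each process passes ~5 GB.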
regarding the GPUs, a GeForce RTX 2080 card has 8GB memory. the setup with NEX 256 and 24 MPI processes (and model s362ani) will require ~5GB GPU memory per process, based on my past experience. thus, only a single process would fit onto one card. I'm afraid you will need more GPU cards as well to run this setup.
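The same kind of estimate for the GPU side, using the figures above (8 GB per card, ~5 GB per process, 4 cards, 24 processes). This is a sketch with made-up helper names; the ~5 GB figure is the experience-based estimate from this thread, not a measured value.

```python
import math


def processes_per_card(card_gb, per_process_gb):
    """How many processes fit on one card by memory alone."""
    return int(card_gb // per_process_gb)


def cards_needed(n_processes, card_gb, per_process_gb):
    """Cards required to host all processes, or None if one process cannot fit."""
    fit = processes_per_card(card_gb, per_process_gb)
    if fit == 0:
        return None
    return math.ceil(n_processes / fit)


fit = processes_per_card(8, 5)
print(f"processes per 8 GB card: {fit}")
print(f"processes that fit on 4 cards: {4 * fit}")
print(f"cards needed for 24 processes: {cards_needed(24, 8, 5)}")
```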
While trying to run go_solver_pbs.bash in the global_s362ani_shakemovie directory, I get an "mpirun noticed that process rank 7 with PID 0 on node [nodename] exited on signal 9 (killed)" error, which I suppose is an out-of-memory error. How do I fix this?