FESOM / fesom2

Multi-resolution ocean general circulation model.
http://fesom.de/
GNU General Public License v3.0

Error: size = 0, at file lib/parms/src/parms_map.c #539

Closed jokervTv closed 3 weeks ago

jokervTv commented 10 months ago

the log:

============================================
 FESOM start iteration before the barrier...
 FESOM start iteration after the barrier...

 ^[[32m____________________________________________________________^[[0m
 ^[[7;32m --> FESOM STARTS TIME LOOP                                 ^[[0m
 Updating SSS restoring data for month            1
Error: size = 0 at line 682 in function parms_MapCreateFromPetsc at file /public/home/user/fesom2/   lib/parms/src/parms_map.c

I'm confused about this.

Thanks for reading.

JanStreffing commented 10 months ago

Hello Yongpeng, can you provide a bit more background? What are you trying to do, which mesh, which machine are you using? Can you upload your namelists and the full logfile?

You get stuck somewhere around the start of the time loop, but before the IO is set up. That sounds a bit like an IO server problem.

Best, Jan

patrickscholz commented 10 months ago

@JanStreffing and @jokervTv, PARMS is in our case the solver for the SSH (sea surface height); something might have gone wrong with the compilation on your machine. Therefore it is important to know which machine you are working on. The current refactoring branch also has a built-in solver, so that we no longer rely on PARMS and PETSc. At the moment there is only a hard-coded flag, use_parms=.true./.false., to switch this option; it is in src/MOD_DYN.F90:

```fortran
!
!___
TYPE T_SOLVERINFO
    integer       :: ident   = 1
    integer       :: maxiter = 2000
    integer       :: restart = 15
    integer       :: fillin  = 3
    integer       :: lutype  = 2
    real(kind=WP) :: droptol = 1.e-8
    !!! PARMS Solver
    real(kind=WP) :: soltol  = 1e-10   ! default for PARMS
    logical       :: use_parms = .TRUE.
    !!!
    !!! Sergey's Solver
    ! real(kind=WP) :: soltol  = 1e-5   ! default for PARMS
    ! logical       :: use_parms = .FALSE.
    !!!
    real(kind=WP), allocatable :: rr(:), zz(:), pp(:), App(:)
contains
    procedure WRITE_T_SOLVERINFO
    procedure READ_T_SOLVERINF
```

... you could run a test with use_parms=.FALSE. to see whether it overcomes your issue. But in that case don't forget to recompile!
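For anyone who prefers to script that switch rather than edit the file by hand, here is a minimal sketch. It assumes the active declaration looks like the `logical :: use_parms = .TRUE.` line quoted above; the function name `set_use_parms` and the sample text are hypothetical, not part of FESOM. It only flips the flag: the snippet above also pairs each solver with its own `soltol` default, which still has to be changed by hand.

```python
import re


def set_use_parms(source: str, enabled: bool) -> str:
    """Return `source` with the active use_parms default set to .TRUE. or .FALSE.

    Commented-out declarations (lines starting with `!`) are left untouched.
    Hypothetical helper for illustration only.
    """
    value = ".TRUE." if enabled else ".FALSE."
    pattern = re.compile(
        r"^(?P<head>\s*logical\s*::\s*use_parms\s*=\s*)\.(?:true|false)\.",
        re.IGNORECASE | re.MULTILINE,
    )
    new_source, count = pattern.subn(lambda m: m.group("head") + value, source)
    if count != 1:
        # Refuse to guess if the file does not look as expected.
        raise ValueError(
            f"expected exactly one active use_parms declaration, found {count}"
        )
    return new_source


# Sample text modelled on the two use_parms lines quoted in this thread.
sample = (
    "    logical       :: use_parms = .TRUE.\n"
    "    ! logical       :: use_parms = .FALSE.\n"
)
patched = set_use_parms(sample, enabled=False)
print(patched)
```

To apply it to a real checkout, one would read src/MOD_DYN.F90, pass its contents through the function, write the result back, and then rebuild the model, since the default is compiled in.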

jokervTv commented 9 months ago

Thanks to @JanStreffing and @patrickscholz for the replies and help.

Following @patrickscholz's suggestion, changing use_parms works for me. Thanks a lot.

The machine info is as follows:

- CentOS Linux release 7.6.1810 (Core)
- kernel: 3.10.0-957.el7.x86_64 SMP Thu Nov 8 23:39:32 UTC 2018
- platform: **x86_64**
- compiler: GCC 11.4.0 and Intel 2021.3.0 (classic)
- openmpi 4.1.6 and 5.0.0

CPU info (compute node)

```log
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    1
Core(s) per socket:    64
Socket(s):             2
NUMA node(s):          2
Vendor ID:             AuthenticAMD
CPU family:            23
Model:                 49
Model name:            AMD EPYC 7742 64-Core Processor
Stepping:              0
CPU MHz:               2250.000
CPU max MHz:           2250.0000
CPU min MHz:           1500.0000
BogoMIPS:              4500.04
Virtualization:        AMD-V
L1d cache:             32K
L1i cache:             32K
L2 cache:              512K
L3 cache:              16384K
NUMA node0 CPU(s):     0-63
NUMA node1 CPU(s):     64-127
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl xtopology nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb cat_l3 cdp_l3 hw_pstate sme retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip overflow_recov succor smca
```

I want to run fesom2 from 2019 to 2119. The namelists are as follows:

namelist.config

```log
! This is the namelist file for model general configuration

&modelname
runid='fesom'
/

&timestep
step_per_day=32 !96 !96 !72 !72 !45 !72 !96
run_length=62 !62 !62 !62 !28
run_length_unit='y' ! y, m, d, s
/

&clockinit ! the model starts at
timenew=0.0
daynew=1
yearnew=2019
/

&paths
MeshPath='/public/home/yp/Data/FESOM2_input/mesh/pi/'
ClimateDataPath='/public/home/yp/Data/FESOM2_input/input/phc3.0/'
ResultPath='./fesom_result/'
/

&restart_log
restart_length=1 ! --> do netcdf restart ( only required for d,h,s cases, y, m take 1)
restart_length_unit='y' !output period: y, d, h, s, off
raw_restart_length=0 ! --> do core dump restart
raw_restart_length_unit='y' ! e.g. y, d, h, s, off
bin_restart_length=0 ! --> do derived type binary restart
bin_restart_length_unit='y' ! e.g. y, d, h, s, off
logfile_outfreq=96 !in logfile info. output frequency, # steps
/

&ale_def
which_ALE='zstar' ! 'linfs','zlevel', 'zstar'
use_partial_cell=.true.
/

&geometry
cartesian=.false.
fplane=.false.
cyclic_length=360 ![degree]
rotated_grid=.true. !option only valid for coupled model case now
force_rotation=.true.
alphaEuler=50. ![degree] Euler angles, convention:
betaEuler=15. ![degree] first around z, then around new x,
gammaEuler=-90. ![degree] then around new z.
/

&calendar
include_fleapyear=.true.
/

&run_config
use_ice=.true. ! ocean+ice
use_cavity=.false. !
use_cavity_partial_cell=.false.
use_floatice = .false.
use_sw_pene=.true.
flag_debug=.false.
/

&machine
n_levels=2
n_part= 12, 36 ! 432 number of partitions on each hierarchy level
/
```
namelist.io

```log
&diag_list
ldiag_solver =.false.
lcurt_stress_surf=.false.
ldiag_curl_vel3 =.false.
ldiag_Ri =.false.
ldiag_turbflux =.false.
ldiag_salt3D =.false.
ldiag_dMOC =.false.
ldiag_DVD =.false.
ldiag_forc =.false.
ldiag_extflds =.false.
/

&nml_general
io_listsize =100 !number of streams to allocate. shallbe large or equal to the number of streams in &nml_list
vec_autorotate =.false.
/

! for sea ice related variables use_ice should be true, otherewise there will be no output
! for 'curl_surf' to work lcurt_stress_surf must be .true. otherwise no output
! for 'fer_C', 'bolus_u', 'bolus_v', 'bolus_w', 'fer_K' to work Fer_GM must be .true. otherwise no output
! 'otracers' - all other tracers if applicable
! for 'dMOC' to work ldiag_dMOC must be .true. otherwise no output
&nml_list
io_list = 'sst ',1, 'm', 4,
          'sss ',1, 'm', 4,
          'ssh ',1, 'm', 4,
          'uice ',1, 'd', 4,
          'vice ',1, 'd', 4,
          'a_ice ',1, 'm', 4,
          'm_ice ',1, 'm', 4,
          'm_snow ',1, 'm', 4,
          'MLD1 ',1, 'm', 4,
          'MLD2 ',1, 'm', 4,
          'MLD3 ',1, 'm', 4,
          'tx_sur ',1, 'm', 4,
          'ty_sur ',1, 'm', 4,
          'temp ',1, 'y', 4,
          'salt ',1, 'y', 8,
          'N2 ',1, 'y', 4,
          'Kv ',1, 'y', 4,
          'u ',1, 'y', 4,
          'v ',1, 'y', 4,
          'unod ',1, 'y', 4,
          'vnod ',1, 'y', 4,
          'w ',1, 'y', 4,
          'Av ',1, 'y', 4,
          'bolus_u ',1, 'y', 4,
          'bolus_v ',1, 'y', 4,
          'bolus_w ',1, 'y', 4,
/
```

@JanStreffing, sorry about this: I have tried many times (both GCC and Intel builds work well) but can't reproduce the error, so I can't provide more information. At the moment I suspect it is an asynchronous IO problem, but I can't be sure.

Thanks again.

Best wishes

JanStreffing commented 3 weeks ago

We recently removed PARMS (https://github.com/FESOM/fesom2/pull/597) in favor of Sergey's solver. Closing here. Please open a new issue in case of other problems.