ACCORD-NWP / DAVAI-tests

DAVAÏ tests templates and config files

Experts for ModelTLAD and EnsembleRead miss stdeo files at ECMWF #10

Closed AlexandreMary closed 4 days ago

AlexandreMary commented 10 months ago

On ECMWF Atos machines, these tests don't produce stdeo files, hence the associated experts can't provide a status or a comparison to the reference.

The info actually appears in the job output, e.g.

<Message file=".D[78]/testsuite/TestSuiteModel.h" line="55"><![CDATA[<dx1,Mtdx2> = -81378.700057880603708 <Mdx1,dx2> = -81378.700057875947095 digits = 13.242440613742617828]]></Message>

together with an error message

# ----  rank 6: stdout/err  ---- #
--------------------------------------------------------------------------
WARNING: Could not generate an xpmem segment id for this process'
address space.

The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.

  Local host: ac3-1072
  Error code: 2 (No such file or directory)
--------------------------------------------------------------------------
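As an illustration of what an expert would need to recover from the job output when no stdeo file exists, here is a minimal sketch that extracts the three numbers from the message line quoted above. The function name `parse_test_message` and the regular expression are assumptions made for this example; they are not taken from the Davaï or Vortex code.

```python
import re

# Sample line copied from the job output quoted above.
LINE = (
    '<Message file=".D[78]/testsuite/TestSuiteModel.h" line="55"><![CDATA['
    "<dx1,Mtdx2> = -81378.700057880603708 "
    "<Mdx1,dx2> = -81378.700057875947095 "
    "digits = 13.242440613742617828]]></Message>"
)

# Hypothetical pattern: it only targets the layout of the sample line above.
PATTERN = re.compile(
    r"<dx1,Mtdx2> = (?P<lhs>[-+0-9.eE]+) "
    r"<Mdx1,dx2> = (?P<rhs>[-+0-9.eE]+) "
    r"digits = (?P<digits>[-+0-9.eE]+)"
)


def parse_test_message(line):
    """Return the two inner products and the digits value, or None if absent."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


result = parse_test_message(LINE)
print(result["digits"])
```

A line that does not contain the message (for example the xpmem warning) makes the function return `None`, so the caller can skip it.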

AlexandreMary commented 9 months ago

The issue is actually that stdout and stderr are not redirected to files. srun has an option to do so (--output); alternatively, vortex could perhaps do it itself (mpiwrapstd)?

AlexandreMary commented 9 months ago

Workaround: add --output=stdeo.%%t to the mpiopts variable of the [srun] section in the vortex conf/target-commons.ini file.
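The doubled percent sign is presumably an escape for the ini parser. A minimal sketch, assuming the file is read with Python's standard `configparser` (how Vortex actually loads it is not stated in this thread) and using a hypothetical excerpt in which only the `--output` option comes from the workaround above:

```python
import configparser

# Hypothetical excerpt of conf/target-commons.ini; any other options
# already present in mpiopts are unknown and left out here.
INI_TEXT = """
[srun]
mpiopts = --output=stdeo.%%t
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# With the default interpolation, "%%" is read back as a single "%".
mpiopts = parser.get("srun", "mpiopts")
print(mpiopts)  # --output=stdeo.%t
```

In srun filename patterns, `%t` stands for the task identifier (rank), so each MPI task writes to its own stdeo.<rank> file.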

AlexandreMary commented 9 months ago

OK, the above solution is merged into Vortex olive-dev.

romick-knmi commented 6 months ago

Tested the change by merging your ATOS olive-dev branch into mine: it crashes on launch of MPI. See /scratch/nlcr/mtool/depot/mstep_001143_ModelTLarpege on ATOS.

AlexandreMary commented 6 months ago

Yes indeed, I ran into the same issue recently. This means there is an issue with the latest version of vortex olive-dev. I have reported it to the vortex team; meanwhile, I am reopening this issue until it is solved.

AlexandreMary commented 1 week ago

For Davaï-WW No. 3: check the deployment of the latest olive-dev at ECMWF and, if OK, close this issue.

tlestang commented 5 days ago

Just deployed the vortex development version at ECMWF (https://git.meteo.fr/cnrm-gmap/vortex/-/commit/d5418a28bbf29cf0902918d387ab5db1f0d6676a)

AlexandreMary commented 4 days ago

Solved