GambitBSM / gambit_2.4


Issues in running gambit_2.4 #9

Open aseshkdatta opened 3 months ago

aseshkdatta commented 3 months ago

Hi,

I'm having the following problems when running some of the Gambit example files, as described in arXiv:1705.07908 (the Gambit manual) and arXiv:2107.00030 (the GUM manual).

Would you kindly help me resolve these issues?

Thanks. Asesh K Datta

#########################################################################

(I built gambit with "cmake -DWITH_AXEL=ON -DWITH_HEPMC=ON -DWITH_YODA=ON -DWITH_MPI=ON -Ditch=pybind11 -DBUILD_FS_MODELS=MSSM .. ")


With the MSSM7 scan

time mpirun -np 16 ./gambit -f yaml_files/MSSM7.yaml


Issue 1


The run aborts after around half an hour, prompting the following.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
71 CMS_ttH-3leptons_8TeV_19.6fb-1_125.6GeV_1682105.txt
72 CMS_ttH-4leptons_8TeV_19.6fb-1_125.6GeV_1682106.txt
73 C
At line 106 of file datatables.f90 (unit = 133, file = '/home/asesh/Packages/Gambit-BSM/gambit_2.4/Backends/installed/higgssignals/1.4.0/Expt_tables/latestresults/C')
Fortran runtime error: End of file

Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.


mpiexec detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:

Process name: [[63137,1],13] Exit code: 2


Issue 2


I also cannot restart the scan successfully by including the "-r" option in the execution command, i.e., time mpirun -np 16 ./gambit -r -f yaml_files/MSSM7.yaml

(Only removing the "MSSM7" folder under "gambit_2.4/runs" allows a fresh scan to start.)

The following is the error message.

Starting GAMBIT

Running in MPI-parallel mode with 16 processes

Running with 16 OpenMP threads per MPI process (set by the environment variable OMP_NUM_THREADS).
YAML file: yaml_files/MSSM7.yaml
Importing: include/StandardModel_SLHA2_scan.yaml
Initialising logger...
log_debug_messages = true; log messages tagged as 'Debug' WILL be logged. WARNING: This may lead to very large log files!
Group readable: runs/MSSM7//samples//MSSM7.hdf5 , /MSSM7 : 1

FATAL ERROR

GAMBIT has exited with fatal exception: GAMBIT error
ERROR: A problem has occurred in the printer utilities.
Error preparing pre-existing output file 'runs/MSSM7//samples//MSSM7.hdf5' for writing via hdf5printer!
The requested output group '/MSSM7' already exists in this file! Please take one of the following actions:

  1. Choose a new group via the 'group' option in the Printer section of your input YAML file;
  2. Delete the existing group from 'runs/MSSM7//samples//MSSM7.hdf5';
  3. Delete the existing output file, or set 'delete_file_on_restart: true' in your input YAML file to give GAMBIT permission to automatically delete it (applies when -r/--restart flag used);

Note: This error most commonly occurs when you try to resume a scan that has already finished!

Raised at: line 1524 in function Gambit::Printers::HDF5Printer2::HDF5Printer2(const Gambit::Options&, Gambit::Printers::BasePrinter*) of /home/asesh/Packages/Gambit-BSM/gambit_2.4/Printers/src/printers/hdf5printer_v2/hdf5printer_v2.cpp.
rank 0: FinalizeWithTimeout failed to sync for clean MPI shutdown, calling MPI_Abort...
rank 0: Issuing MPI_Abort command, attempting to terminate all processes...

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.

===============================

With the GUM-implemented model MDMSM


Issue 1


For a smaller value of NP (=1000) in a Diver scan (as set in "yaml_files/MDMSM_Tute.yaml"), the scan exits smoothly, creating an "MDMSM.hdf5" file in the "runs/MDMSM/samples" folder.

However, executing "pippi MDMSM.pip" under "gum/Tutorial" returns the following error messages for two different installations of pippi.


Case A

When using "pippi" from the "pippi" folder created under gambit root folder by "make get-pippi":

File "/home/asesh/Packages/Gambit-BSM/gambit_2.4/gum/Tutorial/../../pippi/pippi", line 41 print 'Beginning pippi '+arguments[1]+' operation...' ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)?


Case B

When using "pippi" from a direct git-cloned folder:

  "Beginning pippi parse-to-plot operation...
   Running pippi failed in parse operation.
  Error: field specific_bins required for requested operation not found in MDMSM.pip.  
  Quitting...
  --------------------------------------------------------

Issue 2


When I increase the value of NP (to, say, 5000 or 10000), the Diver scan routinely crashes after running for more than 20 minutes, prompting the following.

:::::::::::::::::::::::::::::::::::::
:::::::::::::::::::::::::::::::::::::
theta13: 0.15495
theta23: 0.76958
nuclear_params_sigmas_sigmal:
  deltad: -0.427
  deltas: -0.085
  deltau: 0.842
  sigmal: 58
  sigmas: 43

Raised at: line 329 in function void Gambit::Printers::HDF5DataSet::write_buffer(const T (&)[100000], std::size_t, std::size_t, bool) [with T = double; std::size_t = long unsigned int] of /home/asesh/Packages/Gambit-BSM/gambit_2.4/Printers/include/gambit/Printers/printers/hdf5printer_v2.hpp.

mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:

Process name: [[54161,1],13] Exit code: 1

tegonzalo commented 3 months ago

Hi Asesh, sorry for the late response.

Issue 1 seems to be a problem with HiggsSignals that we have not encountered before. Do you have, by any chance, the details of the parameter point for which it failed?

Issue 2 is indeed a problem with the restart feature. I think you can add the option delete_file_on_restart: true to the printer options and it will work (see the sketch below), but deleting the samples folder is indeed the fastest way around it. This is a known issue and someone is working to fix it in the development repo.
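For concreteness, a rough sketch of what the Printer section of MSSM7.yaml could look like with that option added. Only the option names delete_file_on_restart and group come from the error message quoted above; the surrounding layout and the output_file name are assumed here and should be checked against the shipped example file:

Printer:
  printer: hdf5                     # default hdf5 (v2) printer in gambit 2.4
  options:
    output_file: "MSSM7.hdf5"       # assumed name, matching runs/MSSM7/samples/MSSM7.hdf5 above
    group: "/MSSM7"                 # the output group named in the error message
    delete_file_on_restart: true    # allow GAMBIT to delete the old file when -r/--restart is used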

Problems with pippi:

Case A: The main pippi repo was written for Python 2 and was only updated to Python 3 a couple of years ago, but no new version was tagged with that change. I'll ask one of the developers to do that. For now, cloning the top of the master branch is probably the best option.

Case B: It seems that the pippi file that gum produces is outdated and does not contain some fields that newer pippi versions need. Inside the Parsing block, the following fields are needed, replacing any others that have to do with binning or interpolation:

default_bins = 50                           ;Default number of bins to sort samples into in each direction.
specific_bins =                             ;Bins to use for specific observables (EFN; overrides default)
default_resolution = 300                    ;Default resolution of binwise interpolation for plotting (~300+ for publication)
specific_resolution =                       ;Resolution of binwise interpolation for specific observables (EFN; overrides default)
interpolation_method = 'spline'             ;Either bilinear (default) or spline (watch out for ringing in the latter case)

Last issue 2: the hdf5 v2 printer sometimes behaves strangely on some systems, and it seems that yours is one of them. If that is the case, change the printer in the yaml file to hdf5_v1, as in the sketch below. This uses the old version of the hdf5 printer, which prints to a separate hdf5 file per MPI process and combines them at the end. It is slightly less efficient, but it does not suffer from the same problem as the hdf5 v2 printer.
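Only as an illustrative sketch of that change (the printer name hdf5_v1 is from the comment above; the other keys are assumed to mirror the existing Printer block of the example yaml file):

Printer:
  printer: hdf5_v1              # old printer: one hdf5 file per MPI process, combined at the end of the run
  options:
    output_file: "MDMSM.hdf5"   # assumed to match the runs/MDMSM/samples output mentioned above
    group: "/MDMSM"             # assumed output group name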