CFGrote closed this issue 3 years ago.
Branch checklist
branch from Jun
branch from Carsten
others
First day summary
install.sh:
Have a more comprehensive README.md:
git clone --depth 1 -b develop git@github.com:PaNOSC-ViNYL/SimEx.git
Default modules to enable:
Documentation:
Testing cleaning up:
======================================================================
ERROR: testBackengine (SimExTest.Calculators.S2EReconstructionTest.S2EReconstructionTest)
Test that we can start a test calculation.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/gpfs/exfel/data/user/juncheng/simex-branch/Tests/python/unittest/SimExTest/Calculators/S2EReconstructionTest.py", line 152, in testBackengine
status = analyzer.backengine()
File "/gpfs/exfel/data/user/juncheng/simex-branch/Sources/python/SimEx/Calculators/S2EReconstruction.py", line 129, in backengine
emc_status = self.__emc.backengine()
File "/gpfs/exfel/data/user/juncheng/simex-branch/Sources/python/SimEx/Calculators/EMCOrientation.py", line 220, in backengine
mpicommand=ParallelUtilities.prepareMPICommandArguments(np)
File "/gpfs/exfel/data/user/juncheng/simex-branch/Sources/python/SimEx/Utilities/ParallelUtilities.py", line 191, in prepareMPICommandArguments
mpi_cmd+=_getVendorSpecificMPIArguments(version, threads_per_task)
File "/gpfs/exfel/data/user/juncheng/simex-branch/Sources/python/SimEx/Utilities/ParallelUtilities.py", line 149, in _getVendorSpecificMPIArguments
raise IOError( "Could not determine MPI vendor/version. Set SIMEX_MPICOMMAND or "
OSError: Could not determine MPI vendor/version. Set SIMEX_MPICOMMAND or provide backengine_mpicommand calculator parameter
I got a bunch of errors and am a little overwhelmed. @CFGrote, any suggestions about where to start? The test log is uploaded here: TestError.log
Problem: in /gpfs/exfel/data/user/juncheng/simex-branch/Sources/python/SimEx/Utilities/ParallelUtilities.py(132)_getMPIVersionInfo(), Intel MPI cannot be identified when we use it, so the function returns None. This None then triggers an error at "/gpfs/exfel/data/user/juncheng/simex-branch/Sources/python/SimEx/Utilities/ParallelUtilities.py", line 151:
    if version == None:
        raise IOError( "Could not determine MPI vendor/version. Set SIMEX_MPICOMMAND or "
                       "provide backengine_mpicommand calculator parameter")
The Intel mpirun --version output is:
b'Intel(R) MPI Library for Linux* OS, Version 2019 Update 6 Build 20191024 (id: 082ae5608)\nCopyright 2003-2019, Intel Corporation.\n'
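A minimal sketch of how a vendor check could recognize Intel MPI from the raw `mpirun --version` output above instead of returning None. The helper name `detect_mpi_vendor` and the vendor labels are hypothetical; only the Intel string is taken from the log above, and the substrings used for Open MPI and MPICH are assumptions, not the actual SimEx implementation.

```python
from typing import Optional


def detect_mpi_vendor(version_output: bytes) -> Optional[str]:
    """Guess the MPI vendor from the raw bytes printed by `mpirun --version`.

    Hypothetical helper: the Open MPI and MPICH substrings are assumptions.
    """
    text = version_output.decode("utf-8", errors="replace")
    # Check Intel first: Intel MPI is MPICH-derived, so test the most specific marker first.
    if "Intel(R) MPI Library" in text:
        return "IntelMPI"
    if "Open MPI" in text:
        return "OpenMPI"
    if "HYDRA" in text or "MPICH" in text:
        return "MPICH"
    # Unknown vendor: the caller should fall back to SIMEX_MPICOMMAND.
    return None
```

Returning None only for genuinely unknown launchers keeps the existing "Set SIMEX_MPICOMMAND" error as the fallback path.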
In the tests that fail due to this MPI issue, let's set the MPI command manually or make sure we run on OpenMPI, not Intel MPI.
I tried export SIMEX_MPICOMMAND=mpirun, but still got the same error. Should I use another command?
export SIMEX_MPICOMMAND="mpirun -np"
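The suggested value ends in `-np`, which suggests the process count is appended after it; in the shell it also has to be quoted so that `-np` stays part of the value. A sketch of how such an override could be consumed, assuming the count is appended last. `mpi_command_from_env` is a hypothetical helper, not the actual SimEx code.

```python
import os
import shlex
from typing import List, Mapping, Optional


def mpi_command_from_env(nprocs: int,
                         environ: Optional[Mapping[str, str]] = None) -> Optional[List[str]]:
    """Build an MPI launcher prefix from SIMEX_MPICOMMAND, if it is set.

    Assumption: the process count is appended right after the override,
    which is why the override itself ends in "-np".
    """
    env = os.environ if environ is None else environ
    override = env.get("SIMEX_MPICOMMAND")
    if not override:
        return None  # no override: vendor detection has to succeed instead
    # shlex.split honours shell-style quoting, so "mpirun -np" yields two arguments.
    return shlex.split(override) + [str(nprocs)]
```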
Errors from the OpenMPI run: TestError2.log
Regarding the failures in PlasmaXRTS...: add /gpfs/exfel/data/group/spb-sfx/spb_simulation/simex/bin to your $PATH.
FAIL: test_construction_exceptions (SimExTest.Parameters.SingFELPhotonDiffractorParametersTest.SingFELPhotonDiffractorParametersTest)
Test that exceptions are thrown if parameters not sane.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/gpfs/exfel/data/user/juncheng/simex-branch/Tests/python/unittest/SimExTest/Parameters/SingFELPhotonDiffractorParametersTest.py", line 340, in test_construction_exceptions
self.assertTrue(raises)
AssertionError: False is not true
The SLURM_JOB_NUM_NODES values are not formatted like 40x(2),20x(1),10x(10) but like 40(x2),20(x1),10(x10). In the Parallel Utilities test, all formats were changed from 40x(2) to 40(x2).
I will deal with the pysingfel test tomorrow.
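The corrected format is Slurm's compressed per-node count notation, where `N(xK)` means the value N repeated for K consecutive nodes (the same notation appears in, e.g., `SLURM_TASKS_PER_NODE=2(x3),1`). A minimal parsing sketch; `expand_slurm_counts` is a hypothetical helper, not the function used in SimEx.

```python
import re
from typing import List

# One comma-separated entry: a count, optionally followed by "(xK)" repeats.
_ENTRY = re.compile(r"^(\d+)(?:\(x(\d+)\))?$")


def expand_slurm_counts(value: str) -> List[int]:
    """Expand a Slurm compressed count list, e.g. "40(x2),10" -> [40, 40, 10]."""
    counts = []  # type: List[int]
    for entry in value.split(","):
        match = _ENTRY.match(entry.strip())
        if match is None:
            # This also rejects the "40x(2)" form that the old test data used.
            raise ValueError("unrecognized Slurm count entry: {!r}".format(entry))
        count = int(match.group(1))
        repeat = int(match.group(2) or 1)
        counts.extend([count] * repeat)
    return counts
```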
Carsten:
Jun:
Tests/python/unittest/Test.py:47
Error when testing with Intel MPI, using the command python SingFELPhotonDiffractorTest.py SingFELPhotonDiffractorTest.testBackengine:
Traceback (most recent call last):
File "/gpfs/exfel/data/user/juncheng/simex-branch/bin/radiationDamageMPI", line 9, in <module>
main(parameters=parameters)
File "/gpfs/exfel/data/user/juncheng/simex-branch/lib/python3.7/site-packages/pysingfel/radiationDamageMPI.py", line 24, in main
master_diffract(comm, parameters)
File "/gpfs/exfel/data/user/juncheng/simex-branch/lib/python3.7/site-packages/pysingfel/radiationDamageMPI.py", line 53, in master_diffract
MakeOneDiffr(myQuaternions, ntask, parameters, outputName)
File "/gpfs/exfel/data/user/juncheng/simex-branch/lib/python3.7/site-packages/pysingfel/radiationDamage.py", line 180, in MakeOneDiffr
saveAsDiffrOutFile(outputName, inputName, counter, detector_counts, detector_intensity, quaternion, det, beam)
File "/gpfs/exfel/data/user/juncheng/simex-branch/lib/python3.7/site-packages/pysingfel/FileIO.py", line 48, in saveAsDiffrOutFile
f.create_dataset(group_name + 'data', data=detector_counts)
File "/gpfs/exfel/data/user/juncheng/miniconda3/envs/simex-branch/lib/python3.7/site-packages/h5py/_hl/group.py", line 119, in create_dataset
self[name] = dset
File "/gpfs/exfel/data/user/juncheng/miniconda3/envs/simex-branch/lib/python3.7/site-packages/h5py/_hl/group.py", line 287, in __setitem__
h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 202, in h5py.h5o.link
RuntimeError: Unable to create link (name already exists)
Try removing all previously generated HDF5 output files and then run the test again.
Yes, I already tried removing them (in fact, the test script always removes them), but it still happens with my Intel MPI build. With the OpenMPI build, no errors occurred.
Errors when building CrystFEL on this branch: https://github.com/PaNOSC-ViNYL/SimEx/commit/dc0f71aef91619f67aca4f31aa9e18d9726f1bb7
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libfontconfig.so: undefined reference to `uuid_parse@UUID_1.0'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libfontconfig.so: undefined reference to `uuid_generate_random@UUID_1.0'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libfontconfig.so: undefined reference to `uuid_unparse@UUID_1.0'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libfontconfig.so: undefined reference to `uuid_copy@UUID_1.0'
collect2: error: ld returned 1 exit status
make[5]: *** [hdfsee] Error 1
make[4]: *** [CMakeFiles/hdfsee.dir/all] Error 2
make[3]: *** [all] Error 2
make[2]: *** [Modules/Diffractors/CrystFELPhotonDiffractor/crystfel-prefix/src/crystfel-stamp/crystfel-build] Error 2
make[1]: *** [Modules/Diffractors/CrystFELPhotonDiffractor/CMakeFiles/crystfel.dir/all] Error 2
make: *** [all] Error 2
I think you need libuuid-devel.
conda install libuuid solved the above problem.
A new one:
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_spawn_async_with_fds'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_utf8_validate_len'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_source_set_dispose_function'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_ref_count_dec'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_unix_get_passwd_entry'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_uri_join'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_uri_split_with_user'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_variant_type_string_get_depth_'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_canonicalize_filename'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_uri_split_network'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_uri_is_valid'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_file_set_contents_full'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_atomic_ref_count_init'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_atomic_ref_count_dec'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_ref_count_init'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_atomic_ref_count_inc'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_ref_count_inc'
Try adding glib-devel (it may even be called glib2-devel).
I got MPI issues again for SingFEL with OpenMPI.
In SingFELPhotonDiffractorTest.testBackengineInputFile:
parameters = SingFELPhotonDiffractorParameters(
    uniform_rotation=True,
    calculate_Compton=False,
    slice_interval=100,
    number_of_slices=2,
    pmi_start_ID=1,
    pmi_stop_ID=1,
    number_of_diffraction_patterns=2,
    detector_geometry=self.detector_geometry,
    forced_mpi_command='mpirun -np 2 -x OMP_NUM_THREADS=2',
)
I got:
[mpiexec@max-display001.desy.de] match_arg (utils/args/args.c:163): unrecognized argument x
[mpiexec@max-display001.desy.de] HYDU_parse_array (utils/args/args.c:178): argument matching returned error
[mpiexec@max-display001.desy.de] parse_args (ui/mpich/utils.c:1642): error parsing input array
[mpiexec@max-display001.desy.de] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1694): unable to parse user arguments
[mpiexec@max-display001.desy.de] main (ui/mpich/mpiexec.c:148): error parsing parameters
Do we have to include -x OMP_NUM_THREADS=2 for testing? @CFGrote
The Genesis errors are due to mismatched run parameters. This needs more discussion to fix.
I found the cause of this error: conda had installed MPICH instead of OpenMPI for me, and the -x option is not supported by the mpirun provided by MPICH.
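Open MPI's `mpirun` exports an environment variable with `-x NAME=VALUE`, while Hydra-based launchers (MPICH, Intel MPI) use `-genv NAME VALUE`. A sketch of choosing the right flag per vendor so a forced command does not hard-code `-x`; `env_export_args` and the vendor labels are hypothetical names.

```python
from typing import List


def env_export_args(vendor: str, name: str, value: str) -> List[str]:
    """Return launcher arguments that pass NAME=VALUE to every MPI rank."""
    if vendor == "OpenMPI":
        # Open MPI's mpirun: -x NAME=VALUE
        return ["-x", "{}={}".format(name, value)]
    if vendor in ("MPICH", "IntelMPI"):
        # Hydra-based mpiexec/mpirun: -genv NAME VALUE
        return ["-genv", name, value]
    raise ValueError("unsupported MPI vendor: {!r}".format(vendor))


# Usage: the MPICH equivalent of 'mpirun -np 2 -x OMP_NUM_THREADS=2'.
cmd = ["mpirun", "-np", "2"] + env_export_args("MPICH", "OMP_NUM_THREADS", "2")
```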
A note: https://github.com/PySlurm/pyslurm could serve as the Slurm interface.
For the CrystFEL problem, I tried glib2-devel-cos6-x86_64 and glib, but both failed. It looks like the library in conda is different from what is required. Using pip only, the installation succeeded. Is there any way to make CMake not use the glib from conda?
So this is fixed?
I don't know. I've had similar issues before, and what I usually did was leave the conda environment and run make again; that often works. It's annoying not to have a real solution, but maybe we have to live with it for now.
What I don't get is why installing CrystFEL caused no problems in my case.
For OpenMPI, yes.
I had to remove all mpi4py usage from EMCOrientation. It does not matter much, since only the file reading was parallelized, not the actual EMC calculation.
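If parallel file reading is ever restored, one common pattern is a round-robin split of the input files across ranks, with a serial fallback when mpi4py cannot be imported. A sketch under that assumption; the helper names and file names are hypothetical and do not come from EMCOrientation.

```python
from typing import List, Sequence, Tuple


def files_for_rank(files: Sequence[str], rank: int, size: int) -> List[str]:
    """Round-robin split: rank r reads files r, r + size, r + 2*size, ..."""
    return list(files[rank::size])


def get_rank_and_size() -> Tuple[int, int]:
    """Use mpi4py when it is available, otherwise behave as a single serial rank."""
    try:
        from mpi4py import MPI
    except (ImportError, RuntimeError):
        return 0, 1
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()


rank, size = get_rank_and_size()
# Hypothetical input files; each rank would read only its own share.
my_files = files_for_rank(["pattern_0.h5", "pattern_1.h5", "pattern_2.h5"], rank, size)
```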
I had to fix ocelot.adapters.genesis at line 1491. Must make a PR!
I only had to go into build/ and run make once more; then it went through.
TNSAIonMatterInteractor.py cannot pass the test because sdf.read doesn't exist.
I will take it out of the test for now, until Zsolt fixes the problem.
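Instead of deleting the test, it could be skipped conditionally with `unittest.skipUnless` until `sdf.read` is available. A sketch assuming the reader module is importable as `sdf`; the test class and method names are hypothetical, not the actual SimEx test.

```python
import unittest

# Assumption: the SDF reader is importable as `sdf`. If the module is missing
# or has no `read`, the dependent test is reported as skipped, not failed.
try:
    import sdf
    SDF_HAS_READ = hasattr(sdf, "read")
except ImportError:
    SDF_HAS_READ = False


class TNSAIonMatterInteractorTest(unittest.TestCase):
    """Hypothetical test case; the real SimEx test class may differ."""

    @unittest.skipUnless(SDF_HAS_READ, "sdf.read is not available yet")
    def testBackengine(self):
        self.assertTrue(callable(sdf.read))
```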