PaNOSC-ViNYL / ViNYL-project

This repository keeps track of the tasks, milestones, and deliverables of work package 5 in PaNOSC.
Apache License 2.0

Prepare a release for simex for D5.2 #43

Closed CFGrote closed 3 years ago

CFGrote commented 3 years ago
JunCEEE commented 3 years ago

Branch checking list

branch from Jun

branch from Carsten

others

JunCEEE commented 3 years ago

First day summary

JunCEEE commented 3 years ago

Prepare release

install.sh:

Have a more comprehensive README.md:

Default modules to enable:

Documentation:

JunCEEE commented 3 years ago

Testing cleaning up:

======================================================================
ERROR: testBackengine (SimExTest.Calculators.S2EReconstructionTest.S2EReconstructionTest)
Test that we can start a test calculation.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/gpfs/exfel/data/user/juncheng/simex-branch/Tests/python/unittest/SimExTest/Calculators/S2EReconstructionTest.py", line 152, in testBackengine
    status = analyzer.backengine()
  File "/gpfs/exfel/data/user/juncheng/simex-branch/Sources/python/SimEx/Calculators/S2EReconstruction.py", line 129, in backengine
    emc_status = self.__emc.backengine()
  File "/gpfs/exfel/data/user/juncheng/simex-branch/Sources/python/SimEx/Calculators/EMCOrientation.py", line 220, in backengine
    mpicommand=ParallelUtilities.prepareMPICommandArguments(np)
  File "/gpfs/exfel/data/user/juncheng/simex-branch/Sources/python/SimEx/Utilities/ParallelUtilities.py", line 191, in prepareMPICommandArguments
    mpi_cmd+=_getVendorSpecificMPIArguments(version, threads_per_task)
  File "/gpfs/exfel/data/user/juncheng/simex-branch/Sources/python/SimEx/Utilities/ParallelUtilities.py", line 149, in _getVendorSpecificMPIArguments
    raise IOError( "Could not determine MPI vendor/version. Set SIMEX_MPICOMMAND or "
OSError: Could not determine MPI vendor/version. Set SIMEX_MPICOMMAND or provide backengine_mpicommand calculator parameter

JunCEEE commented 3 years ago

Got a bunch of errors, and I am a little bit overwhelmed. @CFGrote, any suggestions about where to start? The test log is uploaded here: TestError.log

JunCEEE commented 3 years ago

Problem: In /gpfs/exfel/data/user/juncheng/simex-branch/Sources/python/SimEx/Utilities/ParallelUtilities.py(132)_getMPIVersionInfo(), Intel MPI is not recognized, so the function returns None. This None then triggers an error at "/gpfs/exfel/data/user/juncheng/simex-branch/Sources/python/SimEx/Utilities/ParallelUtilities.py", line 151:

    if version == None:
        raise IOError( "Could not determine MPI vendor/version. Set SIMEX_MPICOMMAND or "
                       "provide backengine_mpicommand calculator parameter")

The intel mpirun --version output: b'Intel(R) MPI Library for Linux* OS, Version 2019 Update 6 Build 20191024 (id: 082ae5608)\nCopyright 2003-2019, Intel Corporation.\n'
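
One possible way to recognize the Intel banner quoted above is sketched below. This is only an illustration, not the actual SimEx code: the function name detect_mpi_vendor and the returned vendor labels are hypothetical, and the Open MPI pattern assumes the usual "mpirun (Open MPI) X.Y.Z" banner.

```python
import re


def detect_mpi_vendor(version_output):
    """Guess (vendor, version) from `mpirun --version` output, or return None.

    Hypothetical helper for illustration; not part of SimEx.
    """
    if isinstance(version_output, bytes):
        version_output = version_output.decode("utf-8", errors="replace")

    # Intel MPI banner, e.g. "Intel(R) MPI Library for Linux* OS, Version 2019 Update 6 ..."
    match = re.search(r"Intel\(R\) MPI Library.*?Version (\d+(?: Update \d+)?)", version_output)
    if match:
        return ("IntelMPI", match.group(1))

    # Open MPI banner (assumed format), e.g. "mpirun (Open MPI) 4.0.3"
    match = re.search(r"\(Open MPI\) (\d+(?:\.\d+)*)", version_output)
    if match:
        return ("OpenMPI", match.group(1))

    # Unknown launcher: let the caller decide how to fail.
    return None


intel_output = (
    b"Intel(R) MPI Library for Linux* OS, Version 2019 Update 6 "
    b"Build 20191024 (id: 082ae5608)\nCopyright 2003-2019, Intel Corporation.\n"
)
print(detect_mpi_vendor(intel_output))
```

Returning None for unknown launchers keeps the existing behaviour (raise and ask for SIMEX_MPICOMMAND) available as a fallback.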

CFGrote commented 3 years ago

in those tests that fail due to this mpi issue, let's set the mpicommand manually or make sure that we run on openmpi, not intel

JunCEEE commented 3 years ago

I tried export SIMEX_MPICOMMAND=mpirun, but still got the same error. Should I use another command?

export SIMEX_MPICOMMAND="mpirun -np"
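
A side note on the shell syntax: a value containing spaces has to be quoted, otherwise only the first word ends up in the variable. The sketch below uses Python's shlex module, which approximates POSIX shell word splitting, to show the difference. Whether "mpirun -np" is the value SimEx actually expects is not confirmed here.

```python
import shlex

# Without quotes the shell sees three words: the variable only gets "mpirun",
# and "-np" is passed to `export` as a separate argument.
unquoted = "export SIMEX_MPICOMMAND=mpirun -np"

# With quotes the whole command string becomes the value of the variable.
quoted = 'export SIMEX_MPICOMMAND="mpirun -np"'

print(shlex.split(unquoted))
print(shlex.split(quoted))
```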

JunCEEE commented 3 years ago

Errors from the openmpi run: TestError2.log

CFGrote commented 3 years ago

regarding the failures in PlasmaXRTS... : add /gpfs/exfel/data/group/spb-sfx/spb_simulation/simex/bin to your $PATH

JunCEEE commented 3 years ago
FAIL: test_construction_exceptions (SimExTest.Parameters.SingFELPhotonDiffractorParametersTest.SingFELPhotonDiffractorParametersTest)
Test that exceptions are thrown if parameters not sane.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/gpfs/exfel/data/user/juncheng/simex-branch/Tests/python/unittest/SimExTest/Parameters/SingFELPhotonDiffractorParametersTest.py", line 340, in test_construction_exceptions
    self.assertTrue(raises)
AssertionError: False is not true
JunCEEE commented 3 years ago

The SLURM SLURM_JOB_NUM_NODES values are not formatted like 40x(2),20x(1),10x(10), but like 40(x2),20(x1),10(x10). In the ParallelUtilities test, all such strings have been changed from the 40x(2) form to the 40(x2) form.
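
For reference, a minimal sketch of how the 40(x2),20(x1),10(x10) form could be expanded into one count per node. The function name expand_slurm_counts is hypothetical and this is not the ParallelUtilities implementation. In the Slurm documentation this compressed count(xrepeat) notation is the one used by variables such as SLURM_JOB_CPUS_PER_NODE and SLURM_TASKS_PER_NODE.

```python
import re


def expand_slurm_counts(value):
    """Expand a compressed Slurm list such as "40(x2),20(x1),10" into per-node counts.

    Hypothetical helper for illustration; not part of SimEx.
    """
    counts = []
    for item in value.split(","):
        # Each entry is either "N" or "N(xR)", meaning N repeated R times.
        match = re.fullmatch(r"(\d+)(?:\(x(\d+)\))?", item.strip())
        if match is None:
            raise ValueError(f"Unrecognized entry: {item!r}")
        count = int(match.group(1))
        repeat = int(match.group(2) or 1)
        counts.extend([count] * repeat)
    return counts


print(expand_slurm_counts("40(x2),20(x1),10(x10)"))
```

The old 40x(2) form is rejected with a ValueError, so a test written against the wrong format fails loudly instead of silently parsing something else.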

JunCEEE commented 3 years ago

I will deal with pysingfel test tomorrow.

JunCEEE commented 3 years ago

Carsten:

Jun:

JunCEEE commented 3 years ago

Error obtained when testing with Intel MPI, using the command python SingFELPhotonDiffractorTest.py SingFELPhotonDiffractorTest.testBackengine:

Traceback (most recent call last):
  File "/gpfs/exfel/data/user/juncheng/simex-branch/bin/radiationDamageMPI", line 9, in <module>
    main(parameters=parameters)
  File "/gpfs/exfel/data/user/juncheng/simex-branch/lib/python3.7/site-packages/pysingfel/radiationDamageMPI.py", line 24, in main
    master_diffract(comm, parameters)
  File "/gpfs/exfel/data/user/juncheng/simex-branch/lib/python3.7/site-packages/pysingfel/radiationDamageMPI.py", line 53, in master_diffract
    MakeOneDiffr(myQuaternions, ntask, parameters, outputName)
  File "/gpfs/exfel/data/user/juncheng/simex-branch/lib/python3.7/site-packages/pysingfel/radiationDamage.py", line 180, in MakeOneDiffr
    saveAsDiffrOutFile(outputName, inputName, counter, detector_counts, detector_intensity, quaternion, det, beam)
  File "/gpfs/exfel/data/user/juncheng/simex-branch/lib/python3.7/site-packages/pysingfel/FileIO.py", line 48, in saveAsDiffrOutFile
    f.create_dataset(group_name + 'data', data=detector_counts)
  File "/gpfs/exfel/data/user/juncheng/miniconda3/envs/simex-branch/lib/python3.7/site-packages/h5py/_hl/group.py", line 119, in create_dataset
    self[name] = dset
  File "/gpfs/exfel/data/user/juncheng/miniconda3/envs/simex-branch/lib/python3.7/site-packages/h5py/_hl/group.py", line 287, in __setitem__
    h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 202, in h5py.h5o.link
RuntimeError: Unable to create link (name already exists)
CFGrote commented 3 years ago

try to remove all previously generated hdf output files and then run the test again
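
The cleanup step suggested here could be scripted roughly as follows. This is a sketch under assumptions: the helper name remove_stale_outputs and the file patterns are hypothetical, and the directory to clean depends on where the test writes its output.

```python
from pathlib import Path


def remove_stale_outputs(directory, patterns=("*.h5", "*.hdf5")):
    """Delete previously generated HDF5 files below `directory`.

    Hypothetical helper for illustration; the patterns are assumptions.
    Returns the list of removed paths.
    """
    removed = []
    for pattern in patterns:
        # Materialize the matches first so files are not deleted mid-iteration.
        for path in list(Path(directory).rglob(pattern)):
            if path.is_file():
                path.unlink()
                removed.append(path)
    return removed
```

Calling it from the test's setUp would guarantee that every run starts from an empty output directory.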

JunCEEE commented 3 years ago

Yes, I have already tried removing them (in fact, the test script always removes them), but the error still occurs with my intel-mpi build. With the open-mpi build, no errors occurred.

JunCEEE commented 3 years ago

Errors from building CrystFEL within this branch: https://github.com/PaNOSC-ViNYL/SimEx/commit/dc0f71aef91619f67aca4f31aa9e18d9726f1bb7

/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libfontconfig.so: undefined reference to `uuid_parse@UUID_1.0'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libfontconfig.so: undefined reference to `uuid_generate_random@UUID_1.0'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libfontconfig.so: undefined reference to `uuid_unparse@UUID_1.0'
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libfontconfig.so: undefined reference to `uuid_copy@UUID_1.0'
collect2: error: ld returned 1 exit status
make[5]: *** [hdfsee] Error 1
make[4]: *** [CMakeFiles/hdfsee.dir/all] Error 2
make[3]: *** [all] Error 2
make[2]: *** [Modules/Diffractors/CrystFELPhotonDiffractor/crystfel-prefix/src/crystfel-stamp/crystfel-build] Error 2
make[1]: *** [Modules/Diffractors/CrystFELPhotonDiffractor/CMakeFiles/crystfel.dir/all] Error 2
make: *** [all] Error 2
cloudbustinguk commented 3 years ago

I think you might want libuuid-devel

JunCEEE commented 3 years ago

conda install libuuid solved the above problem.

A new one:

miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_spawn_async_with_fds'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_utf8_validate_len'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_source_set_dispose_function'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_ref_count_dec'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_unix_get_passwd_entry'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_uri_join'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_uri_split_with_user'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_variant_type_string_get_depth_'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_canonicalize_filename'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_uri_split_network'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_uri_is_valid'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_file_set_contents_full'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_atomic_ref_count_init'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_atomic_ref_count_dec'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_ref_count_init'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_atomic_ref_count_inc'
miniconda3/envs/simex-openmpi/lib/libgio-2.0.so.0: undefined reference to `g_ref_count_inc'
cloudbustinguk commented 3 years ago

Try adding glib-devel (may even be called glib2-devel)

JunCEEE commented 3 years ago

Got mpi issues again for SingFEL with openmpi.

At SingFELPhotonDiffractorTest.testBackengineInputFile,

        parameters = SingFELPhotonDiffractorParameters(
                     uniform_rotation=True,
                     calculate_Compton=False,
                     slice_interval=100,
                     number_of_slices=2,
                     pmi_start_ID=1,
                     pmi_stop_ID=1,
                     number_of_diffraction_patterns= 2,
                     detector_geometry= self.detector_geometry,
                     forced_mpi_command='mpirun -np 2 -x OMP_NUM_THREADS=2',
                     )

Got

[mpiexec@max-display001.desy.de] match_arg (utils/args/args.c:163): unrecognized argument x
[mpiexec@max-display001.desy.de] HYDU_parse_array (utils/args/args.c:178): argument matching returned error
[mpiexec@max-display001.desy.de] parse_args (ui/mpich/utils.c:1642): error parsing input array
[mpiexec@max-display001.desy.de] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1694): unable to parse user arguments
[mpiexec@max-display001.desy.de] main (ui/mpich/mpiexec.c:148): error parsing parameters

Do we have to include the -x OMP_NUM_THREADS=2 for testing? @CFGrote

JunCEEE commented 3 years ago

The Genesis errors are due to mismatched run parameters. This needs more discussion before it can be fixed.

https://github.com/PaNOSC-ViNYL/SimEx/issues/218

JunCEEE commented 3 years ago

The mpi issue with SingFEL reported above (mpirun rejecting -x OMP_NUM_THREADS=2 in SingFELPhotonDiffractorTest.testBackengineInputFile) has been tracked down. Conda had installed mpich instead of openmpi for me, and the mpirun provided by mpich does not support the -x option.
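
To illustrate the difference between the launchers: Open MPI's mpirun forwards environment variables with -x NAME=value, whereas the Hydra launcher used by MPICH (and Intel MPI) takes -genv NAME VALUE. A minimal sketch of choosing the flag per vendor is given below. The function names and vendor labels are hypothetical and not part of SimEx.

```python
def env_forwarding_args(vendor, env):
    """Return launcher arguments that forward `env` to all MPI ranks.

    Hypothetical helper for illustration; vendor labels are assumptions.
    """
    args = []
    for name, value in env.items():
        if vendor == "OpenMPI":
            # Open MPI: -x NAME=value
            args += ["-x", f"{name}={value}"]
        elif vendor in ("MPICH", "IntelMPI"):
            # Hydra-based launchers: -genv NAME VALUE
            args += ["-genv", name, str(value)]
        else:
            raise ValueError(f"Unknown MPI vendor: {vendor!r}")
    return args


def build_mpi_command(vendor, ntasks, env):
    """Assemble the mpirun argument list for the given vendor."""
    return ["mpirun", "-np", str(ntasks)] + env_forwarding_args(vendor, env)


print(" ".join(build_mpi_command("OpenMPI", 2, {"OMP_NUM_THREADS": 2})))
print(" ".join(build_mpi_command("MPICH", 2, {"OMP_NUM_THREADS": 2})))
```

With something like this, a test would not need to hard-code -x in forced_mpi_command and could run against either MPI flavour.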

JunCEEE commented 3 years ago

A note: https://github.com/PySlurm/pyslurm could serve as the Slurm interface.

JunCEEE commented 3 years ago

For the CrystFEL problem, I tried glib2-devel-cos6-x86_64 and glib, but both failed. It looks like the glib in conda is different from what is required. I tried using pip only, and the installation succeeded. Is there any way to make cmake not use the glib from conda?

CFGrote commented 3 years ago

This error was located. Conda has installed mpich instead of openmpi for me, and this option -x is not supported by the mpirun provided by mpich.

so this is fixed?

CFGrote commented 3 years ago

For the CrystFEL problem, I tried glib2-devel-cos6-x86_64 and glib, but both failed. It looks like the glib in conda is different from what is required. I tried using pip only, and the installation succeeded. Is there any way to make cmake not use the glib from conda?

don't know. i've had similar issues before and what i usually did was to leave the conda environment and run make again, and that often works. it's annoying to not have a real solution but maybe we have to live with it for now.

what i don't get is why in my case installing crystfel had no problem.

JunCEEE commented 3 years ago

This error was located. Conda has installed mpich instead of openmpi for me, and this option -x is not supported by the mpirun provided by mpich.

so this is fixed?

For openmpi, yes.

CFGrote commented 3 years ago

I had to remove all mpi4py usage from EMCOrientation. It does not matter too much, as only the reading of files was parallelized, not the actual EMC calculation.

CFGrote commented 3 years ago

had to fix ocelot.adapters.genesis in line 1491. must make a PR!

CFGrote commented 3 years ago

Errors got for crystFEL within this branch: PaNOSC-ViNYL/SimEx@dc0f71a

i only had to go into build/ and run make once more, then it went through.

JunCEEE commented 3 years ago

TNSAIonMatterInteractor.py cannot pass the test because sdf.read doesn't exist. I will take it out of the test suite for the moment, until Zsolt fixes the problem.
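
Instead of deleting the test, it could be skipped automatically while sdf.read is unavailable. The sketch below uses unittest.skipUnless; the helper module_has and the test class name are hypothetical, and the test body is only a placeholder.

```python
import importlib
import unittest


def module_has(module_name, attribute):
    """Return True if the module can be imported and provides `attribute`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attribute)


# The whole test case is reported as skipped (not failed) until sdf.read exists.
@unittest.skipUnless(module_has("sdf", "read"), "sdf.read is not available")
class TNSAIonMatterInteractorSketchTest(unittest.TestCase):
    def test_sdf_read_available(self):
        # Placeholder body; a real test would exercise the calculator here.
        self.assertTrue(module_has("sdf", "read"))
```

The skip shows up in the test report, which makes it harder to forget re-enabling the test once the problem is fixed.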

JunCEEE commented 3 years ago
JunCEEE commented 3 years ago