gammasim / simtools

Tools and applications for the Simulation System of the CTA Observatory.
https://gammasim.github.io/simtools
BSD 3-Clause "New" or "Revised" License

Integration tests for simulate_prod fail when running in parallel #1209

Closed: GernotMaier closed this issue 2 days ago

GernotMaier commented 1 month ago

Issue noticed by Tobias during work on #1137.

Running the simulate_prod integration tests in parallel makes them fail.

The error messages are typically:

INFO::simulator(l450)::get_file_list::Getting list of hist files
Traceback (most recent call last):
  File "/workdir/env/bin/simtools-simulate-prod", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/workdir/external/simtools/simtools/applications/simulate_prod.py", line 201, in main
    pack_for_register(logger, simulator, args_dict)
  File "/workdir/external/simtools/simtools/applications/simulate_prod.py", line 166, in pack_for_register
    tar.add(file_to_tar, arcname=Path(file_to_tar).name)
  File "/usr/lib64/python3.11/tarfile.py", line 2171, in add
    tarinfo = self.gettarinfo(name, arcname)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/tarfile.py", line 2044, in gettarinfo
    statres = os.lstat(name)
              ^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-root/pytest-17/popen-gw2/test-data2/simtools-simulate-prod-gamma_20_deg_az0deg_south_check_output/simtools-output/simtel/logs/run000001_gamma_za20deg_azm000deg_South_alpha_check_output.hdata.zst'

Comparing the sim_telarray log files shows that sim_telarray was not executed correctly in the failing runs. The error above about missing histogram files therefore occurs because these files are never generated.
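As an illustration of why the traceback ends in tarfile: tar.add() calls os.lstat() on each path and raises FileNotFoundError for a file that was never written. A minimal sketch of a packing step that skips and reports missing files instead is shown below. The function name pack_existing_files is hypothetical and this is not the actual pack_for_register implementation in simtools; it only assumes a list of file paths to pack.

```python
import tarfile
from pathlib import Path


def pack_existing_files(tar_path, files_to_tar):
    """Pack the files that exist into a gzipped tarball.

    Hypothetical helper (not part of simtools). Returns the list of
    paths that were skipped because they do not exist on disk.
    """
    missing = []
    with tarfile.open(tar_path, "w:gz") as tar:
        for file_to_tar in files_to_tar:
            path = Path(file_to_tar)
            if not path.is_file():
                # tar.add() would raise FileNotFoundError here
                missing.append(str(path))
                continue
            tar.add(path, arcname=path.name)
    return missing
```

A non-empty return value would point at an upstream step (here: sim_telarray) that did not produce its output, which is easier to diagnose than a failure inside tarfile.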

An inspection of the configuration files (for simtools, corsika, sim_telarray) and of the run scripts does not reveal anything notable. Paths are consistently set to the correct temporary directory generated by pytest.

I propose to analyse whether sim_telarray uses temporary directories that are shared by all runs, which might interfere when running in parallel.
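One way to probe this hypothesis could be to launch each run with its own private temporary directory and see whether the failures disappear. The sketch below is a generic illustration, not simtools code: the helper name run_with_private_tmpdir is hypothetical, and whether sim_telarray honours the TMPDIR environment variable at all is an assumption that would need to be checked.

```python
import os
import subprocess
import tempfile


def run_with_private_tmpdir(command):
    """Run a command with TMPDIR set to a fresh, private directory.

    Hypothetical helper. Assumes the launched program respects TMPDIR,
    which is NOT verified for sim_telarray.
    Returns the private directory path and the CompletedProcess.
    """
    with tempfile.TemporaryDirectory(prefix="isolated-run-") as private_tmp:
        # Copy the current environment and override only TMPDIR
        env = dict(os.environ, TMPDIR=private_tmp)
        result = subprocess.run(
            command, env=env, capture_output=True, text=True, check=False
        )
    return private_tmp, result
```

If the parallel failures persist with isolated temporary directories, shared temporary files are unlikely to be the cause.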

orelgueta commented 1 month ago

There are two unrelated issues which pop up when running in parallel. One of them is related to creating the tarball and I think it is solved now in the fix_parallel_running branch. I will take a look at the other issue as well (at some point).

orelgueta commented 1 month ago

Actually, after merging main into fix_parallel_running (once #1137 had been merged to main), I can no longer reproduce the first problem. @GernotMaier, @tobiaskleiner, can you please check whether you can reproduce the problem in fix_parallel_running? If not, I will open a PR with this small fix.

In any case, I will look into the output files from productions in the future, because I want to make their names a bit more consistent.

GernotMaier commented 2 days ago

Requires a cross-check with @EshitaJoshi that the changes implemented in PR #1211 fix this issue.

EshitaJoshi commented 2 days ago

I can confirm the tests all pass for me now!

I ran: pytest -n auto --no-cov tests/integration_tests/ --count 5 -k simtools-simulate-prod

GernotMaier commented 2 days ago

Good - then let's close this issue.