AntaresSimulatorTeam / AntaREST

API REST and WebUI for Antares_Simulator
Apache License 2.0

Random test failure on windows (permission denied) #2124

Open sylvlecl opened 1 month ago

sylvlecl commented 1 month ago

Description

Study upgrade tests randomly fail on Windows with a "PermissionError". See the logs:

[2024-08-20 12:19:51,826] [5104] [antarest.study.web.studies_blueprint] - ca9aa88b-491a-4ad2-bf50-f90590807397 - AnyIO worker thread - testclient - 2 - INFO - Upgrade study 405f1e67-2bb0-429d-9185-b189adaceb56 to the version 860
[2024-08-20 12:19:51,873] [5104] [antarest.core.tasks.service] - None - taskjob__0 - None - None - INFO - Starting task 769c0358-81cc-4d26-a639-fd212ccb2834
[2024-08-20 12:19:51,889] [5104] [antarest.core.tasks.service] - None - taskjob__0 - None - None - INFO - Task 769c0358-81cc-4d26-a639-fd212ccb2834 set to RUNNING
[2024-08-20 12:19:54,718] [5104] [antarest.study.service] - None - Watcher - None - None - INFO - Study at C:\\Users\\runneradmin\\AppData\\Local\\Temp\\pytest-of-runneradmin\\pytest-0\\test_lifecycle__nominal_860_va0\\ext_workspace\\~9o2ibcwj.upgrade.tmp appears on disk and will be added as 34938441-a845-4acc-a1c6-c3e6eb055911
[2024-08-20 12:19:54,734] [5104] [antarest.study.storage.rawstudy.model.filesystem.factory] - None - Watcher - None - None - INFO - \U0001f3d7 Creating a study by reading the configuration from the directory 'C:\\Users\\runneradmin\\AppData\\Local\\Temp\\pytest-of-runneradmin\\pytest-0\\test_lifecycle__nominal_860_va0\\ext_workspace\\~9o2ibcwj.upgrade.tmp'...
[2024-08-20 12:19:54,780] [5104] [antarest.study.storage.rawstudy.model.filesystem.factory] - None - Watcher - None - None - INFO - Study  config built in 0.047s
[2024-08-20 12:19:54,780] [5104] [antarest.study.storage.abstract_storage_service] - None - Watcher - None - None - INFO - Reading additional data from files for study 
[2024-08-20 12:19:54,796] [5104] [antarest.study.service] - None - Watcher - None - None - WARNING - Skipping study format error analysis
[2024-08-20 12:19:54,859] [5104] [antarest.core.tasks.service] - None - taskjob__0 - None - None - ERROR - Task 769c0358-81cc-4d26-a639-fd212ccb2834 failed: Unhandled exception [WinError 5] Access is denied: 'C:\\\\Users\\\\runneradmin\\\\AppData\\\\Local\\\\Temp\\\\pytest-of-runneradmin\\\\pytest-0\\\\test_lifecycle__nominal_860_va0\\\\ext_workspace\\\\~9o2ibcwj.upgrade.tmp\\\\input' -> 'C:\\\\Users\\\\runneradmin\\\\AppData\\\\Local\\\\Temp\\\\pytest-of-runneradmin\\\\pytest-0\\\\test_lifecycle__nominal_860_va0\\\\ext_workspace\\\\STA-mini\\\\input'
Traceback (most recent call last):
  File "D:\\a\\AntaREST\\AntaREST\\antarest\\core\\tasks\\service.py", line 367, in _run_task
    result = callback(TaskJobLogRecorder(task_id, session=db.session))
  File "D:\\a\\AntaREST\\AntaREST\\antarest\\study\\service.py", line 231, in run_task
    self._upgrade_study()
  File "D:\\a\\AntaREST\\AntaREST\\antarest\\study\\service.py", line 201, in _upgrade_study
    upgrade_study(study_path, target_version)
  File "D:\\a\\AntaREST\\AntaREST\\antarest\\study\\storage\\study_upgrader\\__init__.py", line 96, in upgrade_study
    _replace_safely_original_files(files_to_retrieve, study_path, tmp_dir)
  File "D:\\a\\AntaREST\\AntaREST\\antarest\\study\\storage\\study_upgrader\\__init__.py", line 261, in _replace_safely_original_files
    (tmp_path / path).rename(original_path)
  File "c:\\hostedtoolcache\\windows\\python\\3.8.10\\x64\\lib\\pathlib.py", line 1359, in rename
    self._accessor.rename(self, target)
PermissionError: [WinError 5] Access is denied: 'C:\\\\Users\\\\runneradmin\\\\AppData\\\\Local\\\\Temp\\\\pytest-of-runneradmin\\\\pytest-0\\\\test_lifecycle__nominal_860_va0\\\\ext_workspace\\\\~9o2ibcwj.upgrade.tmp\\\\input' -> 'C:\\\\Users\\\\runneradmin\\\\AppData\\\\Local\\\\Temp\\\\pytest-of-runneradmin\\\\pytest-0\\\\test_lifecycle__nominal_860_va0\\\\ext_workspace\\\\STA-mini\\\\input'

This seems to be caused by low-level OS functions that do not always release the handle on the renamed file soon enough. See this analysis and proposed workaround: https://github.com/conan-io/conan/issues/6560#issuecomment-661679853

We should implement this workaround to ensure correct behaviour of the app and stability of the tests.

sylvlecl commented 1 month ago

Actually a duplicate of https://github.com/AntaresSimulatorTeam/AntaREST/issues/1416

sylvlecl commented 1 month ago

As previously identified by the team, in our case the watcher service is likely causing these issues: the logs above show the watcher scanning the workspace at timestamps very close to those of our upgrade.

From there, we can imagine two solutions:

  1. the same as above, based on retries
  2. another one based on study locks (file lock). Feasibility here is not obvious

So, at least as a first step, we can implement the retry mechanism.
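
A minimal sketch of what such a retry could look like, assuming it wraps the `(tmp_path / path).rename(original_path)` call seen in the traceback. The helper name `rename_with_retry` and the `attempts` and `delay` parameters are hypothetical, not existing AntaREST code, and the values would need tuning:

```python
import time
from pathlib import Path


def rename_with_retry(src: Path, dst: Path, attempts: int = 5, delay: float = 0.1) -> None:
    """Rename src to dst, retrying on PermissionError.

    Hypothetical helper: on Windows, another thread or process (here, possibly
    the watcher) may briefly hold a handle on the directory, so a rename can
    fail with "[WinError 5] Access is denied" and succeed a moment later.
    """
    for attempt in range(1, attempts + 1):
        try:
            src.rename(dst)
            return
        except PermissionError:
            # Give up and propagate the error after the last attempt.
            if attempt == attempts:
                raise
            # Wait a little longer after each failed attempt.
            time.sleep(delay * attempt)
```

With a helper like this, the call in `_replace_safely_original_files` would become something like `rename_with_retry(tmp_path / path, original_path)`. It only hides transient failures; if the handle is held for longer than the total wait, the error is still raised.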

sylvlecl commented 1 month ago

More digging: the logs actually show that the watcher scans a temporary study, which it should ignore.

This is another bug that we should fix. Fixing it would probably solve the issues we have in unit tests, although it would not remove the small risk that the situation arises in real life on Windows environments.
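
As a rough illustration of the idea, the watcher could skip directories that look like upgrade temporary folders. The sketch below assumes those folders are always named like the one in the logs (`~9o2ibcwj.upgrade.tmp`, i.e. a `~` prefix and an `.upgrade.tmp` suffix). The function names are hypothetical and this is not the watcher's actual filtering code:

```python
from pathlib import Path
from typing import Iterable, Iterator

# Assumption: upgrade temp folders follow the pattern seen in the logs,
# e.g. "~9o2ibcwj.upgrade.tmp".
UPGRADE_TMP_SUFFIX = ".upgrade.tmp"


def is_upgrade_tmp_dir(path: Path) -> bool:
    """Return True if the folder name looks like an upgrade temporary folder."""
    return path.name.startswith("~") and path.name.endswith(UPGRADE_TMP_SUFFIX)


def filter_scannable(paths: Iterable[Path]) -> Iterator[Path]:
    """Yield only the folders that the watcher should consider as studies."""
    for path in paths:
        if not is_upgrade_tmp_dir(path):
            yield path
```

For a workspace containing `STA-mini` and `~9o2ibcwj.upgrade.tmp`, only `STA-mini` would be yielded, so the temporary study would never be opened while the upgrade is renaming its content.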