KITPraktomatTeam / Praktomat

quality control for programming assignments
http://pp.ipd.kit.edu/projects/praktomat/praktomat.php
GNU General Public License v2.0

TaskAdmin-Actions : check_multiple // Fatal Python error : _PyInterpreterState_DeleteExceptMain: not main interpreter : solved! #353

Open ifrh opened 2 years ago

ifrh commented 2 years ago

The following problem does not occur if I start Praktomat from the command line via manage-local.py runserver. Sadly, running via WSGI on Apache is affected.

After I set the variable NUMBER_OF_TASKS_TO_BE_CHECKED_IN_PARALLEL in Praktomat/src/settings/local.py to a value other than 1, I got a browser error message when using some TaskAdmin actions on a task that has a model solution and exactly two student solutions.

The problematic TaskAdmin methods were run_all_checkers and run_all_uploadtime_checkers_on_all; both call check_multiple from checker.basemodels: https://github.com/KITPraktomatTeam/Praktomat/blob/92d1cf50157426e7aa3cd20e665bfb31ffe2f25a/src/checker/basemodels.py#L349-L358

I would ignore the browser timeout if I had tested several hundred solutions, but with fewer than 5 solutions there should not be any timing problem. The browser timeout is a known situation: when a large number of solutions are checked, the work continues in the background despite the message. This time, however, no solutions were checked in the background.

I added some print statements around pool.map(check_it, solutions, 1) in check_multiple; under WSGI their output ends up in apache2/error.log. I found that this pool call never returned. It behaves like an infinite loop. The output of the print before the call to pool.map appeared in Apache's error log, but instead of the output of the print after the call, I saw multiple gigabytes of entries like:

Thread 0x00007faff40a3780 (most recent call first):
  <no Python frame>
Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
Python runtime state: initialized

Current thread 0x00007fafe07fd640 (most recent call first):
  File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 66 in _launch
  File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 19 in __init__
  File "/usr/lib/python3.10/multiprocessing/context.py", line 277 in _Popen
  File "/usr/lib/python3.10/multiprocessing/process.py", line 121 in start
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 326 in _repopulate_pool_static
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 337 in _maintain_pool
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 513 in _handle_workers
  File "/usr/lib/python3.10/threading.py", line 946 in run
  File "/usr/lib/python3.10/threading.py", line 1009 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 966 in _bootstrap
Gateway Timeout

The gateway did not receive a timely response from the upstream server or application.
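
The print-based tracing described above can be sketched roughly as follows. This is only an illustration, not the code that was actually added to check_multiple: the helper name traced_map and its parameters are hypothetical, and the map call is passed in as an argument so the helper can be tried without a process pool.

```python
import sys
import time


def traced_map(map_call, func, items, log=None):
    """Run map_call(func, items) and write a flushed marker before and after it.

    Hypothetical helper: under mod_wsgi, text written to sys.stderr normally
    ends up in Apache's error log, so a missing second marker indicates that
    the map call never returned.
    """
    if log is None:
        log = sys.stderr
    items = list(items)
    print(f"check_multiple: mapping {len(items)} item(s)", file=log, flush=True)
    started = time.monotonic()
    results = list(map_call(func, items))
    elapsed = time.monotonic() - started
    print(f"check_multiple: map returned after {elapsed:.2f}s", file=log, flush=True)
    return results
```

Inside check_multiple, a call along the lines of traced_map(lambda f, xs: pool.map(f, xs, 1), check_it, solutions) would bracket the existing pool.map call. Flushing each marker matters here, because a buffered message could be lost when the process hangs or aborts.
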
ifrh commented 2 years ago

Since the error message is identical, perhaps the cause is identical, too: see https://github.com/GrahamDumpleton/mod_wsgi/issues/765

ifrh commented 2 years ago

Oh, indeed ... the problem depends on the contents of the Apache macro praktomat:

https://github.com/KITPraktomatTeam/Praktomat/blob/92d1cf50157426e7aa3cd20e665bfb31ffe2f25a/documentation/apache_praktomat_wsgi.conf#L20

The problem seems to be gone if I change that line on our server, where the Apache macro was copied into the file sites-enabled/default-ssl.conf, to

WSGIScriptAlias /$id $path/Praktomat/wsgi/praktomat.wsgi process-group=local_$id application-group=%{GLOBAL}

and also add, outside of all VirtualHost configurations:

WSGIRestrictEmbedded On

With these changes I can set the variable NUMBER_OF_TASKS_TO_BE_CHECKED_IN_PARALLEL in Praktomat/src/settings/local.py to a value other than 1.
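
To check whether the changed configuration is really in effect, a small diagnostic WSGI script can help. The sketch below is an assumption-laden illustration, not part of Praktomat: the function name describe_mod_wsgi_environ is made up, and it relies on mod_wsgi exposing the keys mod_wsgi.process_group and mod_wsgi.application_group in the WSGI environ, where an empty process group means embedded mode and an empty application group means the main interpreter (which is what application-group=%{GLOBAL} selects). Presumably the fatal error disappears because the pool then forks from the main interpreter instead of a sub-interpreter.

```python
def describe_mod_wsgi_environ(environ):
    """Summarise the mod_wsgi process and interpreter context of a request."""
    process_group = environ.get("mod_wsgi.process_group")
    application_group = environ.get("mod_wsgi.application_group")
    if process_group is None:
        # Key is absent, e.g. when served by manage-local.py runserver.
        return "not running under mod_wsgi"
    if process_group:
        mode = "daemon mode, process group %r" % process_group
    else:
        mode = "embedded mode"
    if application_group == "":
        interpreter = "main interpreter"
    else:
        interpreter = "sub-interpreter %r" % application_group
    return "%s; %s" % (mode, interpreter)


def application(environ, start_response):
    """Minimal WSGI entry point that returns the summary as plain text."""
    body = describe_mod_wsgi_environ(environ).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

Served temporarily through the same WSGIScriptAlias options as praktomat.wsgi, the page should report daemon mode and the main interpreter once the fix is applied.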