Closed by victor051 5 years ago.
In the "Admin / Advanced" page, there is a "Reset Service / Workflow statuses" button; just click it and all statuses will be reset.
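Conceptually, such a reset is one bulk update of every status still marked as running. Here is a minimal sketch using Python's built-in `sqlite3` module. The `job` table, its columns and the 'Running'/'Idle' values are hypothetical, not eNMS's actual schema:

```python
import sqlite3


def reset_statuses(conn: sqlite3.Connection) -> int:
    """Set every job stuck in 'Running' back to 'Idle'; return how many rows changed.

    Hypothetical schema: one 'job' table with 'name' and 'status' columns.
    """
    with conn:  # commit on success, roll back if the UPDATE fails
        cursor = conn.execute("UPDATE job SET status = 'Idle' WHERE status = 'Running'")
    return cursor.rowcount


# Usage against an in-memory database with one stuck job
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (name TEXT, status TEXT)")
conn.executemany("INSERT INTO job VALUES (?, ?)", [("backup", "Running"), ("ping", "Idle")])
print(reset_statuses(conn))  # → 1
```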
But if you end up in that situation, it means something went wrong, and you should check the logs for an exception. Can you do that and paste it here?
One reason I've seen this happen is running a service with multiprocessing enabled while your database (SQLite) does not support concurrent writes, so you get a "database is locked" exception.
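The SQLite failure mode can be reproduced with the standard library alone. In this minimal sketch, two connections stand in for two worker processes. The first holds a write transaction open, and the second tries to update the same table. The second connection uses `timeout=0`, so it fails immediately instead of waiting. The table and values are illustrative:

```python
import os
import sqlite3
import tempfile


def demonstrate_sqlite_write_lock() -> str:
    """Return the error a second writer gets while another connection holds the write lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: autocommit, so we control BEGIN/COMMIT explicitly
        writer = sqlite3.connect(path, isolation_level=None)
        # timeout=0 disables waiting: a locked database raises at once
        other = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            writer.execute("CREATE TABLE device (id INTEGER PRIMARY KEY, last_runtime REAL)")
            writer.execute("INSERT INTO device VALUES (1, 0.0)")
            writer.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
            writer.execute("UPDATE device SET last_runtime = 1.0 WHERE id = 1")
            try:
                other.execute("UPDATE device SET last_runtime = 2.0 WHERE id = 1")
                return "no error"
            except sqlite3.OperationalError as exc:
                return str(exc)
        finally:
            # closing without COMMIT discards the writer's pending change
            writer.close()
            other.close()


print(demonstrate_sqlite_write_lock())  # → database is locked
```

SQLite allows only one writer at a time per database file. So when many worker processes write their results concurrently, some of them hit this error once their busy timeout expires.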
If that is your case, there are two solutions:
I reproduced this problem and got the log. I found that one of the devices was using the wrong driver, which caused an error. But I think the process should not get stuck. Logs:
04-12-2019 14:20:18 ERROR Job "scheduler_job (trigger: date[2019-04-12 14:19:45 CST], next run at: 2019-04-12 14:19:45 CST)" raised an exception
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1224, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 752, in do_executemany
    cursor.executemany(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 6924 waits for ShareLock on transaction 2701; blocked by process 6923.
Process 6923 waits for ShareLock on transaction 2703; blocked by process 6929.
Process 6929 waits for ExclusiveLock on tuple (1,17) of relation 16497 of database 16384; blocked by process 6924.
HINT: See server log for query details.
CONTEXT: while updating tuple (1,17) in relation "Device"

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.6/site-packages/apscheduler/executors/base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/home/nemo/eNMS/eNMS/automation/functions.py", line 28, in scheduler_job
    results, now = job.try_run(targets=targets, payload=payload)
  File "/home/nemo/eNMS/eNMS/automation/models.py", line 175, in try_run
    attempt = self.run(payload, job_from_workflow_targets, targets, workflow)
  File "/home/nemo/eNMS/eNMS/automation/models.py", line 270, in run
    [(device, results, payload, workflow) for device in targets],
  File "/usr/local/python3/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/python3/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/usr/local/python3/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/python3/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/nemo/eNMS/eNMS/automation/models.py", line 249, in device_run
    device_result = self.get_results(payload, device, workflow)
  File "/home/nemo/eNMS/eNMS/automation/models.py", line 244, in get_results
    return results
  File "/usr/local/python3/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/nemo/eNMS/eNMS/functions.py", line 206, in session_scope
    raise e
  File "/home/nemo/eNMS/eNMS/functions.py", line 202, in session_scope
    session.commit()
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 1026, in commit
    self.transaction.commit()
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 493, in commit
    self._prepare_impl()
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 472, in _prepare_impl
    self.session.flush()
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2451, in flush
    self._flush(objects)
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2589, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 129, in reraise
    raise value
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2549, in _flush
    flush_context.execute()
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
    rec.execute(self)
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute
    uow,
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 236, in save_obj
    update,
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 978, in _emit_update_statements
    statement, multiparams
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 988, in execute
    return meth(self, multiparams, params)
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement
    distilled_params,
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
    e, statement, parameters, cursor, context
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1466, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 383, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 128, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1224, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/python3/lib/python3.6/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 752, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 6924 waits for ShareLock on transaction 2701; blocked by process 6923.
Process 6923 waits for ShareLock on transaction 2703; blocked by process 6929.
Process 6929 waits for ExclusiveLock on tuple (1,17) of relation 16497 of database 16384; blocked by process 6924.
HINT: See server log for query details.
CONTEXT: while updating tuple (1,17) in relation "Device"
[SQL: UPDATE "Device" SET last_runtime=%(last_runtime)s WHERE "Device".id = %(Device_id)s]
[parameters: ({'last_runtime': 16.203381, 'Device_id': 141}, {'last_runtime': 16.100812, 'Device_id': 142}, {'last_runtime': 16.167533, 'Device_id': 143}, {'last_runtime': 16.269567, 'Device_id': 144}, {'last_runtime': 16.398398, 'Device_id': 145}, {'last_runtime': 16.662657, 'Device_id': 146}, {'last_runtime': 16.210814, 'Device_id': 147}, {'last_runtime': 16.141901, 'Device_id': 148}, {'last_runtime': 16.15689, 'Device_id': 149}, {'last_runtime': 16.127528, 'Device_id': 150})]
(Background on this error at: http://sqlalche.me/e/e3q8)
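The DETAIL lines describe a wait-for cycle. 6924 waits on 6923, 6923 waits on 6929, and 6929 waits on 6924, so none of them can proceed. PostgreSQL breaks the cycle by aborting one transaction; in this log, that is the eNMS worker whose commit ran the batched `UPDATE "Device" SET last_runtime=...`. The illustrative sketch below is not part of eNMS. It parses those lines into a graph and follows the edges until a process repeats:

```python
import re
from typing import Dict, List

# DETAIL section copied from the PostgreSQL error in the log
DETAIL = (
    "Process 6924 waits for ShareLock on transaction 2701; blocked by process 6923.\n"
    "Process 6923 waits for ShareLock on transaction 2703; blocked by process 6929.\n"
    "Process 6929 waits for ExclusiveLock on tuple (1,17) of relation 16497 "
    "of database 16384; blocked by process 6924.\n"
)

# One edge per line: waiting process -> blocking process
EDGE = re.compile(r"Process (\d+) waits for [^;]*; blocked by process (\d+)\.")


def wait_for_graph(detail: str) -> Dict[int, int]:
    """Map each waiting process to the process that blocks it."""
    return {int(waiter): int(blocker) for waiter, blocker in EDGE.findall(detail)}


def find_cycle(graph: Dict[int, int], start: int) -> List[int]:
    """Follow blocker edges from `start`; return the cycle reached, or [] if none."""
    path: List[int] = []
    node = start
    while node in graph and node not in path:
        path.append(node)
        node = graph[node]
    return path[path.index(node):] if node in path else []


print(find_cycle(wait_for_graph(DETAIL), 6924))  # → [6924, 6923, 6929]
```

The failing statement updates ten Device rows (ids 141 to 150) in one batch. Concurrent workers whose sessions update overlapping rows can therefore plausibly lock them in conflicting orders and form such a cycle.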
It depends a lot on what you're doing and on your environment; I can't fix it if I cannot reproduce it.
- What service are you using? On what devices? NetmikoBackupService on H3C switches (Netmiko driver: hp_comware). An error occurred when one of the switches was set to hp_procurve.
- How many targets? How many processes? 12 switches and 50 processes.
I can't reproduce it, but with only 12 switches, you don't need to enable multiprocessing.
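With 12 targets, at most 12 workers can ever be busy, so the remaining processes sit idle. For a handful of devices, running sequentially also avoids concurrent database writes altogether. Below is a minimal sketch of both options, assuming a per-device function. It runs sequentially when multiprocessing is disabled and otherwise caps the pool at the number of targets. `run_on_targets` and `pool_size` are hypothetical helpers. The sketch uses `ThreadPool`, which has the same `map` interface as the `multiprocessing.Pool` seen in the traceback, so it runs without pickling concerns:

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def pool_size(max_processes: int, n_targets: int) -> int:
    """Never start more workers than there are targets (and always at least one)."""
    return max(1, min(max_processes, n_targets))


def run_on_targets(
    func: Callable[[T], R],
    targets: Sequence[T],
    multiprocessing_enabled: bool,
    max_processes: int,
) -> List[R]:
    """Apply `func` to every target, sequentially or in a pool sized to the targets."""
    if not multiprocessing_enabled:
        # One worker at a time: no concurrent writes to the database
        return [func(target) for target in targets]
    with ThreadPool(pool_size(max_processes, len(targets))) as pool:
        return pool.map(func, targets)


print(pool_size(50, 12))  # → 12
```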
Got it, I will try to solve this bug. Thank you.
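The reporter argues that a failed run should not leave the process stuck. One way to get that behavior is `try`/`finally`: the status is restored whether the work returns or raises, for example with the `DeadlockDetected` error from the log. `Job`, `run_job` and the status strings in this sketch are hypothetical, not eNMS code:

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

R = TypeVar("R")


@dataclass
class Job:
    """Hypothetical stand-in for a service or workflow with a status field."""
    name: str
    status: str = "Idle"


def run_job(job: Job, work: Callable[[], R]) -> R:
    """Run `work`, marking the job 'Running' only for the duration of the call."""
    job.status = "Running"
    try:
        return work()
    finally:
        # Executes on success and on any exception, so the status never stays 'Running'
        job.status = "Idle"
```

In a database-backed version, the failed SQLAlchemy session would have to be rolled back first, and the status would likely need to be written in a new transaction. A session whose flush failed cannot be used again until it is rolled back.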
This can no longer happen in eNMS 3.15: services and workflows no longer have a "status".
The service process's status is stuck in "running" for an unknown reason. Is there a way to stop it? Could a "stop" button be added to kill it?