archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0
16 stars 1 forks source link

Problem: AM1.16 MCPClient tries to refresh prometheus tmp database when prometheus is not enabled (and mcpclient crashed) #1711

Open mamedin opened 2 hours ago

mamedin commented 2 hours ago

Expected behaviour

Current behaviour

Sometimes MCPclient tries to refresh the prometheus database (I suppose when worker pool member is restarted because reaching the limit (new AM1.16 feature: pool of workers))

It is happening at the transfer start. And it is not happening in all transfers.

MCPServer log error:

DEBUG     2024-10-16 13:00:07  archivematica.mcp.server.packages:packages:_copy_from_transfer_sources:243:  source: 6ff352d0-f495-4330-920a-dcbe20a817b5.tar.gz, destination: /var/archivematica/sharedDirectory/tmp/tmp1gkokc0b/6ff352d0-f495-4330-920a-dcbe20a817b5.tar.gz
DEBUG     2024-10-16 13:00:11  archivematica.mcp.server.packages:packages:_start_package_transfer_with_auto_approval:424:  Package cbf9eee3-b327-4bb9-a2a3-915438751536: moving package to processing directory
DEBUG     2024-10-16 13:00:11  archivematica.mcp.server.packages:packages:_start_package_transfer_with_auto_approval:429:  Package cbf9eee3-b327-4bb9-a2a3-915438751536: starting workflow processing
DEBUG     2024-10-16 13:00:11  archivematica.mcp.server.jobs.chain:chain:__init__:89:  Creating JobChain 167dc382-4ab1-4051-8e22-e7f1c1bf3e6f for package cbf9eee3-b327-4bb9-a2a3-915438751536 (initial link 3229e01f-adf3-4294-85f7-4acb01b3fbcf)
INFO      2024-10-16 13:00:11  archivematica.mcp.server.jobs.client:client:run:104:  Running Extract zipped bag transfer (package cbf9eee3-b327-4bb9-a2a3-915438751536)
DEBUG     2024-10-16 13:00:11  archivematica.mcp.server.jobs.tasks:gearman_backend:submit:217:  Submitted gearman job d92fafcd-0b61-4ac1-93af-3e714db63a75 (extractzippedtransfer_v0.0)
ERROR     2024-10-16 13:00:11  archivematica.mcp.server.jobs.tasks:gearman_backend:update_task_results:238:  Gearman task batch d92fafcd-0b61-4ac1-93af-3e714db63a75 failed to execute
INFO      2024-10-16 13:00:11  archivematica.mcp.server.jobs.client:client:run:104:  Running Email fail report (package cbf9eee3-b327-4bb9-a2a3-915438751536)
DEBUG     2024-10-16 13:00:11  archivematica.mcp.server.jobs.tasks:gearman_backend:submit:217:  Submitted gearman job f19c544a-6b26-421c-bb3b-8552ec6c4cd6 (emailfailreport_v0.0)
ERROR     2024-10-16 13:00:11  archivematica.mcp.server.jobs.tasks:gearman_backend:update_task_results:238:  Gearman task batch f19c544a-6b26-421c-bb3b-8552ec6c4cd6 failed to execute
INFO      2024-10-16 13:00:11  archivematica.mcp.server.jobs.client:client:run:104:  Running Move to the failed directory (package cbf9eee3-b327-4bb9-a2a3-915438751536)
DEBUG     2024-10-16 13:00:11  archivematica.mcp.server.jobs.tasks:gearman_backend:submit:217:  Submitted gearman job 664f634a-9034-4a54-b85e-91438aae55d3 (movetransfer_v0.0)
ERROR     2024-10-16 13:00:11  archivematica.mcp.server.jobs.tasks:gearman_backend:update_task_results:238:  Gearman task batch 664f634a-9034-4a54-b85e-91438aae55d3 failed to execute

MCPCLient log in syslog:

Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: archivematicaClient.py: DEBUG     2024-10-16 13:00:11,421  archivematica.mcp.client.gearman:handle_job:95:  Gearman job request 
d92fafcd-0b61-4ac1-93af-3e714db63a75 received for extractzippedtransfer_v0.0
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: archivematicaClient.py: ERROR     2024-10-16 13:00:11,424  archivematica.mcp.client.gearman:on_job_exception:111:  An unhandled 
exception occurred processing a Gearman job
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: Traceback (most recent call last):
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/gearman/worker.py", line 218, in on_job_ex
ecute
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    job_result = function_callback(self, current_job)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/opt/archivematica/archivematica/src/MCPClient/lib/client/gearman.py", line 101, in handle_job
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    with metrics.task_execution_time_histogram.labels(script_name=task_name).time():
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 203, i
n labels
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._metrics[labelvalues] = self.__class__(
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 600, i
n __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    super().__init__(
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 151, i
n __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._metric_init()
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 628, i
n _metric_init
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._sum = values.ValueClass(self._type, self._name, self._name + '_sum', self._labelnames, self._labelvalues, self._documen
tation)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/values.py", line 68, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self.__reset()
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/values.py", line 82, in __reset
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    files[file_prefix] = MmapedDict(filename)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/mmap_dict.py", line 64, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._f = open(filename, 'rb' if read_mode else 'a+b')
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/prometheus-stats7rrndq4_/histogram_1782.db'
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: archivematicaClient.py: DEBUG     2024-10-16 13:00:11,461  archivematica.mcp.client.gearman:handle_job:95:  Gearman job request
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: archivematicaClient.py: ERROR     2024-10-16 13:00:11,462  archivematica.mcp.client.gearman:on_job_exception:111:  An unhandled exception occurred processing a Gearman job
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: Traceback (most recent call last):
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/gearman/worker.py", line 218, in on_job_execute
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    job_result = function_callback(self, current_job)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/opt/archivematica/archivematica/src/MCPClient/lib/client/gearman.py", line 101, in handle_job
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    with metrics.task_execution_time_histogram.labels(script_name=task_name).time():
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 203, in labels
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._metrics[labelvalues] = self.__class__(
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 600, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    super().__init__(
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 151, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._metric_init()
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 628, in _metric_init
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._sum = values.ValueClass(self._type, self._name, self._name + '_sum', self._labelnames, self._labelvalues, self._documentation)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/values.py", line 68, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self.__reset()
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/values.py", line 82, in __reset
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    files[file_prefix] = MmapedDict(filename)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/mmap_dict.py", line 64, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._f = open(filename, 'rb' if read_mode else 'a+b')
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/prometheus-stats7rrndq4_/histogram_1782.db'
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: archivematicaClient.py: DEBUG     2024-10-16 13:00:11,489  archivematica.mcp.client.gearman:handle_job:95:  Gearman job request 664f634a-9034-4a54-b85e-91438aae55d3 received for movetransfer_v0.0
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: archivematicaClient.py: ERROR     2024-10-16 13:00:11,490  archivematica.mcp.client.gearman:on_job_exception:111:  An unhandled exception occurred processing a Gearman job
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: Traceback (most recent call last):
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/gearman/worker.py", line 218, in on_job_execute
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    job_result = function_callback(self, current_job)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/opt/archivematica/archivematica/src/MCPClient/lib/client/gearman.py", line 101, in handle_job
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    with metrics.task_execution_time_histogram.labels(script_name=task_name).time():
6 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/gearman/worker.py", line 218, in on_job_execute
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    job_result = function_callback(self, current_job)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/opt/archivematica/archivematica/src/MCPClient/lib/client/gearman.py", line 101, in handle_job
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    with metrics.task_execution_time_histogram.labels(script_name=task_name).time():
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 203, in labels
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._metrics[labelvalues] = self.__class__(
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 600, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    super().__init__(
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 151, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._metric_init()
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/metrics.py", line 628, in _metric_init
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._sum = values.ValueClass(self._type, self._name, self._name + '_sum', self._labelnames, self._labelvalues, self._documentation)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/values.py", line 68, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self.__reset()
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/values.py", line 82, in __reset
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    files[file_prefix] = MmapedDict(filename)
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.9/site-packages/prometheus_client/mmap_dict.py", line 64, in __init__
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]:    self._f = open(filename, 'rb' if read_mode else 'a+b')
Oct 16 13:00:11 RACAM-RH-DEVMPL python[1782]: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/prometheus-stats7rrndq4_/histogram_1782.db'

Steps to reproduce

It isn't easy to reproduce, it only happened 1 time on each 2 system pipelines. So I suspect it only happens when restarting a worker.

Your environment (version of Archivematica, operating system, other relevant details)

Rocky 9, AM1.16

SS separated VM 2 pipelines

The issue happened in both pipelines (Identical config)


For Artefactual use:

Before you close this issue, you must check off the following:

mamedin commented 2 hours ago

As workaround I added prometheus monitoring to MCPClients in localhost:

ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_PROMETHEUS_BIND_PORT=7998
ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_PROMETHEUS_BIND_ADDRESS=127.0.0.1
mamedin commented 2 hours ago

I suspect we are missing a conditional in some places like this:

https://github.com/artefactual/archivematica/blob/qa/1.x/src/MCPClient/lib/client/pool.py#L158

We should check the monitoring is enabled before running metrics functions