dhmlops / mlops


ClearML. Low or insufficient disk space #12

Closed by jxk20 2 years ago

jxk20 commented 2 years ago

One of the interns encountered the following error when he sent a training job to ClearML:

2021-11-08 14:01:40,289 - clearml.log - WARNING - failed logging task to backend (2 lines, <500/4: events.add_batch/v1.0 (Critical server error! server reports low or insufficient disk space. please resolve immediately by allocating additional disk space or freeing up storage space. (metrics, logs and all indexed data is in read-only mode!): reason=blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];)>)
Traceback (most recent call last):
  File "inference.py", line 56, in <module>
    task.execute_remotely(queue_name="queue-4xV100-128ram", exit_process=True)
  File "C:\Users\DSTA\AppData\Local\Programs\Python\Python37\lib\site-packages\clearml\task.py", line 1942, in execute_remotely
    Task.enqueue(task, queue_name=queue_name)
  File "C:\Users\DSTA\AppData\Local\Programs\Python\Python37\lib\site-packages\clearml\task.py", line 989, in enqueue
    res = cls._send(session=session, req=req)
  File "C:\Users\DSTA\AppData\Local\Programs\Python\Python37\lib\site-packages\clearml\backend_interface\base.py", line 89, in _send
    raise SendError(res, error_msg)
clearml.backend_interface.session.SendError: Action failed <500/4: tasks.enqueue/v1.0 (Critical server error! server reports low or insufficient disk space. please resolve immediately by allocating additional disk space or freeing up storage space. (metrics, logs and all indexed data is in read-only mode!): reason=blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];)> (queue=979aafb445e640b8b1b79e5e8cf2604e, task=28181aa17a3a4770b7fb9acec2a378ad)

It seems the server currently has insufficient disk space? This error appears when the job is sent to any of the following GPU queues:

2xV100-128ram
queue-2xV100-128ram
queue-2xV100-64ram
queue-1xV100-64ram
queue-4xV100-64ram
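
For reference, the `FORBIDDEN/12/index read-only / allow delete (api)` reason in the error above is the block Elasticsearch places on its indices when the disk it writes to passes the flood-stage watermark. Since the block is on the server's indices, it would affect every queue rather than one in particular. Below is a minimal sketch of how one might check disk usage and then clear the block through Elasticsearch's index settings API once space has been freed. The address `http://localhost:9200`, the 95% threshold (Elasticsearch's default flood stage), and the helper names are all assumptions for illustration; they are not taken from this setup, and this is not necessarily how the issue was resolved.

```python
import json
import shutil
import urllib.request

# Assumption: Elasticsearch's default flood-stage watermark (95% of disk used).
FLOOD_STAGE_PERCENT = 95.0


def disk_used_percent(path="."):
    """Return the percentage of disk space used on the volume holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def build_unblock_request(es_url):
    """Build (without sending) the request that clears the read-only block.

    Setting `index.blocks.read_only_allow_delete` to null on all indices
    removes the block that Elasticsearch applied when disk space ran low.
    """
    body = json.dumps({"index.blocks.read_only_allow_delete": None})
    return urllib.request.Request(
        url=es_url.rstrip("/") + "/_all/_settings",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def clear_read_only_block(es_url, data_path="."):
    """Send the unblock request, but only if disk usage is below the threshold."""
    used = disk_used_percent(data_path)
    if used >= FLOOD_STAGE_PERCENT:
        raise RuntimeError(
            f"disk is {used:.1f}% full; free or add space before unblocking"
        )
    with urllib.request.urlopen(build_unblock_request(es_url), timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8")
```

On the machine hosting the ClearML server, this could be called as `clear_read_only_block("http://localhost:9200", data_path=...)`, with `data_path` pointing at the volume that holds the Elasticsearch data (hypothetical values, adjust to the actual deployment). The disk check comes first because clearing the block while the disk is still full only causes Elasticsearch to reapply it; newer Elasticsearch versions may also lift the block on their own once enough space is available.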

jxk20 commented 2 years ago

Issue resolved

jax79sg commented 2 years ago

What did you do to resolve this?