allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0

ValueError: Insufficient permissions (delete failed) for s3://clearml #1342

Open YasinFu opened 1 month ago

YasinFu commented 1 month ago

I want to use MinIO, which I set up on my own server, for storage. During the connection process, I encountered an error:

ClearML Task: overwriting (reusing) task id=b5dae35c2fdd49cab0f8b7647b25bde6
2024-10-22 09:16:27,014 - clearml.Task - INFO - No repository found, storing script code instead
2024-10-22 09:16:31,367 - clearml.storage - ERROR - Failed uploading: Connection was closed before we received a valid response from endpoint URL: "https://clearml.s3.ap-south-1.amazonaws.com/.clearml.04bec773-d736-4ba8-a523-1d15a71be074.test".
Traceback (most recent call last):
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/connection.py", line 411, in connect
    self.sock = ssl_wrap_socket(
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/util/ssl.py", line 428, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/util/ssl.py", line 472, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/data/anaconda3/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/data/anaconda3/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/data/anaconda3/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/httpsession.py", line 464, in send
    urllib_response = conn.urlopen(
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/util/retry.py", line 507, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/connection.py", line 411, in connect
    self.sock = ssl_wrap_socket(
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/util/ssl.py", line 428, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/data/anaconda3/lib/python3.8/site-packages/urllib3/util/ssl.py", line 472, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/data/anaconda3/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/data/anaconda3/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/data/anaconda3/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/anaconda3/lib/python3.8/site-packages/clearml/storage/helper.py", line 2817, in check_write_permissions
    self.delete(path=dest_path)
  File "/data/anaconda3/lib/python3.8/site-packages/clearml/storage/helper.py", line 2802, in delete
    return self._driver.delete_object(self.get_object(path))
  File "/data/anaconda3/lib/python3.8/site-packages/clearml/storage/helper.py", line 628, in delete_object
    object.delete()
  File "/data/anaconda3/lib/python3.8/site-packages/boto3/resources/factory.py", line 581, in do_action
    response = action(self, *args, **kwargs)
  File "/data/anaconda3/lib/python3.8/site-packages/boto3/resources/action.py", line 88, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/client.py", line 569, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/client.py", line 1005, in _make_api_call
    http, parsed_response = self._make_request(
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/client.py", line 1029, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/endpoint.py", line 200, in _send_request
    while self._needs_retry(
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/endpoint.py", line 360, in _needs_retry
    responses = self._event_emitter.emit(
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/hooks.py", line 412, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/hooks.py", line 256, in emit
    return self._emit(event_name, kwargs)
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/hooks.py", line 239, in _emit
    response = handler(**kwargs)
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/retryhandler.py", line 207, in __call__
    if self._checker(**checker_kwargs):
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/retryhandler.py", line 284, in __call__
    should_retry = self._should_retry(
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/retryhandler.py", line 320, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/retryhandler.py", line 363, in __call__
    checker_response = checker(
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/retryhandler.py", line 247, in __call__
    return self._check_caught_exception(
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/retryhandler.py", line 416, in _check_caught_exception
    raise caught_exception
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/endpoint.py", line 279, in _do_get_response
    http_response = self._send(request)
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/endpoint.py", line 383, in _send
    return self.http_session.send(request)
  File "/data/anaconda3/lib/python3.8/site-packages/botocore/httpsession.py", line 503, in send
    raise ConnectionClosedError(
botocore.exceptions.ConnectionClosedError: Connection was closed before we received a valid response from endpoint URL: "https://clearml.s3.ap-south-1.amazonaws.com/.clearml.04bec773-d736-4ba8-a523-1d15a71be074.test".

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "updatemodel.py", line 4, in <module>
    task = Task.init(
  File "/data/anaconda3/lib/python3.8/site-packages/clearml/task.py", line 631, in init
    task.output_uri = task.get_project_object().default_output_destination
  File "/data/anaconda3/lib/python3.8/site-packages/clearml/task.py", line 1254, in output_uri
    helper.check_write_permissions(value)
  File "/data/anaconda3/lib/python3.8/site-packages/clearml/storage/helper.py", line 2819, in check_write_permissions
    raise ValueError("Insufficient permissions (delete failed) for {}".format(base_url))
ValueError: Insufficient permissions (delete failed) for s3://clearml
^[[A^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/data/anaconda3/lib/python3.8/site-packages/clearml/backend_interface/metrics/reporter.py", line 317, in _handle_program_exit
    self.wait_for_events()
  File "/data/anaconda3/lib/python3.8/site-packages/clearml/backend_interface/metrics/reporter.py", line 337, in wait_for_events
    return report_service.wait_for_events(timeout=timeout)
  File "/data/anaconda3/lib/python3.8/site-packages/clearml/backend_interface/metrics/reporter.py", line 117, in wait_for_events
    if self._empty_state_event.wait(timeout=1.0):
  File "/data/anaconda3/lib/python3.8/site-packages/clearml/utilities/process/mp.py", line 165, in wait
    return self._sync.wait(*args, **kwargs)
  File "/data/anaconda3/lib/python3.8/threading.py", line 558, in wait
    signaled = self._cond.wait(timeout)
  File "/data/anaconda3/lib/python3.8/threading.py", line 306, in wait
    gotit = waiter.acquire(True, timeout)

Additionally, I don't understand why it tries to connect to https://clearml.s3.ap-south-1.amazonaws.com/ when I'm only using MinIO, which is S3-compatible.

jkhenning commented 1 month ago

Hi @YasinFu, how did you set up your s3 URL for this service? Note you need to use a port to indicate non-AWS endpoints (see here)
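
To illustrate the port note above, here is a minimal sketch. It assumes the project's default output destination (or the `output_uri` passed to `Task.init`) is currently `s3://clearml`, as the ValueError in the traceback suggests, and that a URI of the form `s3://<host>:<port>/<bucket>` is what marks the endpoint as non-AWS. The host `minio.example.com`, port `9000`, the project and task names, and the helper names `minio_output_uri`, `has_explicit_port`, and `init_task_with_minio` are all hypothetical placeholders.

```python
from urllib.parse import urlparse


def minio_output_uri(host: str, port: int, bucket: str) -> str:
    """Build an s3:// URI that carries an explicit host and port."""
    return f"s3://{host}:{port}/{bucket}"


def has_explicit_port(uri: str) -> bool:
    """Return True if the URI names a port, e.g. s3://host:9000/bucket.

    Assumption: without a port, the first path element is read as an AWS
    bucket name, which would explain the amazonaws.com URL in the log.
    """
    return urlparse(uri).port is not None


def init_task_with_minio(host: str, port: int, bucket: str):
    """Create a task whose outputs go to the MinIO endpoint (not called here)."""
    # Imported lazily so the helpers above work without clearml installed.
    from clearml import Task

    return Task.init(
        project_name="examples",       # hypothetical project name
        task_name="minio upload test",  # hypothetical task name
        output_uri=minio_output_uri(host, port, bucket),
    )


if __name__ == "__main__":
    print(has_explicit_port("s3://clearml"))
    print(has_explicit_port(minio_output_uri("minio.example.com", 9000, "clearml")))
```

The matching credentials would normally go under `sdk.aws.s3.credentials` in `clearml.conf`, with a `host` entry that uses the same `host:port` value and `secure` set according to whether the MinIO server speaks HTTPS. Treat that as something to check against the ClearML storage configuration docs rather than as a confirmed fix for this issue.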