Jakobovski / free-spoken-digit-dataset

A free audio dataset of spoken digits. An audio version of MNIST.

DatasetCorruptError: The HEAD node of the branch main of this dataset is in a corrupted state and is likely not recoverable. #48

Closed: David-GERARD closed this issue 7 months ago

David-GERARD commented 7 months ago

Hi,

I have been playing around with the dataset through the hub API.

I have just started getting the following error message:

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/deeplake/core/storage/s3.py in get_bytes(self, path, start_byte, end_byte)
    274         try:
--> 275             return self._get_bytes(path, start_byte, end_byte)
    276         except botocore.exceptions.ClientError as err:

ClientError: An error occurred (InternalError) when calling the GetObject operation (reached max retries: 4): We encountered an internal error.  Please retry the operation again later.

During handling of the above exception, another exception occurred:

ClientError                               Traceback (most recent call last)
ClientError: An error occurred (InternalError) when calling the GetObject operation (reached max retries: 4): We encountered an internal error.  Please retry the operation again later.

During handling of the above exception, another exception occurred:

S3GetError                                Traceback (most recent call last)
S3GetError: An error occurred (InternalError) when calling the GetObject operation (reached max retries: 4): We encountered an internal error.  Please retry the operation again later.

The above exception was the direct cause of the following exception:

DatasetCorruptError                       Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/deeplake/api/dataset.py in load(path, read_only, memory_cache_size, local_cache_size, creds, token, org_id, verbose, access_method, unlink, reset, indra, check_integrity, lock_timeout, lock_enabled, index_params)
    714                 if not reset:
    715                     if isinstance(e, DatasetCorruptError):
--> 716                         raise DatasetCorruptError(
    717                             message=e.message,
    718                             action="Try using `reset=True` to reset HEAD changes and load the previous commit.",

DatasetCorruptError: The HEAD node of the branch main of this dataset is in a corrupted state and is likely not recoverable. Try using `reset=True` to reset HEAD changes and load the previous commit.
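Reading the traceback from the bottom up, the `DatasetCorruptError` is only the last link in a chain: the S3 `GetObject` call failed with an `InternalError` (`ClientError`), that was wrapped in `S3GetError`, and `load` finally reported it as `DatasetCorruptError`. Python keeps these links on the exception objects: "The above exception was the direct cause of the following exception" corresponds to `__cause__` (set by `raise ... from ...`), and "During handling of the above exception, another exception occurred" corresponds to `__context__`. Below is a minimal sketch, assuming you catch the error yourself, that walks the chain to find the root cause. The three exception classes are hypothetical stand-ins that only reuse the names from the traceback; they are not the real deeplake or botocore classes, and the chain is simplified to one `ClientError`.

```python
from typing import List


def exception_chain(exc: BaseException) -> List[BaseException]:
    """Return exc followed by the exceptions it was raised from or while handling."""
    chain: List[BaseException] = []
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        chain.append(current)
        if current.__cause__ is not None:
            # Explicit `raise ... from err` ("direct cause" in the traceback).
            current = current.__cause__
        elif not current.__suppress_context__:
            # Implicit chaining ("During handling of the above exception").
            current = current.__context__
        else:
            current = None
    return chain


# Hypothetical stand-ins named after the exception types in the traceback.
class ClientError(Exception):
    pass


class S3GetError(Exception):
    pass


class DatasetCorruptError(Exception):
    pass


def simulated_load() -> None:
    """Reproduce the shape of the chain: ClientError -> S3GetError -> DatasetCorruptError."""
    try:
        try:
            raise ClientError("InternalError when calling the GetObject operation")
        except ClientError as err:
            raise S3GetError(str(err))  # implicit chaining via __context__
    except S3GetError as err:
        raise DatasetCorruptError("HEAD node is in a corrupted state") from err


try:
    simulated_load()
except DatasetCorruptError as exc:
    names = [type(e).__name__ for e in exception_chain(exc)]
    print(" <- ".join(names))  # prints: DatasetCorruptError <- S3GetError <- ClientError
```

If the last element of the chain is a server-side S3 error rather than anything about the dataset's contents, the "corrupted" message is probably a symptom of a failed download, not of actual corruption.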

This is the code I use:

import hub
ds = hub.load("hub://activeloop/spoken_mnist")

I tried both on my machine and on Colab to check that the issue wasn't on my side, and I get the same error on both.

Please help!

Best, David

David-GERARD commented 7 months ago

Hi, it seems the problem has been fixed, as I can load the dataset again. I am keeping the issue open to let the contributors know what happened.

Jakobovski commented 7 months ago

It was probably an issue with the remote server. I will close this issue, as it is unlikely to recur.
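Since the root cause was a transient server-side failure (the S3 message in the traceback itself says "Please retry the operation again later"), waiting and retrying the load is a gentler workaround than `reset=True`, which discards HEAD changes. A minimal sketch of retrying with exponential backoff, assuming a zero-argument loader callable; `load_with_retries` and its parameters are hypothetical helpers, not part of the hub or deeplake API.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def load_with_retries(
    loader: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `loader`, retrying on `retry_on` errors with exponential backoff.

    Hypothetical helper: waits base_delay, 2 * base_delay, 4 * base_delay, ...
    between attempts and re-raises the last error after `attempts` failures.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return loader()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Usage with a stand-in loader that fails twice before succeeding.
calls = {"n": 0}


def flaky_loader() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated transient server error")
    return "dataset"


print(load_with_retries(flaky_loader, base_delay=0.01))  # prints: dataset
```

With hub installed, the call from the issue could be wrapped as `load_with_retries(lambda: hub.load("hub://activeloop/spoken_mnist"))`. Narrowing `retry_on` to the exception type you actually expect keeps unrelated bugs from being retried; if the dataset really were corrupted, retrying would not help and the error would surface after the last attempt.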