goodatlas / zeroth

Kaldi-based Korean ASR (한국어 음성인식) open-source project
Apache License 2.0
348 stars 124 forks

Downloading the 95.7-hour zeroth data with aws s3 fails: An error occurred (403) when calling the HeadObject operation: Forbidden #17

Open whaozl opened 3 years ago

whaozl commented 3 years ago

I ran aws s3 cp s3://zeroth-opensource/AUDIO_INFO AUDIO_INFO with my own AWS account, but I get the following error:

Traceback (most recent call last):
  File "/home/kaldi/python3/lib/python3.7/site-packages/awscli/customizations/s3/s3handler.py", line 173, in call
    for fileinfo in fileinfos:
  File "/home/kaldi/python3/lib/python3.7/site-packages/awscli/customizations/s3/fileinfobuilder.py", line 31, in call
    for file_base in files:
  File "/home/kaldi/python3/lib/python3.7/site-packages/awscli/customizations/s3/filegenerator.py", line 142, in call
    for src_path, extra_information in file_iterator:
  File "/home/kaldi/python3/lib/python3.7/site-packages/awscli/customizations/s3/filegenerator.py", line 318, in list_objects
    yield self._list_single_object(s3_path)
  File "/home/kaldi/python3/lib/python3.7/site-packages/awscli/customizations/s3/filegenerator.py", line 355, in _list_single_object
    response = self._client.head_object(**params)
  File "/home/kaldi/python3/lib/python3.7/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/kaldi/python3/lib/python3.7/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
2021-07-22 16:40:56,020 - Thread-1 - awscli.customizations.s3.results - DEBUG - Shutdown request received in result processing thread, shutting down result thread.
Download from AWS is failed, check your credential and configure your aws CLI

Can you help me?

AUDIOINFO='AUDIO_INFO'
AUDIOLIST=$2
bucketname="zeroth-opensource"
# download the audio info file (requires read access to the bucket)
if [ ! -f "$data/$AUDIOINFO" ]; then
    # test the exit status of aws directly instead of capturing $? in a variable
    if ! aws s3 cp "s3://$bucketname/$AUDIOINFO" "$data/$AUDIOINFO"; then
        echo "Download from AWS is failed, check your credential and configure your aws CLI"
        exit 1
    fi
fi

# download the audio archives named in $AUDIOLIST (whitespace-separated, so left unquoted)
echo "Now download Audio ----------------------------------------------------"
for file in $AUDIOLIST
do
    echo "check if $file.tar.gz exist or not"
    if [ ! -f "$data/$file.tar.gz" ]; then
        aws s3 cp "s3://$bucketname/$file.tar.gz" "$data/$file.tar.gz"
    else
        echo "  $data/$file.tar.gz already exist"
    fi
done

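
The script's message points at credentials, but a 403 on HeadObject can also mean that the credentials are valid and the bucket simply does not grant read access to the caller. A minimal sketch of how the error text could be sorted into cases is given here; classify_s3_error is a hypothetical helper and is not part of the zeroth scripts, and the matched strings are only the ones seen in this thread plus a few common aws CLI messages:

```shell
# Hypothetical helper (not part of zeroth): map an aws CLI error message
# to a coarse category so the caller can print a more precise hint.
classify_s3_error() {
    case "$1" in
        *"Unable to locate credentials"*) echo "no-credentials" ;;
        *"(403)"*|*Forbidden*|*AccessDenied*) echo "access-denied" ;;
        *"(404)"*|*"Not Found"*|*NoSuchKey*) echo "not-found" ;;
        *) echo "unknown" ;;
    esac
}
```

For the error above, classify_s3_error "fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden" falls into the access-denied case, which suggests a permission problem on the bucket rather than a broken CLI setup.
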
jty016 commented 3 years ago

@whaozl The 95.7-hour dataset is not publicly available. For now, the public data is the 50-hour set at http://www.openslr.org/40/. I may consider opening the larger set soon.
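
For anyone landing here, fetching the public 50-hour set instead could look roughly like the sketch below. The archive name zeroth_korean.tar.gz and the resources/40 path are assumptions based on the usual OpenSLR layout and should be checked against http://www.openslr.org/40/ before use; the function names are hypothetical and not part of the zeroth scripts.

```shell
# Assumed location of the public archive; verify on http://www.openslr.org/40/.
base_url="https://www.openslr.org/resources/40"
archive="zeroth_korean.tar.gz"

# Print the full download URL built from the assumed values above.
archive_url() {
    printf '%s/%s\n' "$base_url" "$archive"
}

# Download (resuming if interrupted) and unpack into the given directory.
# Hypothetical helper; the target directory defaults to ./data.
download_public_zeroth() {
    data="${1:-./data}"
    mkdir -p "$data" || return 1
    if [ ! -f "$data/$archive" ]; then
        wget -c -P "$data" "$(archive_url)" || return 1
    fi
    tar -xzf "$data/$archive" -C "$data"
}
```

Calling download_public_zeroth ./data would then replace the aws s3 cp step for the public subset. No AWS credentials are involved, since the file is served over plain HTTPS.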

mrrostam commented 2 years ago

Hey @jty016, I wonder whether the larger dataset containing 95 hours of data is now open to the public, or whether it will be in the near future. Interestingly, it seems that all significant Korean speech corpora are private or at least carry some unreasonable restrictions, like KoSpeech.