EulerSearch / embedding_studio

Embedding Studio is a framework that allows you to transform your Vector Database into a feature-rich Search Engine.
https://embeddingstud.io/
Apache License 2.0

Encountering ClientError trying to use my own dataset #1

Closed ArthurCverianov closed 8 months ago

ArthurCverianov commented 9 months ago

I'm trying to use my dataset for model training, but I'm encountering the following error:

ClientError                               Traceback (most recent call last)
/tmp/ipykernel_199154/1286885123.py in <module>
----> 1 response = s3_client.get_object(Bucket='embedding-studio-experiments', Key='remote-lanscapes/clickstream/f6816566-cac3-46ac-b5e4-0d5b76757c93/sessions.json')
~/anaconda3/lib/python3.9/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    528                 )
    529             # The "self" in this scope is referring to the BaseClient.
--> 530             return self._make_api_call(operation_name, kwargs)
    531 
    532         _api_call.__name__ = str(py_operation_name)
~/anaconda3/lib/python3.9/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    958             error_code = parsed_response.get("Error", {}).get("Code")
    959             error_class = self.exceptions.from_code(error_code)
--> 960             raise error_class(parsed_response, operation_name)
    961         else:
    962             return parsed_response
ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.

I attempted to set my AWS Access Key ID in the .env file under the variables MINIO_ACCESS_KEY and MINIO_SECRET_KEY. However, as I understand it, these variables are used for artifact storage, not for my datasets. Can you advise on how I can resolve this error?

oYASo commented 9 months ago

You're right that the .env file has no AWS parameters. Dataset credentials should be set in your plugin, which then passes them to AWSS3DataLoader. For instance, it could look something like this:

        # Placeholder values; replace them with the credentials for your dataset bucket.
        creds = {
            "role_arn": "arn:aws:iam::123456789012:role/some_data",
            "aws_access_key_id": "TESTACCESSKEIDTEST11",
            "aws_secret_access_key": "QWERTY1232qdsadfasfg5349BBdf30ekp23odk03",
        }
        self.data_loader = AWSS3DataLoader(**creds)

Each individual plugin may use its specific credentials.
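If you would rather not hardcode keys in the plugin source, one option is to build the same dictionary from environment variables. The following is only a sketch: the `DATASET_AWS_*` variable names and the `load_s3_creds` helper are hypothetical and not part of Embedding Studio; the dictionary keys mirror the ones used in the example above.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable names; choose whatever fits your deployment.
REQUIRED_VARS = {
    "aws_access_key_id": "DATASET_AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "DATASET_AWS_SECRET_ACCESS_KEY",
}
OPTIONAL_VARS = {
    "role_arn": "DATASET_AWS_ROLE_ARN",
}


def load_s3_creds(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the creds dict for the data loader from environment variables."""
    env = os.environ if env is None else env

    # Fail early with a readable message instead of an InvalidAccessKeyId later.
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing dataset credentials: " + ", ".join(missing))

    creds = {key: env[var] for key, var in REQUIRED_VARS.items()}

    # Only include optional settings that are actually set.
    for key, var in OPTIONAL_VARS.items():
        if env.get(var):
            creds[key] = env[var]
    return creds


# Inside the plugin, assuming the loader accepts these keyword arguments
# as in the example above:
# self.data_loader = AWSS3DataLoader(**load_s3_creds())
```

This keeps the dataset credentials separate from MINIO_ACCESS_KEY and MINIO_SECRET_KEY, which stay reserved for artifact storage.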

ArthurCverianov commented 9 months ago

Great! It's working now. Thanks!