Open jayanthshimoga opened 5 months ago
🤖
Hey there, @jayanthshimoga! Great to see you back. Hope you're doing well!
Based on the information you've provided, it seems like the issue might be related to the differences in the environment between the Docker image and the Flyte cluster. Here are a few potential causes:
1. Network Configuration: The Docker image and the Flyte cluster might have different network configurations. This could cause the Docker image to be unable to connect to the Flyte cluster if the network settings are not properly configured.
2. Environment Variables: The Docker image and the Flyte cluster might have different environment variables set. This could cause the Docker image to be unable to connect to the Flyte cluster if the necessary environment variables are not set correctly.
3. Software Versions: The Docker image and the Flyte cluster might be running different versions of software. This could cause the Docker image to be unable to connect to the Flyte cluster if the software versions are not compatible.
4. Access Control: The Docker image and the Flyte cluster might have different access control settings. This could cause the Docker image to be unable to connect to the Flyte cluster if the Docker image does not have the necessary permissions.
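To compare environment variables between a local `docker run` and the Flyte task pod, a small snapshot helper can be run in both places and the outputs diffed. This is a minimal sketch: the helper name `aws_env_snapshot` and the masking rule are assumptions for illustration, not part of Flyte or PyAthena.

```python
import os

# Substrings that mark a variable as sensitive; such values are masked, not printed.
SENSITIVE_MARKERS = ("SECRET", "TOKEN", "KEY")


def aws_env_snapshot(environ=None):
    """Return AWS_* variables, masking anything that looks like a credential."""
    environ = os.environ if environ is None else environ
    snapshot = {}
    for name in sorted(environ):
        if not name.startswith("AWS_"):
            continue
        value = environ[name]
        if any(marker in name for marker in SENSITIVE_MARKERS):
            # Only report whether the credential is present.
            value = "<set>" if value else "<empty>"
        snapshot[name] = value
    return snapshot


if __name__ == "__main__":
    for name, value in aws_env_snapshot().items():
        print(f"{name}={value}")
```

Calling this at the top of the failing task and again in the plain container makes it easy to spot a missing region or credential variable without leaking secrets into logs.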
In the provided context, the 'connection' argument error could be due to incorrect configuration of the storage connection in the 'flyte-admin-config' ConfigMap. The connection details for the 'minio' storage are provided under the 'storage' key in the ConfigMap. If these details are not correctly set, it could lead to a 'connection' argument error.
Here is the relevant part of the configuration:
storage:
  type: minio
  connection:
    access-key: minio
    auth-type: accesskey
    secret-key: miniostorage
    disable-ssl: true
    endpoint: "http://minio:9000"
    region: my-region
Ensure that the 'access-key', 'secret-key', 'endpoint', and 'region' are correctly set according to your Minio storage configuration.
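A quick sanity check of those four values could look like the sketch below. It works on a plain dictionary that mirrors the `connection` block above; the function name `check_connection` and the rules it applies are assumptions for illustration, not a Flyte API.

```python
from urllib.parse import urlparse

# Keys named in the advice above; all must be present and non-empty.
REQUIRED_KEYS = ("access-key", "secret-key", "endpoint", "region")


def check_connection(connection):
    """Return a list of human-readable problems found in the connection block."""
    problems = []
    for key in REQUIRED_KEYS:
        if not connection.get(key):
            problems.append(f"missing or empty: {key}")
    endpoint = connection.get("endpoint", "")
    if endpoint:
        parsed = urlparse(endpoint)
        # The endpoint should be a full http(s) URL with a host part.
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            problems.append(f"endpoint is not an http(s) URL: {endpoint}")
    return problems


# Values copied from the ConfigMap excerpt above.
sample = {
    "access-key": "minio",
    "auth-type": "accesskey",
    "secret-key": "miniostorage",
    "disable-ssl": True,
    "endpoint": "http://minio:9000",
    "region": "my-region",
}
print(check_connection(sample) or "no obvious problems")
```

An empty result only means the block is well formed; it does not prove the credentials or endpoint are valid for your Minio deployment.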
As for changes in PyAthena versions after 3.0.10, I wasn't able to find a specific change that would cause the 'connection' argument error. It might help to check PyAthena's release notes or changelog for the versions after 3.0.10 to see whether any change could cause this issue.
If the issue persists, could you please provide more details about the error message? Specifically, the full traceback would be very helpful in diagnosing the problem. Also, could you please confirm if the same error occurs with other versions of PyAthena or only with versions greater than 3.0.10?
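One way to collect both pieces of information in a single run is to wrap the failing call so that it prints the installed package versions and the full traceback before re-raising. This is a minimal sketch using only the standard library; the package list and the helper names `installed_versions` and `run_with_diagnostics` are assumptions chosen for this issue.

```python
import sys
import traceback
from importlib import metadata

# Distributions that seem relevant here; adjust the list as needed.
PACKAGES = ("pyathena", "flytekit", "pandas", "s3fs", "fsspec", "boto3")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, if any."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def run_with_diagnostics(func, *args, **kwargs):
    """Call func; on failure print versions and the full traceback, then re-raise."""
    try:
        return func(*args, **kwargs)
    except Exception:
        print("Python:", sys.version.split()[0])
        for name, version in installed_versions().items():
            print(f"{name}: {version}")
        traceback.print_exc()
        raise
```

Wrapping the body of the Flyte task that runs the Athena query, and running the same wrapper in the plain container, gives two outputs that can be compared side by side.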
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Hi team, any update on this? It would be very helpful.
@jayanthshimoga are you able to share a minimal repro?
Hi @pingsutw, please find the repo link here: https://github.com/jayanthshimoga/flytefit. It's very small, quick to set up, and makes the issue easy to recreate.
cc @pingsutw / @eapolinario has anyone looked into this?
Describe the bug
Circular Dependency issue in Flyte.
We get the error below when using PandasCursor to fetch AWS Athena query results in a Flyte workflow. It only occurs with PyAthena >3.0.10. The same Python code works fine in other applications but fails inside Flyte.
The Python code works inside the Docker image, but when the same image is registered to the Flyte cluster, the code breaks.
__init__() got an unexpected keyword argument 'connection'
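For context, this message is Python's standard TypeError for a constructor that is called with a keyword it does not declare. The sketch below uses two hypothetical classes (not PyAthena or Flyte code) to show how the same call can succeed or fail depending on which class ends up being instantiated. Whether that is what happens in this issue is an assumption, not a confirmed diagnosis.

```python
class AcceptsConnection:
    """Hypothetical class whose constructor declares a 'connection' keyword."""

    def __init__(self, connection=None):
        self.connection = connection


class NoConnectionKwarg:
    """Hypothetical class whose constructor does not know 'connection'."""

    def __init__(self, region=None):
        self.region = region


def build(cls):
    # The caller always passes connection=..., whatever class it is handed.
    return cls(connection="athena-connection")


build(AcceptsConnection)  # succeeds

try:
    build(NoConnectionKwarg)
except TypeError as exc:
    message = str(exc)
    print(message)
```

The full traceback shows which class's `__init__` rejected the keyword, which is usually the fastest way to tell whether an unexpected class is being used in the Flyte environment.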
Working:
Not working:
Expected behavior
We run an Athena query by connecting to AWS and want the result as a pandas DataFrame. We can see the query executing in Athena; the only issue is reading the result from S3.
Expected behaviour: read the data from S3 and convert it to a pandas DataFrame. This issue only occurs inside the Flyte cluster.
Additional context to reproduce
Context: We run an Athena query by connecting to AWS and want the result as a pandas DataFrame. We can see the query executing in Athena; the only issue is reading the result from S3. I separately tried accessing the S3 bucket and running AWS commands via Python subprocess: I do have access and can fetch the S3 objects. My assumption is that the connection is lost in Flyte when the fetch goes through PyAthena.
Flow:
I don't know why Flyte is unable to establish a connection to S3 when the call is made from PyAthena.
Screenshots
All the screenshots and sample code are available at https://github.com/jayanthshimoga/flytefit. Please use your own AWS access key, secret key, and S3 path.