DougTrajano / mlflow-server

MLflow Tracking Server with basic auth deployed in AWS App Runner.
https://gallery.ecr.aws/t9j8s4z8/mlflow
Apache License 2.0

Unable to list artifacts stored under `{artifactUri}` #149

Closed · Heineb closed this 2 years ago

Heineb commented 2 years ago

When listing artifacts I receive this error message:

> Loading Artifacts Failed
>
> Unable to list artifacts stored under {artifactUri} for the current run. Please contact your tracking server administrator to notify them of this error, which can happen when the tracking server lacks permission to list artifacts under the current run's root artifact directory.
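To see what `{artifactUri}` actually resolves to for a given run, the run's artifact URI can be read with the MLflow client. A minimal sketch; the tracking URI, basic-auth credentials, and run ID below are placeholders for this deployment's real values:

```python
# Check what {artifactUri} resolves to for a given run.
# Tracking URI, basic-auth credentials, and run ID are placeholders.
import os
from mlflow.tracking import MlflowClient

os.environ["MLFLOW_TRACKING_USERNAME"] = "your-username"  # placeholder
os.environ["MLFLOW_TRACKING_PASSWORD"] = "your-password"  # placeholder

client = MlflowClient(tracking_uri="https://<your-app-runner-url>")
run = client.get_run("your-run-id")
print(run.info.artifact_uri)  # expected to point at the stack's S3 bucket
```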

DougTrajano commented 2 years ago

Hello @Heineb, could you share the variables you used to create the Terraform stack? Maybe an existing VPC or S3 bucket doesn't have the required permissions.

Heineb commented 2 years ago

Hi Doug - Thanks for the swift response and a great repo.

I'm using your script and have only changed the login passwords and hardcoded the RDS password. The rest is left untouched, meaning a new VPC and bucket have been created. I've validated the bucket and IAM permissions, and they are set as in the script.

When I check CloudWatch I get this error message:

```
[2022-05-18 07:54:30 +0000] [13] [CRITICAL] WORKER TIMEOUT (pid:28)
```

I tried to increase the timeout by setting `GUNICORN_CMD_ARGS="--timeout 120"`, but no luck.
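For context, gunicorn honors the `GUNICORN_CMD_ARGS` environment variable, so (assuming the image launches `mlflow server` under gunicorn) the override would be passed through roughly like this; the store URIs below are placeholders, not the values from this stack:

```python
# Rough sketch: gunicorn picks up GUNICORN_CMD_ARGS from the environment
# when `mlflow server` starts it. Store URIs are placeholders.
import os
import subprocess

env = dict(os.environ, GUNICORN_CMD_ARGS="--timeout 120")
subprocess.run(
    [
        "mlflow", "server",
        "--host", "0.0.0.0",
        "--port", "5000",
        "--backend-store-uri", "postgresql://user:pass@host:5432/mlflow",
        "--default-artifact-root", "s3://mlflow-dev-20220518065838693500000001",
    ],
    env=env,
    check=True,
)
```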

Bucket policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Statement1",
      "Principal": {},
      "Effect": "Allow",
      "Action": [],
      "Resource": []
    }
  ]
}
```

IAM policy:

```json
{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:HeadBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::mlflow-dev-20220518065838693500000001"
      ]
    },
    {
      "Action": [
        "s3:ListBucketMultipartUploads",
        "s3:GetBucketTagging",
        "s3:GetObjectVersionTagging",
        "s3:ReplicateTags",
        "s3:PutObjectVersionTagging",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:AbortMultipartUpload",
        "s3:PutBucketTagging",
        "s3:GetObjectVersionAcl",
        "s3:GetObjectTagging",
        "s3:PutObjectTagging",
        "s3:GetObjectVersion"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::mlflow-dev-20220518065838693500000001/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
```

The only thing is that I get a warning in the policy editor like this: `Ln 6, Col 16 Invalid Action: The action s3:HeadBucket does not exist. Did you mean s3:ListBucket? The API called HeadBucket authorizes against the IAM action s3:ListBucket.`
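As a sanity check, a small boto3 probe run with the same role's credentials can confirm whether listing under the artifact root works at all; the prefix below is a placeholder for the run's actual artifact path:

```python
# Probe: can the current credentials list objects under the artifact root?
# The prefix is a placeholder for the experiment/run artifact path.
import boto3

s3 = boto3.client("s3")
bucket = "mlflow-dev-20220518065838693500000001"

response = s3.list_objects_v2(Bucket=bucket, Prefix="1/", MaxKeys=5)
for obj in response.get("Contents", []):
    print(obj["Key"])
```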

DougTrajano commented 2 years ago

Hi @Heineb thank you! :)

Let's try removing all these actions and attaching the AmazonS3FullAccess managed policy to the IAM role instead.
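If it's easier than editing the policy JSON by hand, something like this attaches the AWS-managed policy; the role name below is a placeholder for the stack's actual App Runner instance role:

```python
# One-off sketch: attach the AWS-managed AmazonS3FullAccess policy
# to the instance role. The role name is a placeholder.
import boto3

iam = boto3.client("iam")
iam.attach_role_policy(
    RoleName="mlflow-app-runner-instance-role",  # placeholder role name
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)
```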

I also recommend checking AWS App Runner > Application logs to get more detailed logs.

Heineb commented 2 years ago

Hi Doug,

Thanks for your support. I tried the IAM policy simulator and got error messages referring to the account's SCP permissions, so I'm conferring with our cloud team, which handles account-level settings.

DougTrajano commented 2 years ago

> Hi Doug,
>
> Thanks for your support. I tried the IAM policy simulator and got error messages referring to the account's SCP permissions, so I'm conferring with our cloud team, which handles account-level settings.

@Heineb awesome buddy! Good luck and enjoy your MLflow server. :)

Heineb commented 2 years ago

Hi @DougTrajano ,

Sorry for reopening this issue, but it appears it wasn't an SCP issue. A bit more information: the frontend returns this error message: `status: 503, text: 'upstream connect error or disconnect/reset before headers. reset reason: connection termination'`. Have you experienced this before?

DougTrajano commented 2 years ago

Hey buddy! Don't worry about reopening this issue.

Actually, I figured out the root cause and fixed it in the latest commits.

Essentially, the VPC needs a VPC endpoint configured to enable access to Amazon S3.
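For reference, the underlying fix amounts to a gateway VPC endpoint for S3. A rough boto3 sketch of the same idea; the region, VPC ID, and route table ID are placeholders, and the updated Terraform stack creates the endpoint for you:

```python
# Sketch of the fix: a gateway VPC endpoint so the tracking server's VPC
# can reach S3. Region, VPC ID, and route table ID are placeholders;
# the updated stack provisions this via Terraform instead.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",                   # placeholder VPC ID
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],         # placeholder route table
)
```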

Please try again with the new stack on the main branch and let me know if you have any issues.

Heineb commented 2 years ago

Hi - Thanks for the swift response. Great news. We'll try the new stack and let you know 👍