datarevenue-berlin / OpenMLOps

MIT License

Error creating S3 bucket: BucketAlreadyOwnedByYou #70

Closed NhatAnh closed 3 years ago

NhatAnh commented 3 years ago

I am following the instructions at https://github.com/datarevenue-berlin/OpenMLOps/blob/master/tutorials/set-up-open-source-production-mlops-architecture-aws.md

I got to the step that runs: terraform apply -var-file=my_vars.tfvars

But I got:

Error creating S3 bucket: BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.

  with aws_s3_bucket.mlflow_artifact_root,
  on main.tf line 11, in resource "aws_s3_bucket" "mlflow_artifact_root":
  11: resource "aws_s3_bucket" "mlflow_artifact_root"

If I delete the S3 bucket and rerun the command, I get: Error loading state: S3 bucket does not exist.

So maybe Terraform was already using that bucket, and then later tries to create it again?

sixhobbits commented 3 years ago

The first one is expected if you have already created that bucket. The second one sounds like it is trying to load the Terraform state from the S3 bucket that you have now deleted.

It's useful to save Terraform state in S3 if you are working in a team environment, but it's simpler to just save Terraform state locally. You can toggle it in https://github.com/datarevenue-berlin/OpenMLOps/blob/master/terraform_backend.tf and see some general information there too.

NhatAnh commented 3 years ago

In the output, it says:

Warning: Backend configuration ignored

  on ../OpenMLOps/terraform_backend.tf line 17, in terraform:
  17: backend "s3" {

Any selected backend applies to the entire configuration, so Terraform expects provider configurations only in the root module.

So I think it does not use the configuration in terraform_backend.tf.

bernardolk commented 3 years ago

@NhatAnh please move your backend file to your OpenMLOps-AWS repo and, from there, run terraform init --backend-config=<yourbackendfile> --reconfigure, then try applying again.

NhatAnh commented 3 years ago

I got an error:

Error: Duplicate backend configuration

  on terraform_backend.tf line 22, in terraform:
  22: backend "local" {}

A module may have only one backend configuration. The backend was previously configured at main.tf:2,3-15.

When I comment out the terraform block in main.tf, I get another error:

Error: Unsupported block type

  on terraform_backend.tf line 16:
  16: terraform {

Blocks of type "terraform" are not expected here.

bernardolk commented 3 years ago

Please leave the terraform block in main.tf. The backend file is only required if you are going to use S3, and its contents are not what is inside the terraform_backend.tf file (I noticed how confusing that is right now, so I will request that we remove that file). If you want to keep Terraform state locally, you can ignore the backend file; otherwise you need to fill it in (or create a plain text file) like this:

bucket = "aws-bucket-name"
key    = "filename-in-s3"
region = "awsregion"

bernardolk commented 3 years ago

You can check more info here: https://www.terraform.io/docs/language/settings/backends/configuration.html

The extra file might not even be necessary: you could set those keys directly in main.tf (or just set the backend to local there) and it should work.
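As a minimal sketch of the separate-file route described above: the snippet below writes the three keys to a file. The file name backend.hcl is an assumption (any name works), and the values are the same placeholders as above, to be replaced with a real bucket, state key, and region. It also assumes the terraform block in main.tf declares an s3 backend.

```shell
#!/bin/sh
# Write a partial backend configuration for the S3 backend.
# backend.hcl is a hypothetical file name; the values are placeholders.
cat > backend.hcl <<'EOF'
bucket = "aws-bucket-name"
key    = "filename-in-s3"
region = "awsregion"
EOF

# Show what was written so it can be checked before running terraform init.
cat backend.hcl
```

Then, from the OpenMLOps-AWS directory, run terraform init --backend-config=backend.hcl --reconfigure and apply again.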

NhatAnh commented 3 years ago

Hi, I got to the end of the tutorial, but the link https://mlops.pixtavietnam.com/ returns 404, while https://jupyter.mlops.pixtavietnam.com/ returns 401. How do I get the auth link https://mlops.pixtavietnam.com/auth/profile/registration to work? Thanks

bernardolk commented 3 years ago

So, your "homepage" won't work unless you have routing rules set for it that lead to a service returning a webpage. The 401 you are getting on the second URL is normal, since you are not logged in yet. I think the third URL is wrong; try https://mlops.pixtavietnam.com/profile/auth/registration. If it doesn't work, then you will need to get a log from your oathkeeper pod, to see if the request is hitting

NhatAnh commented 3 years ago

Thank you, I got it almost 100% working. But the pod prefect-server-agent now fails with prefect.exceptions.ClientError: Your Prefect Server instance has no tenants. Create a tenant with `prefect server create-tenant`. I think this pod ran fine in my previous run. How do I fix this?

bernardolk commented 3 years ago

Hmm, check out this issue, there is some relevant info at the very bottom of the discussion: https://github.com/datarevenue-berlin/OpenMLOps/issues/67

I hope it helps, but if it doesn't, please tell us :)

NhatAnh commented 3 years ago

It seems to have fixed itself overnight :) Thank you a lot for your help!

Just one more question: if I don't need to use the cluster immediately, should I run terraform destroy to save money and create the cluster again later? Or is there something else I can do to save money and resources, like temporarily stopping pods or nodes?

bernardolk commented 3 years ago

I would say the safest is to destroy, but you can try scaling everything to 0 and then monitor your costs in the AWS console to see if they are going up. Would be nice to know if that's an option, but I believe they will still charge you :x
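For anyone who wants to try the scale-to-zero route, here is a minimal sketch. It only prints the kubectl commands unless DRY_RUN=0 is set. The namespace names are assumptions for illustration, not taken from this thread; list your own with kubectl get namespaces. As far as I know this only stops pods: the cluster control plane and any worker nodes that stay up are still billed, so destroying remains the safer way to stop costs.

```shell
#!/bin/sh
# Hypothetical namespaces: replace with the ones from `kubectl get namespaces`.
NAMESPACES="mlflow prefect jupyterhub"

# Default to a dry run so nothing is changed by accident.
DRY_RUN="${DRY_RUN:-1}"

scale_down() {
  for ns in $NAMESPACES; do
    # Scale every Deployment and StatefulSet in the namespace to zero replicas.
    for kind in deployment statefulset; do
      if [ "$DRY_RUN" = "1" ]; then
        echo "kubectl scale $kind --all --replicas=0 -n $ns"
      else
        kubectl scale "$kind" --all --replicas=0 -n "$ns"
      fi
    done
  done
}

scale_down
```

Run it once as-is to review the printed commands, then again with DRY_RUN=0 to apply them. Scaling back up later needs the original replica counts, so it is worth noting them down first.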