Backblaze / terraform-provider-b2

Terraform Provider for Backblaze B2 Cloud Storage

Unable to configure "Keep only the last version of the file" lifecycle rule without magic #85

Open colans opened 1 month ago

colans commented 1 month ago

I'm trying to set this up like in the Web UI:

[screenshot of the Web UI lifecycle settings]

It seems like these settings are the most appropriate:

resource "b2_bucket" "my_bucket" {
  bucket_name = var.bucket_name
  bucket_type = "allPrivate"
  lifecycle_rules {
    file_name_prefix              = "" # Apply to all files
    days_from_hiding_to_deleting  = 0  # Immediately delete after a file is hidden
    days_from_uploading_to_hiding = 0  # Never keep revisions
  }
}

However, when I look at the Web front-end, I see it's set to custom lifecycle rules, which is different:

[screenshot of the Web UI showing custom lifecycle rules]

So I set it manually in the Web UI, removed the bucket from my Terraform state, and then imported it to see what the correct settings would be:

  lifecycle_rules {
    days_from_hiding_to_deleting  = 1
    days_from_uploading_to_hiding = 0
    file_name_prefix              = null
  }

But then when I try to apply that, I get:

│ Error: Missing required argument
│ 
│   with module.backupscale_backup.b2_bucket.backupscale_bucket,
│   on modules/backupscale_backup/object_storage.tf line 7, in resource "b2_bucket" "backupscale_bucket":
│    7:     file_name_prefix              = null
│ 
│ The argument "lifecycle_rules.0.file_name_prefix" is required, but no definition was found.

So then I tried this:

  lifecycle_rules {
    days_from_hiding_to_deleting  = 1
    days_from_uploading_to_hiding = 0
    file_name_prefix              = ""
  }

This seems to work, as in it shows up correctly in the Web UI, but how does hiding the file for one day actually mean that files shouldn't be kept? This seems like some sort of secret magic incantation that shouldn't work, while it doesn't work the way it should. Said another way, it's extremely counter-intuitive.

Questions:

  1. Is it actually possible to set this intuitively from Terraform to achieve the desired behaviour?
  2. Will my above configuration be good enough for now, or will this keep revisions? Maybe the Web UI is misleading, even though it looks okay?
  3. Is https://github.com/Backblaze/b2-sdk-python/issues/188 the solution to this? It's not clear to me if that's a solution to this problem, or something else.

There's another issue about this at https://github.com/Backblaze/terraform-provider-b2/issues/46, but that's just a documentation issue about needing to set file_name_prefix = "".

ppolewicz commented 4 weeks ago

I'm not sure what "Keep only the last version of the file" in the UI translates to on the backend. If you set that and dump the current settings using the b2 CLI, you might find out how to set it up so that the UI displays it the same way.

metadaddy commented 4 weeks ago

@ppolewicz

> I'm not sure what "Keep only the last version of the file" in the UI translates to on the backend. If you set that and dump the current settings using the b2 CLI, you might find out how to set it up so that the UI displays it the same way.

You get this, which is what @colans posted above:

% b2 bucket get another-unique-bucket-name
{
    ...
    "lifecycleRules": [
        {
            "daysFromHidingToDeleting": 1,
            "daysFromUploadingToHiding": null,
            "fileNamePrefix": ""
        }
    ],
    ...
}

@colans

> This seems to work, as in it shows up correctly in the Web UI, but how does hiding the file for one day actually mean that files shouldn't be kept? This seems like some sort of secret magic incantation that shouldn't work, while it doesn't work the way it should. Said another way, it's extremely counter-intuitive.

The key concepts here are:

Most tools that work with versioned buckets default to soft deletes, since you can undelete the files later. The lifecycle rules allow you to say "hard delete hidden files after so many days". With the B2 Native API, you call b2_hide_file; with the S3-compatible API, you call DeleteObject without the versionId parameter. In both cases, the file is 'hidden', and can be restored later.

Many tools allow you to also perform hard deletes. In this case, the file is immediately deleted, with no way to restore it later. With the B2 Native API, you call b2_delete_file_version; with the S3-compatible API, you call DeleteObject with the versionId parameter set to the specific file version you want to delete.

Now, if you always use hard deletes, lifecycle rules are irrelevant. There are never any hidden files to deal with.

If you use soft deletes, you can use lifecycle rules to configure how long hidden files are retained before they are hard deleted. The reason that the minimum period for days_from_hiding_to_deleting is 1, rather than 0, is that lifecycle rules are evaluated once a day, so you will always have some period of time between hiding and deleting.
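To make the soft-delete case concrete, a rule that keeps hidden versions for a retention window before hard deleting them might be declared roughly as in the sketch below. The resource name, `var.bucket_name`, and the 30-day window are illustrative assumptions rather than values from this thread, and the Backblaze/b2 provider is assumed to be configured elsewhere.

```hcl
variable "bucket_name" {
  type = string
}

resource "b2_bucket" "retention_example" {
  bucket_name = var.bucket_name
  bucket_type = "allPrivate"

  lifecycle_rules {
    file_name_prefix             = "" # required by the provider; "" applies the rule to every file
    days_from_hiding_to_deleting = 30 # illustrative value: hard delete hidden versions roughly 30 days after hiding
    # days_from_uploading_to_hiding is omitted, so files are never hidden automatically
  }
}
```

If a tool only ever hard deletes (b2_delete_file_version, or DeleteObject with a versionId), no hidden versions are created, so a rule like this has nothing to act on.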

The doc page on File Versions has a more detailed explanation, if you're interested.

> It seems like these settings are the most appropriate:
>
> ...
>     days_from_hiding_to_deleting  = 0  # Immediately delete after a file is hidden
> ...

Since there is no way to immediately delete a file after it is hidden, you might want to consider whether you should hard delete files instead.

If you are using an off-the-shelf product that cannot be configured to hard delete files, you could look at using Event Notifications to achieve your goal. You could configure an event notification rule to send you a notification on the b2:HideMarkerCreated:* event type, and hard delete the file in the notification handler. There is a trade-off here, though: you must deploy the notification handler at an internet-accessible URL.

Answering your list of questions:

  1. When you select the "Keep only the last version of the file" button, the web UI makes an API call that sets the lifecycle rules shown above. This is described in the documentation for lifecycle rules.
  2. I don't think your configuration is correct. You shouldn't have 0 for days_from_uploading_to_hiding (as also mentioned in the docs); it should be null, or you can just omit it, which amounts to the same thing.
  3. I think that issue is orthogonal to this one. If you set the lifecycle rules as above, it should work.
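Putting answers 1 and 2 together, the Terraform equivalent of the `b2 bucket get` output above could be sketched as follows. This assumes the Backblaze/b2 provider is already configured; the resource name and `var.bucket_name` are placeholders carried over from the original config.

```hcl
variable "bucket_name" {
  type = string
}

resource "b2_bucket" "my_bucket" {
  bucket_name = var.bucket_name
  bucket_type = "allPrivate"

  # Intended to match what the web UI sets for "Keep only the last version of the file"
  lifecycle_rules {
    file_name_prefix             = "" # required by the provider; "" matches all files
    days_from_hiding_to_deleting = 1  # minimum value, since rules are evaluated once a day
    # days_from_uploading_to_hiding is deliberately omitted (same as null), not set to 0
  }
}
```

Leaving days_from_uploading_to_hiding out, rather than setting it to 0, is what makes this line up with the "daysFromUploadingToHiding": null value the CLI reports.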

Hopefully this explains the magic. Feel free to post any more questions here, or close the issue, as you see fit.

colans commented 3 weeks ago

Thanks for the background! It certainly sheds some light.

> You shouldn't have 0 for days_from_uploading_to_hiding (as also mentioned in the docs); it should be null, or you can just omit it, which amounts to the same thing.

That's strange, because again, that's what came back from terraform import. Could that be a bug?

However, I am using hard deletes, as that's what's best for Restic repositories. But I was hoping to do it both ways, just in case.

metadaddy commented 3 weeks ago

> Thanks for the background! It certainly sheds some light.

You're most welcome!

> You shouldn't have 0 for days_from_uploading_to_hiding (as also mentioned in the docs); it should be null, or you can just omit it, which amounts to the same thing.

> That's strange, because again, that's what came back from terraform import. Could that be a bug?

I think it is. I think that terraform import should either not include days_from_uploading_to_hiding or set it to null.

@mlech-reef / @emnoor-reef - should I change the title of this issue to something like "terraform import should not set days_from_uploading_to_hiding to 0", or create a new issue?

mlech-reef commented 3 weeks ago

@metadaddy please create a new issue. I will take a look later this week.