
Disaster Recovery Options for AWS Athena/Iceberg Integration #6619

Open anthonysgro opened 1 year ago

anthonysgro commented 1 year ago

Query engine

AWS Athena

Question

Right now, I have an Iceberg table set up through AWS Athena. I have backed up my data through AWS Backup, which can restore the S3 bucket if it is ever deleted. I have tested this: after restoring, Athena still lets me query the restored data, which is perfect.

If I run a DROP TABLE command, Athena loses its reference to the table and the S3 data is deleted. I thought "okay, no problem, we can just recreate the table and then restore our backup". But this does not work. For some reason, Athena loses the reference to the table even if I point it to the exact same location it used to point to. Does anyone have a solution for this? Right now, I am exposed to an erroneous DROP TABLE command from a developer/admin.

I have not been able to find any information on this.

amogh-jahagirdar commented 1 year ago

Thanks for creating this issue, @anthonysgro. Could you provide more details on how you're recreating the table, how you're pointing it at the location, and how AWS Backup fits in?

As far as my understanding goes, for disaster recovery of Iceberg tables it's generally recommended to use S3 multi-region access points. You can check out this discussion for more details: https://github.com/apache/iceberg/issues/5779

Some docs that explain the approach:

https://aws.amazon.com/s3/features/multi-region-access-points/
https://iceberg.apache.org/docs/latest/aws/#s3-access-points

If you desire S3 access point integration in Athena specifically, feel free to ping me on Slack!

anthonysgro commented 1 year ago

Yes. So here is specifically how it happens:

Creating my table: I create it through an Athena query:

CREATE TABLE IF NOT EXISTS db.friends (
    id string,
    name string,
    birthday timestamp
)
LOCATION 's3://iceberg-datalake/'
TBLPROPERTIES (
    'table_type' = 'ICEBERG'
)

We can insert a couple rows into this table, yada yada.

Then comes AWS Backup. It is a service that is natively integrated with S3. I went to the AWS Backup console, navigated to Protected Resources, picked my iceberg datalake bucket, and created a recovery point. It takes about an hour or so for the bucket to get backed up.

Once the recovery point became available, I deleted my entire s3://iceberg-datalake bucket. If I query my iceberg table through Athena, it throws an error (as expected, the data is deleted!). But no worries: I restored the bucket with a click of a button in AWS Backup, and voila, it was recreated after a bit of time. I could once again query my iceberg table as if nothing had happened. Perfect.

However, there are some cases where this doesn't work. If I run:

DROP TABLE db.friends

it deletes the table and the S3 data, as expected. AWS tells me Iceberg tables do that, for some reason. But then, if I recreate the table using the above command and then restore my backup, it throws a strange error, and it can't seem to get back to normal.

Is there a metadata pointer or something involved when this occurs? I haven't been able to find a way to restore Iceberg table data after a DROP TABLE command. I would absolutely be interested in hearing more if this S3 access point integration helps protect against this!

amogh-jahagirdar commented 1 year ago

Is there a metadata pointer or something involved when this occurs?

Yes, Athena's integration with Iceberg uses the Glue Catalog. Every write Athena performs on the Iceberg table (in this case your inserts) optimistically writes out a new metadata file and then updates the metadata_location property on the Glue table as part of the commit.

When you drop an Iceberg table in Athena, as you observed, Athena deletes the S3 files and drops the Glue table (meaning the pointer to the metadata location is now lost). When you re-create the table after restoring the backup, a new Glue table is created with a new metadata pointer (effectively an empty table state).

So after re-creating the table and restoring your data, what you can do is update the metadata_location table property in Glue to point to the latest metadata file that existed prior to dropping the table. After that you should be able to read the data as before. I was able to repro this process in my account, but please let me know if you run into any issues!
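
In boto3 terms that property update is roughly the following. This is a minimal sketch, not an official procedure: the database and table names match the example above, and the restored metadata file name is hypothetical.

import boto3

glue = boto3.client("glue")

DATABASE = "db"
TABLE = "friends"
# Hypothetical: the newest metadata file recovered from the backup.
RESTORED_METADATA = "s3://iceberg-datalake/metadata/00005-1a2b3c4d.metadata.json"

# Fetch the current Glue table definition (created by the new CREATE TABLE).
table = glue.get_table(DatabaseName=DATABASE, Name=TABLE)["Table"]

# update_table only accepts a subset of the fields that get_table returns.
allowed = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}
table_input = {k: v for k, v in table.items() if k in allowed}

# Point the Iceberg metadata pointer at the restored metadata file.
table_input.setdefault("Parameters", {})["metadata_location"] = RESTORED_METADATA

glue.update_table(DatabaseName=DATABASE, TableInput=table_input)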

a-agmon commented 7 months ago

@amogh-jahagirdar - I just wanted to say that we had a very problematic recovery case today, and your comment saved the day. We simply changed the metadata_location in glue and were able to recover :-)

SamRaza356 commented 5 months ago

There is another issue: the files in the metadata directory (.json and .avro) contain the original path locations, which causes a problem if the original table is dropped. So what can we do in that case? I think it means reading the (.avro) files and changing the path locations, but how can we do that?
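
There is no official Iceberg utility for this in the thread, but since the paths are plain strings in both the JSON metadata and the Avro manifest files, one approach is to download them, substitute the old prefix, and write them back. A rough sketch using the third-party fastavro library; the old/new prefixes are assumptions, and you should verify the table still reads correctly afterwards (the rewritten Avro must keep Iceberg's own key/value metadata from the file header).

import json
import fastavro

OLD_PREFIX = "s3://iceberg-datalake"            # location baked into the backup
NEW_PREFIX = "s3://iceberg-datalake-restored"   # hypothetical restore location


def swap_prefix(value):
    # Recursively replace the old prefix in every string of a record.
    if isinstance(value, str):
        return value.replace(OLD_PREFIX, NEW_PREFIX)
    if isinstance(value, dict):
        return {k: swap_prefix(v) for k, v in value.items()}
    if isinstance(value, list):
        return [swap_prefix(v) for v in value]
    return value


def rewrite_metadata_json(src, dst):
    # The *.metadata.json files hold the table location, snapshot
    # manifest-list paths and previous metadata locations as absolute URIs.
    with open(src) as f:
        metadata = json.load(f)
    with open(dst, "w") as f:
        json.dump(swap_prefix(metadata), f)


def rewrite_manifest_avro(src, dst):
    # Manifest lists point at manifests via manifest_path; manifests point at
    # data files via data_file.file_path. Both are plain strings, so a generic
    # walk-and-replace over each record covers both kinds of .avro file.
    with open(src, "rb") as f:
        reader = fastavro.reader(f)
        file_meta = {k: v for k, v in reader.metadata.items()
                     if not k.startswith("avro.")}  # keep Iceberg's own header metadata
        schema = reader.writer_schema
        records = [swap_prefix(r) for r in reader]
    with open(dst, "wb") as f:
        fastavro.writer(f, schema, records, metadata=file_meta)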

jessedobbelaere commented 1 day ago

The recovery procedures mentioned here should work if you:

  1. Ideally stop all DML queries (streaming ingestion) first.
  2. Restore the S3 bucket data back to the original location with AWS Backup, overwriting what is there.
  3. Run a new CREATE TABLE statement with the exact original schema, and set the location property to the right S3 path.
  4. Modify the Glue Catalog "table properties" to set metadata_location and perhaps previous_metadata_location. The issue here is that (definitely with continuous ingestion) you don't really know which metadata file was current at the time of the backup ⚠️ So you need to make a guess based on the Last Modified date (see the sketch after this list)? Or make a Lambda that writes the table properties to S3 every time they change...
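
That guess can at least be scripted: list the restored metadata/ prefix and pick the newest *.metadata.json. A small boto3 sketch, assuming the bucket layout from the example table above (bucket and prefix are placeholders for your own table location):

import boto3

s3 = boto3.client("s3")

BUCKET = "iceberg-datalake"   # restored bucket
PREFIX = "metadata/"          # the table's Iceberg metadata directory

# Collect every metadata file in the restored backup.
candidates = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith(".metadata.json"):
            candidates.append(obj)

# Pick the newest one by Last Modified as the best guess for the state at
# backup time. Metadata files are typically named NNNNN-<uuid>.metadata.json,
# so the numeric prefix of the file name is a useful cross-check on ordering.
latest = max(candidates, key=lambda o: o["LastModified"])
print(f"s3://{BUCKET}/{latest['Key']}")   # candidate value for metadata_location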

In case the table was not dropped but the data got corrupted by a bad UPDATE, or certain rows got deleted, you can use time travel in a MERGE query... as long as you have snapshots going back far enough, though. You need a good balance between storage/cost and the vacuum_max_snapshot_age_seconds property.
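
For reference, a row-level repair along those lines could look roughly like this when submitted through boto3 against Athena engine v3. It is only a sketch: the timestamp, join key, column list and result location are placeholders for the example db.friends table.

import boto3

athena = boto3.client("athena")

# Re-apply the pre-incident state of affected rows from an older snapshot:
# matched rows get their old values back, deleted rows get re-inserted.
repair_query = """
MERGE INTO db.friends AS t
USING (
    SELECT * FROM db.friends FOR TIMESTAMP AS OF TIMESTAMP '2024-01-01 00:00:00 UTC'
) AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET name = s.name, birthday = s.birthday
WHEN NOT MATCHED THEN INSERT (id, name, birthday) VALUES (s.id, s.name, s.birthday)
"""

athena.start_query_execution(
    QueryString=repair_query,
    QueryExecutionContext={"Database": "db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder
)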

Ideally I want to restore the Iceberg data into a new table (unless the original was dropped). Then we can run a MERGE query and update/insert/delete specific rows based on the situation. That way, if the original Iceberg table still exists, the ingestion flow and read queries are not interrupted by the recovery. However, a partial restore seems even more tricky: you restore the S3 data to a different bucket location, you have to rewrite the JSON (and Avro?) files in the metadata/ location of the Iceberg table because they contain absolute S3 paths from the old location, you have to point the Glue metadata_location to the new S3 path, etc. The Iceberg S3 access points could help, but is this supported by Glue & Athena?

I suppose for this partial-recovery use case, it's easier to just unload the latest Iceberg rows to plain parquet files, back up those files, restore by creating a new table from the parquet files, and use that table to run MERGE queries? Thoughts or advice on how to improve this process?
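
If it helps, the unload step itself is a single Athena statement; a sketch with the same boto3 submission pattern as above, where the export bucket and result location are placeholders:

import boto3

athena = boto3.client("athena")

# Dump the current table contents to plain parquet as an engine-agnostic backup.
unload_query = """
UNLOAD (SELECT * FROM db.friends)
TO 's3://my-iceberg-exports/friends/'
WITH (format = 'PARQUET')
"""

athena.start_query_execution(
    QueryString=unload_query,
    QueryExecutionContext={"Database": "db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder
)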