Closed: vinnielhj closed this issue 2 days ago
I see the same issue with CALL catalog_name.system.migrate(table => 'db.sample', drop_backup => true), which is a shortcut for the steps above.
@aokolnychyi @RussellSpitzer Is this by design for this action?
If I remember correctly, the issue here is that we make a backup of the catalog entry. For managed tables this also copies files, but for external tables it does not. Since it is not a backup of the table's data, just a backup of the reference to the table, dropping it can have deleterious effects in some cases, especially if you drop with purge, I'm guessing.
But I am a bit confused about which files are being dropped that are still required. Shouldn't all the Iceberg references point to the original location?
I'm testing with a managed table. The directories after migration are as follows:
/user/hive/warehouse/db/sample/metadata
/user/hive/warehouse/db/sample_backup_
There is no copy of the original data, and if sample_backup_ is dropped, db.sample can't be queried anymore.
Sorry, I haven't followed up on this question for a long time due to personal reasons.
I think the backup table could be designed as an external table, so that deleting the backup table removes only the metadata without deleting the data files; that way the Iceberg table would not be affected.
In certain circumstances it is necessary to delete the backup table. Currently, performing the deletion also deletes the data files, which makes the Iceberg table unusable.
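Along the lines of this suggestion, one possible manual workaround is to flip the backup table to external before dropping it. This is an untested sketch: it assumes the backup is a Hive-managed table named db.sample_backup_ and that the metastore honors the EXTERNAL table property so that DROP TABLE no longer purges the files.

```sql
-- Hypothetical workaround sketch: mark the backup table as EXTERNAL so
-- that dropping it removes only the metastore entry, not the data files
-- that the migrated Iceberg table still references.
ALTER TABLE db.sample_backup_ SET TBLPROPERTIES ('EXTERNAL' = 'TRUE');
DROP TABLE db.sample_backup_;
```

If this works in a given environment, the data directory is left in place and db.sample should remain queryable.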
Looking forward to your reply @aokolnychyi @RussellSpitzer @manuzhang
Our users would like to have Iceberg's table data separated from the backup data after migration. Hence, I modified the migrate procedure so that when a location for the Iceberg table is provided, the data is copied over.
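With that change, the call might look like the following. Note that the location argument here is the hypothetical parameter proposed in this modification, not part of the released migrate procedure, and the path is illustrative only.

```sql
-- Hypothetical: migrate and copy data to a separate location,
-- so dropping the backup cannot affect the Iceberg table's files.
CALL catalog_name.system.migrate(
  table    => 'db.sample',
  location => 'hdfs:///warehouse/iceberg/db/sample'
);
```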
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.
Apache Iceberg version
1.3.1 (latest release)
Query engine
Spark
Please describe the bug 🐞
Environment: Spark 3.2.1, Iceberg 1.3.1, org.apache.iceberg.spark.SparkSessionCatalog
Description: I have a Hive table test.sample. I run CALL spark_catalog.system.migrate(table => 'test.sample') to migrate it to an Iceberg table. Afterwards there are two directories on the file system, test.db/sample/ and test.db/sample_backup_, and SHOW TABLES also lists two tables, test.sample and test.sample_backup_. Once I have migrated and verified that the data is correct, I may want to delete the backup table test.sample_backup_. When I execute DROP TABLE test.sample_backup_, the table is deleted, and at the same time the test.db/sample_backup_ directory on the file system is removed. When I then query the Iceberg table test.sample, it throws an error that files cannot be found, because the metadata still references them. I don't think this is reasonable; in my opinion, dropping the backup should remove only the meta information and not the data files.
Steps:
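Based on the description above, the reproduction can be sketched roughly as follows (catalog and table names taken from the report; the backup table name follows the sample_backup_ naming seen after migration):

```sql
-- 1. Migrate the Hive table to Iceberg; this renames the original
--    table to a backup (test.sample_backup_) that shares the same
--    data files as the new Iceberg table.
CALL spark_catalog.system.migrate(table => 'test.sample');

-- 2. Both tables are now visible.
SHOW TABLES IN test;

-- 3. Drop the backup table; this also deletes the shared data files.
DROP TABLE test.sample_backup_;

-- 4. Querying the migrated table now fails with file-not-found errors,
--    because its metadata still references the deleted files.
SELECT * FROM test.sample;
```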