davidbruce opened this issue 2 years ago
@davidbruce, can you show me your Replibyte config.yaml, please? Can you also confirm that you are using the latest version of Replibyte?
@davidbruce, @evoxmusic, I am having the same problem when dumping some tables that are over 100MB.
I think this could be related to the order in which the dumped chunks are being restored.
As a workaround, and to keep this parameter flexible, I opened a PR that makes the buffer size configurable: https://github.com/Qovery/Replibyte/pull/234.
When restoring a 160 MB table with `buffer_size = 100MB`, the chunks came back in inverted order: `2.dump` was restored before `1.dump`, and the resulting file started with a bunch of INSERT statements with high id numbers.

When restoring the same table with `buffer_size = 200MB`, only one chunk (`1.dump`) was generated and the resulting file has all the correct info in the correct order.
I still don't know exactly how the chunk order is getting messed up during the restore, but if you can share a little more info on that I can try to help out with a solution 😃
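For illustration only (this is not Replibyte's actual chunking code, just a sketch of the arithmetic described above): a fixed-size buffer splits the dump into ceil(total / buffer) numbered chunks, which is why the restore order of `1.dump` and `2.dump` matters.

```rust
// Rough sketch: how many chunks a fixed-size buffer produces for a dump,
// using the 1-based "<n>.dump" naming seen in this issue.
fn chunk_names(total_bytes: u64, buffer_bytes: u64) -> Vec<String> {
    let chunks = (total_bytes + buffer_bytes - 1) / buffer_bytes; // ceiling division
    (1..=chunks).map(|i| format!("{}.dump", i)).collect()
}

fn main() {
    let mb: u64 = 1024 * 1024;
    // 160 MB table, 100 MB buffer -> two chunks whose order matters on restore
    assert_eq!(chunk_names(160 * mb, 100 * mb), vec!["1.dump", "2.dump"]);
    // 160 MB table, 200 MB buffer -> a single chunk, so ordering cannot go wrong
    assert_eq!(chunk_names(160 * mb, 200 * mb), vec!["1.dump"]);
}
```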
From what you describe, I feel that the issue could come from this function.
@gugacavalieri can you show me your Replibyte config.yaml, and can you also show me the content of your metadata.json at the root of your backup bucket if you are using S3? I suspect an issue there.
@evoxmusic, looks like it is only happening when I run with `local_disk` storage. Here is my configuration:
```yaml
source:
  connection_uri: $SOURCE_DATABASE_URL
  transformers: []
  only_tables: # optional - dumps only specified tables.
    - database: staging
      table: added_stickers

datastore:
  local_disk:
    dir: ./storage

destination:
  connection_uri: $DESTINATION_DATABASE_URL
```
When running with `buffer_size = 100` it creates the following metadata and chunks:

```json
{"v":"0.10.0","dumps":[{"directory_name":"dump-1667763130600","size":10603029,"created_at":1667763242581,"compressed":true,"encrypted":false}]}
```
When restoring to a local file (`restore local -i mysql -v latest -o > output.sql`), the first line of the file shows:

```sql
INSERT INTO `added_stickers` (`id`, ...) VALUES (511158, ...);
```
So the ID count is already at 511158. And if I scroll to line 904,501 (of 2,428,792) I can see an INSERT with the last ID, followed by the end of the dump file and then the beginning of the MySQL dump file:
```sql
INSERT INTO `added_stickers` (`id`, ...) VALUES (813513, ...);
/*!40000 ALTER TABLE `added_stickers` ENABLE KEYS */;
UNLOCK TABLES;
SET @@SESSION.SQL_LOG_BIN = @MYSQLDUMP_TEMP_LOG_BIN;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

-- Dump completed on 2022-11-06 12:32:44

-- MySQL dump 10.13 Distrib 8.0.31, for macos12.6 (arm64)
--
-- Host: xxx    Database: staging
-- ------------------------------------------------------
-- Server version 5.7.38-log

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
```
If I set `buffer_size = 200` the problem does not happen. Also, I tested S3 with `buffer_size = 100` and the chunks were restored in the correct order, so I guess this could be a problem with the `local_disk` storage.

Probably this `read_dir` function is returning the files in the wrong order:
Yes, a sort is necessary. Otherwise, it cannot work. We can't rely on `read_dir(..)`, which depends on the filesystem.
Would something like this do the trick?
```rust
let dump_directory_name = format!("{}/{}", self.dir, dump.directory_name);

// collect the chunk files and sort them by path so the restore order is deterministic
let mut sorted_entries: Vec<_> = read_dir(dump_directory_name)
    .unwrap()
    .map(|r| r.unwrap())
    .collect();
sorted_entries.sort_by_key(|dir| dir.path());

for entry in sorted_entries {
    // ...
}
```
I think it should be OK, we can write a test function. I will just make sure that the `.unwrap()` does not lead to an unexpected panic.
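A minimal sketch of how that `.unwrap()` concern could be addressed (illustrative only; `sorted_dump_entries` is a hypothetical helper, not the actual Replibyte code): collecting the `read_dir` iterator into an `io::Result` propagates I/O errors with `?` instead of panicking.

```rust
use std::fs::{read_dir, DirEntry};
use std::io;
use std::path::Path;

// Return the entries of a dump directory in a stable, path-sorted order,
// propagating I/O errors instead of panicking on unwrap().
fn sorted_dump_entries(dump_dir: &Path) -> io::Result<Vec<DirEntry>> {
    // collect() into io::Result<Vec<_>> stops at the first failing entry
    // and converts it into an Err for the caller to handle.
    let mut entries = read_dir(dump_dir)?.collect::<io::Result<Vec<DirEntry>>>()?;
    entries.sort_by_key(|entry| entry.path());
    Ok(entries)
}
```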
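And a rough idea of the test function mentioned above, assuming the hypothetical `sorted_dump_entries` helper from the previous sketch: write a few chunk files out of order into a temporary directory and assert they come back sorted.

```rust
#[cfg(test)]
mod tests {
    use super::sorted_dump_entries;
    use std::fs::{create_dir_all, remove_dir_all, File};

    #[test]
    fn dump_chunks_are_listed_in_ascending_order() {
        // throwaway directory with chunk files created out of order
        let dir = std::env::temp_dir().join("replibyte_sorted_entries_test");
        let _ = remove_dir_all(&dir); // ignore the error if it does not exist yet
        create_dir_all(&dir).unwrap();
        for name in ["2.dump", "1.dump", "3.dump"] {
            File::create(dir.join(name)).unwrap();
        }

        let names: Vec<String> = sorted_dump_entries(&dir)
            .unwrap()
            .iter()
            .map(|entry| entry.file_name().to_string_lossy().into_owned())
            .collect();

        assert_eq!(names, vec!["1.dump", "2.dump", "3.dump"]);
    }
}
```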
Sounds good! You probably have better Rust skills for this one :)
When I create a dump from a database over the network, the dump is not ordered properly. Because of this I am unable to restore the dump, which prevents me from using Replibyte.
Here is an idea of what the dump.sql from Replibyte looks like:
Even the INSERT statements themselves are not ordered properly, so I cannot simply move the DDL to the top of the file.
This isn't an issue with a database that is in a local Docker instance.