davidbruce opened this issue 2 years ago
@davidbruce, can you show me your Replibyte config.yaml, please? Can you also confirm that you are using the latest version of Replibyte?
@davidbruce, @evoxmusic, I am having the same problem when dumping some tables that are over 100MB.
I think this could be related to the order in which the dumped chunks are being restored.
As a workaround, and to keep this parameter flexible, I opened a PR that makes the buffer size configurable: https://github.com/Qovery/Replibyte/pull/234.
When restoring a 160 MB table with `buffer_size = 100MB`, the chunks came back in inverted order: `2.dump` was restored before `1.dump`, and the resulting file started with a bunch of INSERT statements with high id numbers.

When restoring the same table with `buffer_size = 200MB`, only one chunk (`1.dump`) was generated and the resulting file has all the correct info in the correct order.
I still don't know exactly how the chunk order is getting messed up during the restore, but if you can share a little more info on that I can try to help out with a solution 😃
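For illustration only (this is not Replibyte's actual chunking code, just a sketch of the arithmetic described above): a fixed-size buffer splits the dump into ceil(total / buffer) numbered chunks, which is why the restore order of `1.dump` and `2.dump` matters.

```rust
// Rough sketch: how many chunks a fixed-size buffer produces for a dump,
// using the 1-based "<n>.dump" naming seen in this issue.
fn chunk_names(total_bytes: u64, buffer_bytes: u64) -> Vec<String> {
    let chunks = (total_bytes + buffer_bytes - 1) / buffer_bytes; // ceiling division
    (1..=chunks).map(|i| format!("{}.dump", i)).collect()
}

fn main() {
    let mb: u64 = 1024 * 1024;
    // 160 MB table, 100 MB buffer -> two chunks whose order matters on restore
    assert_eq!(chunk_names(160 * mb, 100 * mb), vec!["1.dump", "2.dump"]);
    // 160 MB table, 200 MB buffer -> a single chunk, so ordering cannot go wrong
    assert_eq!(chunk_names(160 * mb, 200 * mb), vec!["1.dump"]);
}
```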
From what you describe, I feel that the issue could come from this function.
@gugacavalieri can you show me your Replibyte config.yaml, and can you also show me the content of your metadata.json at the root of your backup bucket if you are using S3? I suspect an issue there.
@evoxmusic, looks like it is only happening when I run with `local_disk` storage. Here is my configuration:
```yaml
source:
  connection_uri: $SOURCE_DATABASE_URL
  transformers: []
  only_tables: # optional - dumps only specified tables.
    - database: staging
      table: added_stickers

datastore:
  local_disk:
    dir: ./storage

destination:
  connection_uri: $DESTINATION_DATABASE_URL
```
When running with `buffer_size = 100` it creates the following metadata and chunks:

```json
{"v":"0.10.0","dumps":[{"directory_name":"dump-1667763130600","size":10603029,"created_at":1667763242581,"compressed":true,"encrypted":false}]}
```
When restoring to a local file (`restore local -i mysql -v latest -o > output.sql`), the first line of the file shows:

```sql
INSERT INTO `added_stickers` (`id`, ...) VALUES (511158, ...);
```
So the ID count is already at 511158. And if I scroll to line 904,501 (of 2,428,792) I can see an INSERT with the last ID, followed by the end of the dump file and then the beginning of the MySQL dump file:
```sql
INSERT INTO `added_stickers` (`id`, ...) VALUES (813513, ...);
/*!40000 ALTER TABLE `added_stickers` ENABLE KEYS */;
UNLOCK TABLES;
SET @@SESSION.SQL_LOG_BIN = @MYSQLDUMP_TEMP_LOG_BIN;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

-- Dump completed on 2022-11-06 12:32:44

-- MySQL dump 10.13 Distrib 8.0.31, for macos12.6 (arm64)
--
-- Host: xxx    Database: staging
-- ------------------------------------------------------
-- Server version 5.7.38-log

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
```
If I set `buffer_size = 200` the problem does not happen. Also, I tested S3 with `buffer_size = 100` and the chunks were restored in the correct order, so I guess this could be a problem with the `local_disk` storage.

Probably this `read_dir` function is returning the files in the wrong order:
Yes, a sort is necessary. Otherwise, it cannot work. We can't rely on `read_dir(..)`, which depends on the filesystem.
Would something like this do the trick?
```rust
let dump_directory_name = format!("{}/{}", self.dir, dump.directory_name);

// collect the chunk files and sort them by path so the restore order is deterministic
let mut sorted_entries: Vec<_> = read_dir(dump_directory_name)
    .unwrap()
    .map(|r| r.unwrap())
    .collect();
sorted_entries.sort_by_key(|dir| dir.path());

for entry in sorted_entries {
    // ...
}
```
I think it should be OK, we can write a test function. I will just make sure that the `.unwrap()` does not lead to an unexpected panic.
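A minimal sketch of how that `.unwrap()` concern could be addressed (illustrative only; `sorted_dump_entries` is a hypothetical helper, not the actual Replibyte code): collecting the `read_dir` iterator into an `io::Result` propagates I/O errors with `?` instead of panicking.

```rust
use std::fs::{read_dir, DirEntry};
use std::io;
use std::path::Path;

// Return the entries of a dump directory in a stable, path-sorted order,
// propagating I/O errors instead of panicking on unwrap().
fn sorted_dump_entries(dump_dir: &Path) -> io::Result<Vec<DirEntry>> {
    // collect() into io::Result<Vec<_>> stops at the first failing entry
    // and converts it into an Err for the caller to handle.
    let mut entries = read_dir(dump_dir)?.collect::<io::Result<Vec<DirEntry>>>()?;
    entries.sort_by_key(|entry| entry.path());
    Ok(entries)
}
```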
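And a rough idea of the test function mentioned above, assuming the hypothetical `sorted_dump_entries` helper from the previous sketch: write a few chunk files out of order into a temporary directory and assert they come back sorted.

```rust
#[cfg(test)]
mod tests {
    use super::sorted_dump_entries;
    use std::fs::{create_dir_all, remove_dir_all, File};

    #[test]
    fn dump_chunks_are_listed_in_ascending_order() {
        // throwaway directory with chunk files created out of order
        let dir = std::env::temp_dir().join("replibyte_sorted_entries_test");
        let _ = remove_dir_all(&dir); // ignore the error if it does not exist yet
        create_dir_all(&dir).unwrap();
        for name in ["2.dump", "1.dump", "3.dump"] {
            File::create(dir.join(name)).unwrap();
        }

        let names: Vec<String> = sorted_dump_entries(&dir)
            .unwrap()
            .iter()
            .map(|entry| entry.file_name().to_string_lossy().into_owned())
            .collect();

        assert_eq!(names, vec!["1.dump", "2.dump", "3.dump"]);
    }
}
```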
Sounds good! You probably have better Rust skills for this one :)
When I create a dump from a database over the network, the dump is not ordered properly. Because of this I am unable to restore the dump, which prevents me from using Replibyte.
Here is an idea of what the dump.sql from Replibyte looks like:
Even the INSERT statements themselves are not ordered properly, so I cannot simply move the DDL to the top of the file.
This isn't an issue with a database that is in a local Docker instance.