celo-org / optimism

Optimism is Ethereum, scaled.
https://optimism.io
MIT License
3 stars 2 forks source link

Add pre-migration command to migration script #192

Closed alecps closed 1 month ago

alecps commented 1 month ago

Addresses https://github.com/celo-org/optimism/issues/171

Tested

codecov-commenter commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 61.32%. Comparing base (e1fa92a) to head (ec9a6c3). Report is 30 commits behind head on celo7.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## celo7 #192 +/- ## ======================================= Coverage 61.32% 61.32% ======================================= Files 20 20 Lines 1753 1753 Branches 71 71 ======================================= Hits 1075 1075 Misses 646 646 Partials 32 32 ``` | [Flag](https://app.codecov.io/gh/celo-org/optimism/pull/192/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=celo-org) | Coverage Δ | | |---|---|---| | [cannon-go-tests](https://app.codecov.io/gh/celo-org/optimism/pull/192/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=celo-org) | `81.03% <ø> (ø)` | | | [chain-mon-tests](https://app.codecov.io/gh/celo-org/optimism/pull/192/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=celo-org) | `27.14% <ø> (ø)` | | | [sdk-tests](https://app.codecov.io/gh/celo-org/optimism/pull/192/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=celo-org) | `16.44% <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=celo-org#carryforward-flags-in-the-pull-request-comment) to find out more.
alecps commented 1 month ago

@piersy I've added a fair amount of cleanup to this PR, including some code to log duration of different parts of the script to identify where slowdowns are.

I was able to measure the speedup of pre-running the rsync command to be ~58% according to rsync's logs. This led to a speedup of ~18s for migrating alfajores on my machine versus running a full migration where only the ancient blocks are pre-migrated.

Output after running rsync with pre-migration

sent 10717398995 bytes received 23874 bytes 220977791.11 bytes/sec total size is 16985666076 speedup is 1.58 INFO [07-24|19:24:58.243] TIMER process=copyDbExceptAncients duration=48.593819875s

Output after running rsync without pre-migration

sent 16987957677 bytes received 75244 bytes 259359281.24 bytes/sec total size is 16985666076 speedup is 1.00 INFO [07-24|23:30:17.836] TIMER process=copyDbExceptAncients duration=1m6.435774166s

The way the code is factored now, the non-ancients step isn't effected by the rsync command. So the duration for this step was ~24s for both runs. The speedup we're concerned with is in the copyDbExceptAncients step.

The overall speedup from running the entire pre-migration step was ~8m7s on alfajores data with local setup

I still need to test that op-geth and op-node can run on the migrated data after these changes

alecps commented 1 month ago

@carterqw2 I ended up removing the --update flag because after closer inspection this seems to skip any files that might have been modified more recently on the newDB. Since we don't want to preserve any modifications to the newDB that happened after the last pre-migration, I think we might be better off overwriting those files if there are any. Let me know if I'm missing something.

In practice, I didn't see any difference in this behavior when testing with / without this flag, which indicates there were no files more recently updated in the newDB in my case, which is as expected.

piersy commented 1 month ago

@piersy I've added a fair amount of cleanup to this PR, including some code to log duration of different parts of the script to identify where slowdowns are.

I was able to measure the speedup of pre-running the rsync command to be ~58% according to rsync's logs. This led to a speedup of ~18s for migrating alfajores on my machine versus running a full migration where only the ancient blocks are pre-migrated.

Output after running rsync with pre-migration

sent 10717398995 bytes received 23874 bytes 220977791.11 bytes/sec total size is 16985666076 speedup is 1.58 INFO [07-24|19:24:58.243] TIMER process=copyDbExceptAncients duration=48.593819875s

Output after running rsync without pre-migration

sent 16987957677 bytes received 75244 bytes 259359281.24 bytes/sec total size is 16985666076 speedup is 1.00 INFO [07-24|23:30:17.836] TIMER process=copyDbExceptAncients duration=1m6.435774166s

The way the code is factored now, the non-ancients step isn't effected by the rsync command. So the duration for this step was ~24s for both runs. The speedup we're concerned with is in the copyDbExceptAncients step.

The overall speedup from running the entire pre-migration step was ~8m7s on alfajores data with local setup

I still need to test that op-geth and op-node can run on the migrated data after these changes

Hey @alecps for the times listed there, for how long did you run the node between the migrations?

piersy commented 1 month ago

Looking good, a few comments:

I noticed that the command-line help seems to print some log statements. Those should probably be removed.

INFO [07-25|18:27:38.253] Beginning Cel2 Migration
NAME:
   celo-migrate pre - Perform a  pre-migration of ancient blocks and copy over all other data without transforming it. This should be run a day before the full migration command is run to minimize downtime.

USAGE:
   celo-migrate pre [command options] [arguments...]

OPTIONS:

          --old-db value
                Path to the old Celo chaindata dir, can be found at '<datadir>/celo/chaindata'

          --new-db value
                Path to write migrated Celo chaindata, note the new node implementation expects
                to find this chaindata at the following path '<datadir>/geth/chaindata

          --batch-size value                  (default: 50000)
                Batch size to use for block migration, larger batch sizes can speed up migration
                but require more memory. If increasing the batch size consider also increasing
                the memory-limit

          --buffer-size value                 (default: 0)
                Buffer size to use for ancient block migration channels. Defaults to 0. Included
                to facilitate testing for performance improvements.

          --memory-limit value                (default: 7500)
                Memory limit in MiB, should be set lower than the available amount of memory in
INFO [07-25|18:27:38.253] Finished migration successfully!
                your system to prevent out of memory errors

          --clear-non-ancients                (default: false)
                Use this to reset all data except ancients. This flag should be used if a full
                migration has already been performed on the new db.

          --measure-time                      (default: false)
                Use this to log how long each section of the script takes to run

          --help, -h                          (default: false)
                show help
(END)
alecps commented 1 month ago

@piersy I didn't run the node, I just pulled down the latest snapshot and had the old snapshot from probably ~3 weeks ago