NebraLtd / snapshot-bumper

Tool to generate snapshot bumps for the Helium miner.
MIT License

Potential for "instant sync" #2

Open shawaj opened 3 years ago

shawaj commented 3 years ago

@ryanteck @vpetersson I am pretty sure we can improve our syncing to match Syncrobit's "instant sync" setup...

All we need to do is stand up a miner in the cloud (which we will probably do anyway for validators in the future?)

Then, we just need to take a snapshot:

miner snapshot take /path/to/snapshot

Then save it somewhere where our miners can pull it down and verify it. Perhaps signed by a private key that we control?
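
(For illustration, a minimal sketch of the signing idea - the key filenames are hypothetical and this assumes standard openssl tooling:)

# On our server: sign the snapshot with a private key we control
# (hypothetical key paths)
openssl dgst -sha256 -sign snapshot_key.pem -out snapshot.sig snapshot.bin

# On the hotspot: verify against our published public key before loading
openssl dgst -sha256 -verify snapshot_pub.pem -signature snapshot.sig snapshot.bin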

Then on the miners we just need to:

miner snapshot load /path/to/snapshot

My thought would be to do a snapshot every 100 or so blocks.

Like Syncrobit, we can then have it so that if a hotspot falls behind by more than 250 blocks it automatically pulls down a new snapshot from our servers.

What do you think?

Ref doc... https://docs.helium.com/use-the-network/build-a-packet-forwarder#snapshots
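
(A rough sketch of that fall-behind check, assuming the standard miner CLI inside docker and the public Helium API - the snapshot URL is a placeholder:)

#!/bin/bash
# Compare our local height against the chain tip
# (assumes the usual "epoch height" output from 'miner info height')
local_height=$(docker exec miner miner info height | awk '{print $2}')
chain_height=$(curl -s https://api.helium.io/v1/blocks/height | jq -r '.data.height')

# More than 250 blocks behind? Pull a fresh snapshot and load it.
if [ $((chain_height - local_height)) -gt 250 ]; then
    # Placeholder URL - wherever we end up publishing snapshots
    wget -q -O /home/pi/miner_data/snapshot.bin https://snapshots.example.com/latest.bin
    # Assumes miner_data on the host is mounted at /var/data inside the container
    docker exec miner miner snapshot load /var/data/snapshot.bin
fi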

ryanteck commented 3 years ago

I have it so it updates every 6 hours currently. I didn't think much more often was wise, as it increases the risk of issues.

I can increase it if you want. Likely every hour would work.


shawaj commented 3 years ago

Why would more often increase risk of issues?

But is this pulling down a snapshot? I thought this was just updating the blessed block?

Block time is around 60 secs, so I guess we could do it every 2 hours or something? That's about 120 blocks.

ryanteck commented 3 years ago

Just in case it breaks. I don't like to keep changing stuff in production that could cause it all to screw up.

While block time is 60 seconds, a snapshot is taken from a blessed block, and those aren't anywhere near as frequent.

The last blessed block according to the bumper system was at 12PM today, with ~537 blocks left to sync.

I currently have it on a cron of every 12 hours, but I could change that to every 4 hours, which should be more than frequent enough.
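
(For reference, that's a one-line crontab change - the script path here is hypothetical:)

# Bump the snapshot every 4 hours instead of every 12
0 */4 * * * /usr/local/bin/snapshot-bump.sh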

shawaj commented 3 years ago

That's not what I was suggesting, though. I'm suggesting we create our own snapshots from a fully synced cloud miner, not from a blessed block (those are every 720 blocks IIRC).

Then we can sync instantly.

Re the breaking stuff - we should build in testing that checks for issues, so it's not a question of "just in case"; we'd have a high level of confidence it's all good. We can also test changes on testnet.

vpetersson commented 3 years ago

@shawaj I don't have enough domain expertise at this point in time to comment on the viability of this. That said, I wouldn't underestimate the complexity of it. Yes, it's easy in theory, but it would require a fair bit of consideration in the logic.

If I understand the objective, it is largely to make the initial sync fast, right? If so, that simplifies things, as it won't affect existing nodes. In that case, we could just not bake any blockchain data into the actual Balena image (not sure if we do at the moment), and then do hourly snapshots and pull them down as part of the activation flow.

If we do it this way, it would carry a lot less risk of breaking devices in the wild.
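
(A minimal sketch of that activation-time pull, with a placeholder URL and an assumed data layout - the point being that only fresh devices would ever take this branch:)

# Hypothetical first-boot check: if we've never pulled a snapshot,
# treat this as a fresh device and bootstrap from the latest one
if [ ! -e /var/data/snapshot.bin ]; then
    wget -q -O /var/data/snapshot.bin https://snapshots.example.com/latest.bin   # placeholder URL
    miner snapshot load /var/data/snapshot.bin
fi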

ryanteck commented 3 years ago

I guess if it's a case of creating new snapshots, it would be similar to what we do now, but instead of just downloading a new configuration file on boot it would download a new snapshot.

That would cover when they first turn on, so it handles initial syncing, and if they've been powered off for a while they'd catch up.

At the same time all devices would pull the snapshot whenever we push a new GA as well.

shawaj commented 3 years ago

Yes that sounds like a great idea to me.

For reference, this is how Syncrobit do it... (screenshot from Discord: Screenshot_20210529-174337_Discord)

shawaj commented 3 years ago

@vpetersson now that I think about it, didn't you set up a miner on GCP already?

vpetersson commented 3 years ago

We did set up one some time ago as an experiment but haven't really checked in on it in a while.

shawaj commented 3 years ago

Ok I spoke to George from Syncrobit and he has actually made the instant sync thing publicly available.

So it uses https://msync.syncrob.it (which is essentially a cloud miner) and generates a snapshot on the fly, as well as the following output:

{"fileUri":"https:\/\/msync.eu.syncrob.it\/snapshots\/snapshot_2021_06_02_12.bin","checkSum":{"md5":"4d28cc579d1465a3b44c1a9f61f0a8dc","sha1":"b0277240b07ae2f39bf6f3dad13284a74625f9a2","sha256":"1ca4858f875c0cc5b9a40c844d5e92d72921f0f4276c47b55c0143d4360c0162"},"minerRelease":"miner-amd64_2021.05.29.1_GA","blockHeight":"868512","timestamp":"2021-06-02T16:35:40Z","expires":"2021-06-02T17:35:40Z"}

The snapshot file is around 80MB: https://drive.google.com/file/d/1UZK-mc1ZG2zOpcqofNf4_NrteLp4x6B3/view?usp=sharing

It uses the following script:

#!/bin/bash

# Install dependencies
echo "Installing dependencies..."
sudo apt install curl wget jq -y
echo " "
echo " "

# Make the sync request
echo "Grabbing data..."
data=$(curl -s https://msync.syncrob.it/)
miner_version=$(jq -r '.minerRelease' <<< "$data")
block_height=$(jq -r '.blockHeight' <<< "$data")

echo " "
echo " "
echo "Miner Release: $miner_version"
echo "Snapshot Block Height: $block_height"
echo " "
echo " "

# Download the snapshot file
echo "Downloading file..."
file=$(jq -r '.fileUri' <<< "$data")
wget -O snapshot.bin "${file}" -q --show-progress

# Verify the downloaded file's checksum
echo "Verifying downloaded file integrity..."
check_sum=$(jq -r '.checkSum.md5' <<< "$data")
d_check_sum=$(md5sum snapshot.bin | awk '{print $1}')

if [ "$d_check_sum" = "$check_sum" ];
then
    echo "Checksum PASS."
else
    echo "Checksum FAIL."
    exit 1
fi

echo " "
echo " "

# Restore the snapshot
echo "Copying & restoring snapshot..."
cp snapshot.bin /home/pi/miner_data/snapshot.bin
docker exec miner miner snapshot load /var/data/snapshot.bin

echo " "
echo " "

echo "All Done!"

ryanteck commented 3 years ago

Honestly, I think for the hassle this will cause, it isn't worth it.

Every time we push out an update to hotspots, hundreds of gigs of data would need to be served (approximately 100GB per 1,280 hotspots).

Some rough napkin maths on what this would equate to per update, assuming all hotspots are online at the time:

I count approximately 12 updates to the Helium Miner pushed in the last month. If we take a figure of 64k hotspots online, that's then 60TB per month.

With some rough pricing from GCP, that'd cost us thousands per month extra (approx $4k) to offer this service, if I've worked out GCP's pricing right.
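
(Spelling out the napkin maths - the per-GB egress rate is my assumption of roughly what GCP charges at this volume:)

80 MB per snapshot x 64,000 hotspots ≈ 5 TB served per update
5 TB x 12 updates per month ≈ 60 TB per month
60 TB ≈ 61,440 GB x ~$0.065/GB assumed egress ≈ $4,000 per month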

All of that to maybe cut sync time down by a few hours at most?

shawaj commented 3 years ago

We can probably just do it periodically and serve it through a CDN.

ryanteck commented 3 years ago

So you still want this added despite it not really adding much value at all?

How high is this on the priority list? Low down I presume?


shawaj commented 3 years ago

I think it adds quite a lot of value - specifically for people who are having issues connecting through S3 and to peers to get the snapshot. It will also significantly reduce our customer support "wen sync" queries, which are only going to increase as more units ship...

Also - with the snapshot - I had another idea. Since the snapshot files are under 100MB, we can host them on GitHub and just update them automatically every 2 hours or whatever.
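
(A sketch of the GitHub approach using the gh CLI - the repo name and the rolling "latest" tag are hypothetical; run from cron every 2 hours:)

# Take a fresh snapshot on the cloud miner...
docker exec miner miner snapshot take /var/data/snapshot.bin
# ...and overwrite the asset on a rolling "latest" release
# (assumes miner_data on the host maps to /var/data in the container)
gh release upload latest /home/pi/miner_data/snapshot.bin --repo NebraLtd/snapshots --clobber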

The calcs you did assume serving a new snapshot every time we do an update, but that isn't really required, because most of the units won't be 250+ blocks behind after an update. The only time they would download the snapshot is when they fall more than 250 blocks behind - so basically if they are new units, or if they've been offline for over 4 hours.

This also doesn't need to be on GCP - we can use a server with Mythic or whatever for this. Or, as I said, use GitHub.