Start9Labs / start-os

Open source Linux distro optimized for self-hosting
https://start9.com
MIT License

[bug]: very slow IBD #2336

Open AndySchroder opened 1 year ago

AndySchroder commented 1 year ago

Prerequisites

Server Hardware

raspi CM4, 4GB RAM, 1TB nvme SSD

StartOS Version

0.3.4.3

Client OS

Linux

Client OS Version

firefox

Browser

Firefox

Browser Version

n/a

Current Behavior

Very slow IBD (initial block download). It has been running for several weeks and is only at block 512895.

Two issues I've noticed so far:

Other notes:

Expected Behavior

On this hardware I'd expect IBD to complete in about 5 days (tested on the same hardware with a manual install of bitcoind on ubuntu and also with Umbrel).

Steps to Reproduce

  1. Install bitcoind
  2. Wait.

Anything else?

No response

MattDHill commented 1 year ago

Hey Andy, we are very aware of this issue and have no explanation for it. Sync times for the 8GB Pi and other hardware are in line with expectations. Only the 4GB Pi seems to be affected, and we have no idea why.

AndySchroder commented 1 year ago

Like I mentioned, I don't have this problem with bitcoind installed manually on Ubuntu or with Umbrel on the same hardware, so it must be something in your custom operating system configuration.

Your filesystem setup seems very complex. I suggest you start unwinding the complexity and then re-layer up until you can locate the problem.

AndySchroder commented 1 year ago

Also, my swap file is using 0 bytes. RAM usage is only about 30% and the rest of the RAM is operating as disk cache.

dr-bonez commented 1 year ago

Have you ever tried syncing on similar hardware with LUKS enabled?

dr-bonez commented 1 year ago

> Also, my swap file is using 0 bytes. RAM usage is only about 30% and the rest of the RAM is operating as disk cache.

Are you using zram? Otherwise there is no swap.

AndySchroder commented 1 year ago

> > Also, my swap file is using 0 bytes. RAM usage is only about 30% and the rest of the RAM is operating as disk cache.
>
> Are you using zram? Otherwise there is no swap.

My mistake, the swap file is using 0 bytes because it is turned off.

MattDHill commented 1 year ago

Per recent testing, zram/swap will not help this issue.

AndySchroder commented 1 year ago

> Have you ever tried syncing on similar hardware with LUKS enabled?

No.

I never enter anything to authenticate at boot, so you must be storing an unencrypted secret somewhere outside of LUKS. So what's the point of using LUKS?

dr-bonez commented 1 year ago

We plan to add a disk encryption feature in the future, and trivially encrypting with LUKS now allows us to do so without rewriting the entire block device, simply by changing the key used in the header.

AndySchroder commented 1 year ago

man cryptsetup (https://manpages.ubuntu.com/manpages/jammy/en/man8/cryptsetup.8.html#luks%20extension) says

   luksHeaderBackup <device> --header-backup-file <file>

          Stores a binary backup of the LUKS header and keyslot area.
          Note: Using '-' as filename writes the header backup to a file named '-'.

          WARNING: This backup file and a passphrase valid  at  the  time  of  backup  allows
          decryption  of  the  LUKS  data  area,  even if the passphrase was later changed or
          removed from the LUKS device. Also note that with a  header  backup  you  lose  the
          ability  to  securely  wipe the LUKS device by just overwriting the header and key-
          slots. You either need  to  securely  erase  all  header  backups  in  addition  or
          overwrite  the  encrypted  data area as well.  The second option is less secure, as
          some sectors can survive, e.g. due to defect management.

which leads me to believe that you need to run cryptsetup-reencrypt to be secure.

However, I understand your motivation to encrypt it now, because issues like this one will pop up and it's better to roll things out gradually. If no one actually makes a header backup, it will still be secure when you finally roll out a version where the passphrase must be provided in some way at boot. That is better than nothing and simpler than using cryptsetup-reencrypt, but not perfect.

In the long run, I don't understand how you are going to be able to do this securely on a raspi. You need some tamper protection of the operating system like https://puri.sm/posts/the-librem-key-makes-tamper-detection-easy/ , but I don't see how you are ever going to be able to do that on the raspi, so why try when it is going to dramatically hinder performance? You already have tons of non-free libraries enabled on the raspi, so it seems to be a custom build anyway. I guess that without tamper evidence you would still be protecting against simple theft, so maybe there is some value.

AndySchroder commented 1 year ago

So, trying to find a workaround: will changing the size of the bitcoind database cache (dbcache) help at all? Did the zram/swap test yield no improvement because the disk cache stores the data encrypted in RAM? If so, and since the bitcoind cache is held in RAM unencrypted, maybe increasing it would help.

AndySchroder commented 1 year ago

Another option is to create a separate unencrypted partition and set blocksdir to use that partition for block storage only. Block data is public information, so why encrypt it? There would need to be some kind of tamper detection on the block data, though.
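The two workarounds above (a larger bitcoind cache and block files on a separate unencrypted partition) come down to two bitcoin.conf lines. A minimal sketch that prints them, assuming a hypothetical unencrypted mount at /mnt/blocks-plain; dbcache (in MiB) and blocksdir are real bitcoind options, but the values here are placeholders, not tested recommendations.

```shell
#!/bin/sh
# ibd_conf_fragment DBCACHE_MIB BLOCKSDIR
# Print bitcoin.conf lines for the two IBD workarounds discussed above.
#   DBCACHE_MIB - bitcoind database cache size in MiB (dbcache option)
#   BLOCKSDIR   - directory that will hold the blocks/ subdirectory (blocksdir option)
ibd_conf_fragment() {
    dbcache=$1
    blocksdir=$2
    # Reject non-numeric cache sizes so a typo cannot end up in the config.
    case $dbcache in
        ''|*[!0-9]*) echo "dbcache must be a whole number of MiB" >&2; return 1 ;;
    esac
    # Require an absolute path so there is no doubt about where blocks end up.
    case $blocksdir in
        /*) ;;
        *) echo "blocksdir must be an absolute path" >&2; return 1 ;;
    esac
    printf 'dbcache=%s\nblocksdir=%s\n' "$dbcache" "$blocksdir"
}

# Example: 1 GiB cache, block files on a hypothetical unencrypted partition.
ibd_conf_fragment 1024 /mnt/blocks-plain
```

The chainstate database stays in the data directory, so with this split it would remain on the LUKS volume.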

AndySchroder commented 1 year ago

So are you fairly certain the issue is related to the LUKS setup, then? https://github.com/Start9Labs/documentation/issues/407 was initially submitted because of this slow IBD, but from my investigations today, it seems like bitcoind is connecting to IPv4 peers directly over IPv4 and not over Tor. Is this correct?

dr-bonez commented 1 year ago

StartOS automatically backs up the LUKS headers when making encrypted backups, in which case the header file is encrypted with the user's master password. It is also duplicated into the OS config partition, which remains local to the device and would also be replaced if we were to update the key. You do make a good point, though, that we will need to take steps to protect the original file against forensic recovery.

If the performance tradeoff is significant, we may consider removing the encryption entirely, until enabled by the user after we roll out the feature.

Simple theft is a much bigger concern for us than tampering. Tampering is a much more difficult attack to pull off, and protecting against it would either trade off against uptime, due to the requirement of local attestation, or require remote attestation, which is an area of active research.

AndySchroder commented 1 year ago

So, here is one other thing I think could be causing problems. The system load is about 6, and there are 4 cores. By default, bitcoind tries to use all available cores because it assumes there are no other substantial loads on the system, but LUKS seems to be fairly dominant on this processor, so bitcoind is doing unnecessary parallelization, which slows it down. I also think there is a delay between when LUKS decrypts the data and when it reaches bitcoind: when I watch the CPU utilization, it pulses between the kernel threads and bitcoind, and sometimes the CPU sits idle for a while even though the system load is 6 (which would explain why the temperature is also lower). I want to lower the number of threads used by bitcoind, but Start9 keeps overwriting the bitcoin.conf file, so I can't figure out how to set par=2 or par=3 to test this theory.

MattDHill commented 1 year ago

Manual changes to the config will be honored in 0.4.0, but right now the config we expose overwrites bitcoin.conf on every restart.

In the meantime, we can quickly add whatever options you want, though I'm not seeing the par option in bitcoin.conf.

dr-bonez commented 1 year ago

You actually can edit bitcoin.conf.template in the assets of bitcoind at /embassy-data/package-data/volumes/bitcoind/assets/<version>/bitcoin.conf.template. That will allow you to make changes.
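When editing that template by hand, repeated tweaks can easily leave duplicate lines behind. A minimal sketch of an idempotent edit, assuming the template contains plain key=value lines (how StartOS renders it has not been checked here); set_conf_option is a hypothetical helper, and a .bak copy is kept before each change.

```shell
#!/bin/sh
# set_conf_option FILE KEY VALUE
# Replace an existing "KEY=..." line in FILE, or append "KEY=VALUE" if absent.
# Assumes simple key=value lines; KEY must not contain regex metacharacters and
# VALUE must not contain "|" or "&" (they are special in the sed replacement).
set_conf_option() {
    file=$1 key=$2 value=$3
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    cp "$file" "$file.bak"                       # keep a copy of the original
    if grep -q "^${key}=" "$file"; then
        # Rewrite the existing line via the backup (portable, no sed -i).
        sed "s|^${key}=.*|${key}=${value}|" "$file.bak" > "$file"
    else
        # Make sure the appended line does not merge with an unterminated last line.
        if [ -n "$(tail -c1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

For example, with <version> replaced by the installed package version: set_conf_option /embassy-data/package-data/volumes/bitcoind/assets/<version>/bitcoin.conf.template par -2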

AndySchroder commented 1 year ago

From bitcoind -h

  -par=<n>
       Set the number of script verification threads (-2 to 15, 0 = auto, <0 =
       leave that many cores free, default: 0)

We might want to use par=-2 as a starting point (leaving 2 cores free for LUKS and the rest of the OS) and then test whether it should be -1 or -3. I'd hope that LUKS doesn't require 3 CPUs, but we would need to test it.
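The arithmetic behind those par values, as a sketch based only on the help text above: 0 means all cores, a negative value leaves that many cores free, and 15 is the maximum. Treating the result as at least 1 (bitcoind still verifies scripts on its main thread) is an assumption, not something the help text states; script_threads is a hypothetical helper.

```shell
#!/bin/sh
# script_threads NCORES PAR
# Number of script-verification threads implied by a -par value, following the
# help text: 0 = auto (all cores), <0 = leave that many cores free, max 15.
# Assumption: never fewer than 1 thread.
script_threads() {
    ncores=$1 par=$2
    if [ "$par" -le 0 ]; then
        n=$((ncores + par))   # 0 -> all cores, -2 -> ncores - 2, ...
    else
        n=$par
    fi
    if [ "$n" -lt 1 ]; then n=1; fi
    if [ "$n" -gt 15 ]; then n=15; fi
    echo "$n"
}

# Cores left free for kcryptd and the rest of the OS on a 4-core Pi.
for p in 0 -1 -2 -3; do
    t=$(script_threads 4 "$p")
    echo "par=$p: $t verification thread(s), $((4 - t)) core(s) left free"
done
```

On a 4-core Pi this gives 4, 3, 2 and 1 verification threads for par=0, -1, -2 and -3.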

It's still unclear to me why the raspi4 4GB model is having problems but the 8GB raspi4 model isn't. Have you tested on a non raspi4 machine with 4GB RAM?

Also, have you tried running cryptsetup benchmark on the systems that work well and comparing the results to the raspi4 with 4GB RAM?

More related information on LUKS performance: https://www.usenix.org/sites/default/files/conference/protected-files/vault20_slides_korchagin.pdf .

Also, https://raspberrypi.stackexchange.com/questions/102064/performance-of-raspberry-pi-4-in-luks suggests that the raspi CPU does not support AES in hardware, so it has to be done in software, and that switching to another algorithm may solve the issue. However, this doesn't explain why you are getting okay performance on the raspi4 8GB model.
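A quick way to check the hardware-AES point on any Linux box, as a sketch: ARM kernels list CPU extensions on the "Features" line of /proc/cpuinfo and x86 kernels on the "flags" line, and a CPU with AES instructions shows an "aes" entry there. The hypothetical has_hw_aes helper takes the file as an optional argument, so it can also be run against saved cpuinfo dumps.

```shell
#!/bin/sh
# has_hw_aes [CPUINFO]
# Succeeds (exit 0) if "aes" appears as a whole word on a "Features" (ARM)
# or "flags" (x86) line of the given cpuinfo file (default /proc/cpuinfo).
has_hw_aes() {
    cpuinfo=${1:-/proc/cpuinfo}
    grep -E '^(Features|flags)[[:space:]]*:' "$cpuinfo" | grep -qw aes
}

if has_hw_aes; then
    echo "AES instructions advertised by this CPU"
else
    echo "no AES instructions advertised: AES runs in software here"
fi
```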

Also, I did some tests with and without LUKS on the raspi4 with 4GB RAM with bitcoind turned off:

write test:

sync
echo 3 > /proc/sys/vm/drop_caches
dd bs=700M count=10 if=/dev/zero of=./testfile

yields 236 MB/s without LUKS and 72.5 MB/s with LUKS (without LUKS, writes are about 3.3x faster)

Note: while monitoring htop during the test, it appears that with LUKS it is limited by the 4 kcryptd kernel worker threads whereas without LUKS it is limited by the single dd command thread.
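To put a number on the kcryptd observation without watching htop, here is a rough sketch that sums the CPU share of threads whose name contains "kcryptd". Assumptions: procps-style ps output, and kernel worker names that include the workqueue name (the names in the test below are illustrative). Also note that ps reports CPU averaged over each thread's lifetime rather than htop's instantaneous value, so the result is only indicative.

```shell
#!/bin/sh
# kcryptd_cpu: read "PCPU COMMAND" lines on stdin and print how many threads
# have "kcryptd" in their name and their summed %CPU.
kcryptd_cpu() {
    awk '$2 ~ /kcryptd/ { total += $1; n++ }
         END { printf "%d kcryptd thread(s), %.1f%% CPU\n", n, total }'
}
```

Usage, on the machine being tested: ps -eo pcpu=,comm= | kcryptd_cpu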

Doing another write test, we see that if the block size is decreased, LUKS gets even worse.

sync
echo 3 > /proc/sys/vm/drop_caches
dd bs=70M count=100 if=/dev/zero of=./testfile

yields 238 MB/s without LUKS and 64.9 MB/s with LUKS

Now looking at reads:

sync
echo 3 > /proc/sys/vm/drop_caches
dd bs=700M count=10 of=/dev/null if=./testfile

Yields 412 MB/s without LUKS and 43.9 MB/s with LUKS

Relative to the write tests, read throughput goes up without LUKS and goes down with LUKS. Without LUKS, reads are about 9.4x faster.

Doing another read test, we see that if the block size is decreased, there isn't much of a change.

sync
echo 3 > /proc/sys/vm/drop_caches
dd bs=70M count=100 of=/dev/null if=./testfile

Yields 414 MB/s without LUKS and 45.5 MB/s with LUKS.
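The dd runs above can be wrapped in a function so the same test is easy to repeat on a LUKS-backed directory and on a plain one. This is a sketch, not the exact commands used above: it adds conv=fdatasync (GNU coreutils dd assumed) so the write rate includes flushing to disk, which makes its write numbers not directly comparable to the ones above, and it only drops the page cache when run as root. dd_rw_test and drop_page_cache are hypothetical names.

```shell
#!/bin/sh
# drop_page_cache: flush dirty pages and, if root, drop the page cache so the
# following read really comes from the disk rather than RAM.
drop_page_cache() {
    sync
    if [ "$(id -u)" -eq 0 ]; then
        { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null || true
    fi
}

# dd_rw_test DIR [BS] [COUNT]
# Write then read BS*COUNT bytes of zeros in DIR and print dd's summary lines.
# Defaults match the first tests above (bs=700M count=10, about 7 GB).
dd_rw_test() {
    dir=$1 bs=${2:-700M} count=${3:-10}
    f="$dir/ddtest.$$"

    drop_page_cache
    # conv=fdatasync: include the final flush to disk in the reported write rate.
    dd bs="$bs" count="$count" if=/dev/zero of="$f" conv=fdatasync 2>&1 | tail -n 1
    drop_page_cache
    dd bs="$bs" count="$count" if="$f" of=/dev/null 2>&1 | tail -n 1
    rm -f "$f"
}
```

For example, dd_rw_test /mnt/luks-test and dd_rw_test /mnt/plain-test 70M 100, where both paths are placeholders for a directory on each kind of volume.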

AndySchroder commented 1 year ago

Okay, I did some rough testing changing the par variable and watching things in htop. It seems like the best value is -3 (1 thread).

As mentioned above, I suspect that the raspi4 8GB has this same problem. I would appreciate it if you could confirm. If we are dealing with a 530GB dataset, I don't see how the extra 4GB is going to help that much.

I suggest you move the block storage outside of LUKS or find a different algorithm for LUKS that is better compatible with the raspi4 CPU.

dr-bonez commented 1 year ago

@AndySchroder can you paste the output of cryptsetup benchmark from your system?

AndySchroder commented 1 year ago

That command does not seem to work on start9, but I've run it on ubuntu on the same hardware. Here is the result:

$ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       403919 iterations per second for 256-bit key
PBKDF2-sha256     638596 iterations per second for 256-bit key
PBKDF2-sha512     506069 iterations per second for 256-bit key
PBKDF2-ripemd160  332670 iterations per second for 256-bit key
PBKDF2-whirlpool  123419 iterations per second for 256-bit key
argon2i       4 iterations, 344794 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 339659 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b        21.9 MiB/s        76.5 MiB/s
    serpent-cbc        128b        34.6 MiB/s        36.0 MiB/s
    twofish-cbc        128b        55.8 MiB/s        57.5 MiB/s
        aes-cbc        256b        17.3 MiB/s        58.0 MiB/s
    serpent-cbc        256b        35.5 MiB/s        36.0 MiB/s
    twofish-cbc        256b        56.9 MiB/s        57.2 MiB/s
        aes-xts        256b        83.2 MiB/s        74.3 MiB/s
    serpent-xts        256b        35.9 MiB/s        37.0 MiB/s
    twofish-xts        256b        59.2 MiB/s        59.8 MiB/s
        aes-xts        512b        65.0 MiB/s        56.9 MiB/s
    serpent-xts        512b        37.1 MiB/s        37.0 MiB/s
    twofish-xts        512b        61.3 MiB/s        60.2 MiB/s

Seems like these results are on the same order of magnitude as my actual I/O tests. I guess because the nvme disk is fast enough, it's not a bottleneck.

What do you get on the raspi4 with 8GB RAM?

AndySchroder commented 1 year ago

Also, monitoring bitcoind on ubuntu without LUKS with htop, it was doing about 90MB/s to 110MB/s data transfer with all the cores dedicated to bitcoind instead of 3 of the cores dedicated to LUKS (which is what start9 does).

Said another way, without LUKS bitcoind itself seems to limit the transfer rate, and with LUKS, LUKS seems to limit it rather than bitcoind.

AndySchroder commented 1 year ago

Comparing to an old Intel(R) Core(TM)2 Duo CPU P8800 @ 2.66GHz


$ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       705636 iterations per second for 256-bit key
PBKDF2-sha256     910222 iterations per second for 256-bit key
PBKDF2-sha512     695342 iterations per second for 256-bit key
PBKDF2-ripemd160  456696 iterations per second for 256-bit key
PBKDF2-whirlpool  316981 iterations per second for 256-bit key
argon2i       4 iterations, 336139 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 340218 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b       124.3 MiB/s       146.5 MiB/s
    serpent-cbc        128b        50.0 MiB/s       190.9 MiB/s
    twofish-cbc        128b       123.7 MiB/s       157.4 MiB/s
        aes-cbc        256b        99.1 MiB/s       114.5 MiB/s
    serpent-cbc        256b        50.0 MiB/s       192.2 MiB/s
    twofish-cbc        256b       123.7 MiB/s       157.3 MiB/s
        aes-xts        256b       147.1 MiB/s       148.0 MiB/s
    serpent-xts        256b       180.1 MiB/s       183.8 MiB/s
    twofish-xts        256b       153.6 MiB/s       155.7 MiB/s
        aes-xts        512b       114.0 MiB/s       114.5 MiB/s
    serpent-xts        512b       180.2 MiB/s       183.5 MiB/s
    twofish-xts        512b       153.8 MiB/s       156.0 MiB/s

the modern raspi4 seems pretty slow.

k0gen commented 1 year ago

> That command does not seem to work on start9, but I've run it on ubuntu on the same hardware. Here is the result:

Make sure you use sudo for that, so:

sudo cryptsetup benchmark

> What do you get on the raspi4 with 8GB RAM?

Here is my RPi4 8GB (zram enabled + custom vm settings):

sudo cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       139290 iterations per second for 256-bit key
PBKDF2-sha256     228348 iterations per second for 256-bit key
PBKDF2-sha512     186181 iterations per second for 256-bit key
PBKDF2-ripemd160  119156 iterations per second for 256-bit key
PBKDF2-whirlpool   47627 iterations per second for 256-bit key
argon2i       4 iterations, 102904 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 116417 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b        24.4 MiB/s        27.3 MiB/s
    serpent-cbc        128b               N/A               N/A
    twofish-cbc        128b        21.2 MiB/s        21.6 MiB/s
        aes-cbc        256b        19.9 MiB/s        21.6 MiB/s
    serpent-cbc        256b               N/A               N/A
    twofish-cbc        256b        22.4 MiB/s        23.0 MiB/s
        aes-xts        256b        32.7 MiB/s        32.7 MiB/s
    serpent-xts        256b               N/A               N/A
    twofish-xts        256b        24.2 MiB/s        24.3 MiB/s
        aes-xts        512b        26.1 MiB/s        26.5 MiB/s
    serpent-xts        512b               N/A               N/A
    twofish-xts        512b        24.4 MiB/s        22.7 MiB/s

Here is my M1 qemu devBOX (8GB with 6cores) for comparison:

sudo cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1      1716163 iterations per second for 256-bit key
PBKDF2-sha256    3615779 iterations per second for 256-bit key
PBKDF2-sha512    1846084 iterations per second for 256-bit key
PBKDF2-ripemd160  723155 iterations per second for 256-bit key
PBKDF2-whirlpool  468114 iterations per second for 256-bit key
argon2i       4 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b       730.4 MiB/s      3755.5 MiB/s
    serpent-cbc        128b        56.1 MiB/s        60.8 MiB/s
    twofish-cbc        128b       101.1 MiB/s       107.2 MiB/s
        aes-cbc        256b       587.6 MiB/s      2289.6 MiB/s
    serpent-cbc        256b        41.0 MiB/s        60.2 MiB/s
    twofish-cbc        256b       104.1 MiB/s       134.4 MiB/s
        aes-xts        256b      2019.4 MiB/s      2196.2 MiB/s
    serpent-xts        256b        60.2 MiB/s        62.1 MiB/s
    twofish-xts        256b       133.7 MiB/s       135.1 MiB/s
        aes-xts        512b      2171.2 MiB/s      1957.6 MiB/s
    serpent-xts        512b        60.4 MiB/s        63.2 MiB/s
    twofish-xts        512b       133.0 MiB/s       129.4 MiB/s

dr-bonez commented 1 year ago

You seem to be getting significantly better results for aes-xts on ubuntu. This is the same hardware as your RPi4 8GB, yes? What kernel is it running? What board revision is it? (cat /proc/cpuinfo)

AndySchroder commented 1 year ago

I don't have a raspi4 with 8GB RAM, I only have a raspi4 with 4GB ram.

Taking a look at https://raspberrypi.stackexchange.com/questions/102064/performance-of-raspberry-pi-4-in-luks referenced above, it leads you to https://rr-developer.github.io/LUKS-on-Raspberry-Pi/ . I'll now repeat my above tests with that considered.
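To line the candidate ciphers up across machines, the benchmark output can be reduced to one row per cipher and sorted by decryption speed. A minimal sketch that assumes the two-column "Encryption | Decryption" layout cryptsetup prints in the transcripts in this thread; rank_ciphers is a hypothetical helper that reads benchmark text on stdin, so it also works on saved output.

```shell
#!/bin/sh
# rank_ciphers: read `cryptsetup benchmark` output on stdin and print
# "<algorithm> <key> <encrypt MiB/s> <decrypt MiB/s>" per measured cipher,
# fastest decryption first. Comment, PBKDF, argon2 and N/A rows are skipped.
rank_ciphers() {
    awk '
        # Cipher rows end in: <enc> MiB/s <dec> MiB/s
        $NF == "MiB/s" && $(NF-2) == "MiB/s" {
            print $1, $2, $(NF-3), $(NF-1)
        }
    ' | LC_ALL=C sort -k4,4nr
}
```

Usage, on the machine being tested: for cipher in "xchacha12,aes-adiantum-plain64" "xchacha20,aes-adiantum-plain64" "aes-xts-plain64"; do sudo cryptsetup benchmark --cipher="${cipher}"; done | rank_ciphers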

See also:

Ubuntu raspi4 4GB RAM

$ cat /etc/issue
Ubuntu 20.04.1 LTS \n \l

$ 
$ uname -a
Linux xxxxxxxxxxxxxxxxxxx 5.4.0-1045-raspi #49-Ubuntu SMP PREEMPT Wed Sep 29 17:49:16 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
$ 
$ cat /proc/cpuinfo
processor   : 0
BogoMIPS    : 108.00
Features    : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part    : 0xd08
CPU revision    : 3

processor   : 1
BogoMIPS    : 108.00
Features    : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part    : 0xd08
CPU revision    : 3

processor   : 2
BogoMIPS    : 108.00
Features    : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part    : 0xd08
CPU revision    : 3

processor   : 3
BogoMIPS    : 108.00
Features    : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part    : 0xd08
CPU revision    : 3

Hardware    : BCM2835
Revision    : c03140
Serial      : 10000000c2b9c5a8
Model       : Raspberry Pi ? Rev 1.0
$ 
$ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       405168 iterations per second for 256-bit key
PBKDF2-sha256     649675 iterations per second for 256-bit key
PBKDF2-sha512     515017 iterations per second for 256-bit key
PBKDF2-ripemd160  335651 iterations per second for 256-bit key
PBKDF2-whirlpool  107612 iterations per second for 256-bit key
argon2i       4 iterations, 374573 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 376944 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b        23.8 MiB/s        77.0 MiB/s
    serpent-cbc        128b        35.6 MiB/s        36.2 MiB/s
    twofish-cbc        128b        57.6 MiB/s        57.8 MiB/s
        aes-cbc        256b        17.3 MiB/s        58.4 MiB/s
    serpent-cbc        256b        35.7 MiB/s        36.2 MiB/s
    twofish-cbc        256b        57.6 MiB/s        57.3 MiB/s
        aes-xts        256b        84.8 MiB/s        74.8 MiB/s
    serpent-xts        256b        37.4 MiB/s        37.4 MiB/s
    twofish-xts        256b        62.0 MiB/s        60.7 MiB/s
        aes-xts        512b        65.2 MiB/s        57.2 MiB/s
    serpent-xts        512b        37.4 MiB/s        37.4 MiB/s
    twofish-xts        512b        62.0 MiB/s        60.8 MiB/s
$ 
$ cryptsetup benchmark -c xchacha20,aes-adiantum-plain64
# Tests are approximate using memory only (no storage IO).
#            Algorithm |       Key |      Encryption |      Decryption
xchacha20,aes-adiantum        256b       143.8 MiB/s       144.5 MiB/s
$ 
$ for cipher in "xchacha12,aes-adiantum-plain64" "xchacha20,aes-adiantum-plain64" "aes-xts-plain64"; do cryptsetup benchmark --cipher="${cipher}"; done
# Tests are approximate using memory only (no storage IO).
#            Algorithm |       Key |      Encryption |      Decryption
xchacha12,aes-adiantum        256b       171.6 MiB/s       172.5 MiB/s
# Tests are approximate using memory only (no storage IO).
#            Algorithm |       Key |      Encryption |      Decryption
xchacha20,aes-adiantum        256b       141.8 MiB/s       142.2 MiB/s
# Tests are approximate using memory only (no storage IO).
# Algorithm |       Key |      Encryption |      Decryption
    aes-xts        256b        84.9 MiB/s        75.0 MiB/s
$ 

Start9 raspi4 4GB RAM

root@kinky-honor:/home/start9# cat /etc/issue
Debian GNU/Linux 11 \n \l

root@kinky-honor:/home/start9# 
root@kinky-honor:/home/start9# uname -a
Linux kinky-honor 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux
root@kinky-honor:/home/start9# 
root@kinky-honor:/home/start9# cat /proc/cpuinfo
processor   : 0
BogoMIPS    : 108.00
Features    : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part    : 0xd08
CPU revision    : 3

processor   : 1
BogoMIPS    : 108.00
Features    : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part    : 0xd08
CPU revision    : 3

processor   : 2
BogoMIPS    : 108.00
Features    : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part    : 0xd08
CPU revision    : 3

processor   : 3
BogoMIPS    : 108.00
Features    : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part    : 0xd08
CPU revision    : 3

Hardware    : BCM2835
Revision    : c03140
Serial      : 10000000782e0c04
Model       : Raspberry Pi Compute Module 4 Rev 1.0
root@kinky-honor:/home/start9# 
root@kinky-honor:/home/start9# cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       143404 iterations per second for 256-bit key
PBKDF2-sha256     234057 iterations per second for 256-bit key
PBKDF2-sha512     189137 iterations per second for 256-bit key
PBKDF2-ripemd160  121362 iterations per second for 256-bit key
PBKDF2-whirlpool   38916 iterations per second for 256-bit key
argon2i       4 iterations, 146848 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 190143 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b        33.9 MiB/s        35.9 MiB/s
    serpent-cbc        128b               N/A               N/A
    twofish-cbc        128b        25.4 MiB/s        25.5 MiB/s
        aes-cbc        256b        26.8 MiB/s        27.9 MiB/s
    serpent-cbc        256b               N/A               N/A
    twofish-cbc        256b        25.1 MiB/s        23.0 MiB/s
        aes-xts        256b        34.4 MiB/s        35.2 MiB/s
    serpent-xts        256b               N/A               N/A
    twofish-xts        256b        26.3 MiB/s        26.1 MiB/s
        aes-xts        512b        28.2 MiB/s        28.9 MiB/s
    serpent-xts        512b               N/A               N/A
    twofish-xts        512b        26.4 MiB/s        26.0 MiB/s
root@kinky-honor:/home/start9# 
root@kinky-honor:/home/start9# cryptsetup benchmark -c xchacha20,aes-adiantum-plain64
# Tests are approximate using memory only (no storage IO).
#            Algorithm |       Key |      Encryption |      Decryption
xchacha20,aes-adiantum        256b        57.1 MiB/s        58.0 MiB/s
root@kinky-honor:/home/start9# 
root@kinky-honor:/home/start9# for cipher in "xchacha12,aes-adiantum-plain64" "xchacha20,aes-adiantum-plain64" "aes-xts-plain64"; do cryptsetup benchmark --cipher="${cipher}"; done
# Tests are approximate using memory only (no storage IO).
#            Algorithm |       Key |      Encryption |      Decryption
xchacha12,aes-adiantum        256b        67.3 MiB/s        67.7 MiB/s
# Tests are approximate using memory only (no storage IO).
#            Algorithm |       Key |      Encryption |      Decryption
xchacha20,aes-adiantum        256b        56.5 MiB/s        57.6 MiB/s
# Tests are approximate using memory only (no storage IO).
# Algorithm |       Key |      Encryption |      Decryption
    aes-xts        256b        36.2 MiB/s        37.2 MiB/s
root@kinky-honor:/home/start9# 

Ubuntu with old Intel CPU

$ cat /etc/issue
Ubuntu 20.04.6 LTS \n \l

$ 
$ uname -a
Linux xxxxxxxxxxxxxxxxxxx 5.4.0-153-generic #170-Ubuntu SMP Fri Jun 16 13:43:31 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ 
$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 23
model name  : Intel(R) Core(TM)2 Duo CPU     P8800  @ 2.66GHz
stepping    : 10
microcode   : 0xa0b
cpu MHz     : 2412.673
cache size  : 3072 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm pti tpr_shadow vnmi flexpriority dtherm
bugs        : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips    : 5309.25
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 23
model name  : Intel(R) Core(TM)2 Duo CPU     P8800  @ 2.66GHz
stepping    : 10
microcode   : 0xa0b
cpu MHz     : 2469.047
cache size  : 3072 KB
physical id : 0
siblings    : 2
core id     : 1
cpu cores   : 2
apicid      : 1
initial apicid  : 1
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm pti tpr_shadow vnmi flexpriority dtherm
bugs        : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips    : 5309.25
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

$ 
$ 
$ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       712347 iterations per second for 256-bit key
PBKDF2-sha256     913393 iterations per second for 256-bit key
PBKDF2-sha512     696265 iterations per second for 256-bit key
PBKDF2-ripemd160  459901 iterations per second for 256-bit key
PBKDF2-whirlpool  317365 iterations per second for 256-bit key
argon2i       4 iterations, 332038 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 341568 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b       123.2 MiB/s       145.4 MiB/s
    serpent-cbc        128b        49.6 MiB/s       191.3 MiB/s
    twofish-cbc        128b       123.1 MiB/s       156.3 MiB/s
        aes-cbc        256b        99.6 MiB/s       114.5 MiB/s
    serpent-cbc        256b        50.0 MiB/s       192.2 MiB/s
    twofish-cbc        256b       123.0 MiB/s       155.8 MiB/s
        aes-xts        256b       145.9 MiB/s       147.3 MiB/s
    serpent-xts        256b       178.2 MiB/s       184.9 MiB/s
    twofish-xts        256b       153.6 MiB/s       154.9 MiB/s
        aes-xts        512b       113.6 MiB/s       110.8 MiB/s
    serpent-xts        512b       176.0 MiB/s       186.2 MiB/s
    twofish-xts        512b       153.7 MiB/s       156.0 MiB/s
$ 
$ cryptsetup benchmark -c xchacha20,aes-adiantum-plain64
# Tests are approximate using memory only (no storage IO).
#            Algorithm |       Key |      Encryption |      Decryption
xchacha20,aes-adiantum        256b       315.4 MiB/s       345.1 MiB/s
$ 
$ for cipher in "xchacha12,aes-adiantum-plain64" "xchacha20,aes-adiantum-plain64" "aes-xts-plain64"; do cryptsetup benchmark --cipher="${cipher}"; done
# Tests are approximate using memory only (no storage IO).
#            Algorithm |       Key |      Encryption |      Decryption
xchacha12,aes-adiantum        256b       427.2 MiB/s       429.2 MiB/s
# Tests are approximate using memory only (no storage IO).
#            Algorithm |       Key |      Encryption |      Decryption
xchacha20,aes-adiantum        256b       373.2 MiB/s       371.8 MiB/s
# Tests are approximate using memory only (no storage IO).
# Algorithm |       Key |      Encryption |      Decryption
    aes-xts        256b       146.4 MiB/s       147.7 MiB/s
$ 

Ubuntu with modern Intel CPU

$ 
$ cat /etc/issue
Ubuntu 22.04.2 LTS \n \l

$ 
$ 
$ uname -a
Linux xxxxxxxxxxxxxxxxxxx 5.15.0-69-generic #76-Ubuntu SMP Fri Mar 17 17:19:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ 
$ 
$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 166
model name  : Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
stepping    : 0
microcode   : 0xf4
cpu MHz     : 500.044
cache size  : 12288 KB
physical id : 0
siblings    : 12
core id     : 0
cpu cores   : 6
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 22
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple pml ept_mode_based_exec
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit mmio_stale_data retbleed eibrs_pbrsb
bogomips    : 3199.92
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

~~~~~~ truncated the remaining 11 processors ~~~~~~~~

$ 
$ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1      1444319 iterations per second for 256-bit key
PBKDF2-sha256    1985939 iterations per second for 256-bit key
PBKDF2-sha512    1438375 iterations per second for 256-bit key
PBKDF2-ripemd160  834853 iterations per second for 256-bit key
PBKDF2-whirlpool  655360 iterations per second for 256-bit key
argon2i       6 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      6 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b      1194.2 MiB/s      3584.3 MiB/s
    serpent-cbc        128b       102.5 MiB/s       782.0 MiB/s
    twofish-cbc        128b       233.2 MiB/s       417.5 MiB/s
        aes-cbc        256b       909.6 MiB/s      2885.8 MiB/s
    serpent-cbc        256b       101.9 MiB/s       787.1 MiB/s
    twofish-cbc        256b       232.4 MiB/s       417.9 MiB/s
        aes-xts        256b      3567.4 MiB/s      3576.0 MiB/s
    serpent-xts        256b       672.8 MiB/s       678.0 MiB/s
    twofish-xts        256b       390.6 MiB/s       389.0 MiB/s
        aes-xts        512b      2885.3 MiB/s      2890.4 MiB/s
    serpent-xts        512b       675.2 MiB/s       680.0 MiB/s
    twofish-xts        512b       387.1 MiB/s       386.7 MiB/s
$ 
$ 
$ cryptsetup benchmark -c xchacha20,aes-adiantum-plain64
# Tests are approximate using memory only (no storage IO).
#            Algorithm |       Key |      Encryption |      Decryption
xchacha20,aes-adiantum        256b      1522.8 MiB/s      1542.1 MiB/s
$ 
$ 
$ for cipher in "xchacha12,aes-adiantum-plain64" "xchacha20,aes-adiantum-plain64" "aes-xts-plain64"; do cryptsetup benchmark --cipher="${cipher}"; done
# Tests are approximate using memory only (no storage IO).
#            Algorithm |       Key |      Encryption |      Decryption
xchacha12,aes-adiantum        256b      1822.4 MiB/s      1883.8 MiB/s
# Tests are approximate using memory only (no storage IO).
#            Algorithm |       Key |      Encryption |      Decryption
xchacha20,aes-adiantum        256b      1575.4 MiB/s      1574.7 MiB/s
# Tests are approximate using memory only (no storage IO).
# Algorithm |       Key |      Encryption |      Decryption
    aes-xts        256b      3516.2 MiB/s      3511.7 MiB/s
$ 

It seems like Start9 is performing much worse than Ubuntu on the same Raspberry Pi 4 hardware, and xchacha12,aes-adiantum-plain64 seems to be the fastest algorithm I've found for the Raspberry Pi 4 (but I have no idea what its security tradeoffs are).
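To repeat this comparison on other hardware, the benchmark output can be ranked automatically. This is a minimal sketch, not part of StartOS or cryptsetup: the hypothetical function fastest_cipher reads cryptsetup benchmark output on stdin and reports the cipher whose slower direction (encryption or decryption) is fastest, on the assumption that IBD both writes and reads heavily. Like the benchmark itself, it reflects in-memory throughput only (no storage IO) and says nothing about security tradeoffs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rank the cipher rows of `cryptsetup benchmark` output.
# Each cipher row has six whitespace-separated fields:
#   <algorithm> <key> <enc> MiB/s <dec> MiB/s
# The score is the slower of encryption and decryption throughput.
fastest_cipher() {
  awk '
    NF == 6 && $4 == "MiB/s" && $6 == "MiB/s" {
      enc = $3 + 0; dec = $5 + 0
      slower = (enc < dec) ? enc : dec
      if (best == "" || slower > best_rate) { best = $1 " " $2; best_rate = slower }
    }
    END {
      # No cipher rows found: report failure instead of printing nothing.
      if (best == "") exit 1
      printf "%s %.1f MiB/s\n", best, best_rate
    }
  '
}

# Usage, e.g. with the for loop shown earlier:
#   for cipher in ...; do cryptsetup benchmark --cipher="${cipher}"; done | fastest_cipher
```

Fed the Raspberry Pi numbers above, this picks xchacha12,aes-adiantum; fed the full Ubuntu/Intel benchmark, it picks aes-xts 256b, which matches the hardware AES support visible in that CPU's flags.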

AndySchroder commented 1 year ago

Okay, so I'm approaching three weeks now and I'm only at block 568,790.

Could I run cryptsetup-reencrypt --decrypt to decrypt the drive for now? Or could I run cryptsetup-reencrypt -c xchacha12,aes-adiantum (which looked like the fastest cipher for the Raspberry Pi in my tests above)? It might make more sense to erase the disk completely and do that after first boot. However, I'm not sure exactly how to run the command, because your operating system has such a special setup.

Also, I'd consider removing the Raspberry Pi download links for any version that has this slow encryption enabled, because it's just not usable.

k0gen commented 1 year ago

@AndySchroder, because I don't have access to a 4GB RPi4, please try tweaking your memory management subsystem a bit and see whether it improves IBD.

Let it sit for some time and observe if it made any difference.

MattDHill commented 1 year ago

We are pretty confident this is a result of disk caching. We are preparing to release a version of StartOS that disables LUKS for Raspberry Pi. Hopefully that will take care of it.

AndySchroder commented 1 year ago

Okay, I got busy and also had the system powered down for a while. I've just made the changes you requested. Will these changes persist if I reboot? Also, I'm a bit confused about how this is going to help, because I thought zram used more CPU to compress the data in RAM.

We are at block 588,913.

Will report back in a week on the block height.

k0gen commented 1 year ago

> Okay, I got busy and also had the system powered down for a while. I've just made the changes you requested. Will these changes persist if I reboot? Also, I'm a bit confused how this is going to help because I thought zram used more CPU to compress the data in RAM?

By increasing vm.vfs_cache_pressure and vm.swappiness, the kernel becomes more aggressive about reclaiming the dentry and inode caches and about swapping out memory, respectively, aiding performance during blockchain synchronization on low-memory systems like a Raspberry Pi 4 with 4GB of RAM.

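Lowering vm.dirty_background_ratio and vm.dirty_ratio makes the kernel start writing dirty pages to disk sooner (first through background writeback, then by making writing processes flush their own data), preventing excessive memory usage during synchronization and smoothing out performance. Additionally, enabling zram adds a compressed swap device in RAM, which effectively increases usable memory and further improves resource efficiency.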

These changes will revert to the system defaults when you restart.
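The specific values suggested for these settings are not preserved in this thread. As a hedged, read-only sketch, the hypothetical function show_vm_tuning below prints the current values of the four tunables named above, a rough estimate of the dirty-page limits they imply, and whether any swap device (such as zram) is active, so the state before and after a change, or after a reboot, can be compared.

```shell
#!/usr/bin/env bash
# Read-only sketch (hypothetical helper): shows the VM tunables discussed above,
# a rough estimate of the dirty-page limits they imply, and whether any swap
# (for example a zram device such as /dev/zram0) is active. Changes nothing.
show_vm_tuning() {
  local knob ratio avail_kb
  for knob in swappiness vfs_cache_pressure dirty_background_ratio dirty_ratio; do
    printf '%-28s %s\n' "vm.$knob" "$(cat "/proc/sys/vm/$knob" 2>/dev/null || echo n/a)"
  done

  # Approximation: the kernel applies the dirty ratios to "dirtyable" memory
  # (roughly free plus reclaimable page cache), not exactly to MemAvailable.
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)
  for knob in dirty_background_ratio dirty_ratio; do
    ratio=$(cat "/proc/sys/vm/$knob" 2>/dev/null || echo 0)
    # A ratio of 0 means the matching *_bytes setting is used instead; skip it.
    if [ -n "$avail_kb" ] && [ "$ratio" -gt 0 ]; then
      printf '%-28s ~%d MiB\n' "$knob limit" $(( avail_kb * ratio / 100 / 1024 ))
    fi
  done

  # /proc/swaps has one header line, then one line per active swap device.
  if [ ! -r /proc/swaps ]; then
    echo "swap status unavailable (/proc/swaps not readable)"
  elif [ "$(wc -l < /proc/swaps)" -gt 1 ]; then
    echo "active swap devices:"
    tail -n +2 /proc/swaps
  else
    echo "no active swap devices"
  fi
}

show_vm_tuning
```

Running it once before applying the changes and once afterwards (and again after a reboot) shows whether the settings took effect and whether they persisted.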

AndySchroder commented 1 year ago

Okay, it has been over a week and I am now at block 634,370. I don't think your suggestion helped.
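To put those block heights in perspective, the average rate between two reports can be worked out directly. This is a minimal sketch with a hypothetical helper, sync_rate; because "over a week" passed between the two reports, using 7 days gives an upper bound on the average rate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: average IBD progress between two reported block heights.
# usage: sync_rate START_HEIGHT END_HEIGHT ELAPSED_DAYS
sync_rate() {
  awk -v a="$1" -v b="$2" -v d="$3" 'BEGIN {
    if (d <= 0) { print "elapsed days must be > 0" > "/dev/stderr"; exit 1 }
    printf "%d blocks in %s days = %.0f blocks/day\n", b - a, d, (b - a) / d
  }'
}

# Heights reported in this thread (block 588,913, then 634,370 over a week later):
sync_rate 588913 634370 7   # prints: 45457 blocks in 7 days = 6494 blocks/day
```

The work per block varies along the chain, so this is only an average over that particular range of heights.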

Have you looked at my cryptsetup benchmark results? It seems as though much could be learned from them.

In the meantime, have you released the version for the Raspberry Pi without LUKS?

MattDHill commented 1 year ago

We have tested without LUKS and unfortunately saw no improvement in performance. You can test this yourself on 0.3.4.4 by setting disable-encryption: true in /etc/embassy/config.yaml before a fresh setup. Make sure not to reboot between setting that option and running setup, because that file is in the overlay and changes will not persist across reboots.

MattDHill commented 1 year ago

FYI, startd needs to be restarted after the config file is updated in order for the config changes to be picked up.
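The two steps described here (edit the config, then restart startd) could be scripted roughly as follows. This is a sketch under assumptions: set_disable_encryption is a hypothetical helper, disable-encryption is assumed to be a top-level key in /etc/embassy/config.yaml, and the daemon is assumed to run as a systemd unit named startd.

```shell
#!/usr/bin/env bash
# Hypothetical helper: idempotently set `disable-encryption: true` in the given
# YAML config file, assuming it is a top-level key.
set_disable_encryption() {
  local cfg="$1"
  [ -f "$cfg" ] || { echo "no such file: $cfg" >&2; return 1; }
  if grep -q '^disable-encryption:' "$cfg"; then
    # Key already present: overwrite its value in place.
    sed -i 's/^disable-encryption:.*/disable-encryption: true/' "$cfg"
  else
    # Make sure the new key starts on its own line, then append it.
    if [ -n "$(tail -c 1 "$cfg")" ]; then printf '\n' >> "$cfg"; fi
    printf 'disable-encryption: true\n' >> "$cfg"
  fi
}

# Usage on the device, as root, before running the fresh setup:
#   set_disable_encryption /etc/embassy/config.yaml
#   systemctl restart startd   # assumption: systemd unit name
# Do not reboot in between: the file lives in the overlay and the edit is lost.
```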

remcoros commented 1 year ago

You might want to check the clock speed and the CPU frequency governor in use.

I followed the DIY guide, installed Start9 on an RPi 4, and noticed that it was always running at 600 MHz. The governor for all CPUs was set to 'powersave'.

On Raspberry Pi OS, this is changed to 'ondemand' at boot by the raspi-config systemd script.

This is missing from Start9, so it was stuck at 600 MHz, even under load.

I fixed it by entering the chroot environment, installing raspi-config through apt, then exiting and rebooting.
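A quick way to confirm this symptom is to read the standard Linux cpufreq sysfs files. The sketch below is a read-only check; report_cpufreq and the CPU_SYSFS override are hypothetical names introduced here, and whether the ondemand governor is available on a given StartOS kernel is an assumption (each CPU's scaling_available_governors file lists what is).

```shell
#!/usr/bin/env bash
# Read-only sketch (hypothetical helper): print each CPU's cpufreq governor and
# current clock, e.g. to spot a Pi stuck at 600 MHz on "powersave".
# CPU_SYSFS can be overridden (e.g. for testing); defaults to the real sysfs path.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

report_cpufreq() {
  local dir cpu gov khz found=0
  for dir in "$CPU_SYSFS"/cpu[0-9]*/cpufreq; do
    [ -r "$dir/scaling_governor" ] || continue
    found=1
    cpu=$(basename "$(dirname "$dir")")
    gov=$(cat "$dir/scaling_governor")
    # scaling_cur_freq is reported in kHz.
    khz=$(cat "$dir/scaling_cur_freq" 2>/dev/null || echo 0)
    printf '%s %s %d MHz\n' "$cpu" "$gov" $(( khz / 1000 ))
  done
  [ "$found" -eq 1 ] || echo "no cpufreq information under $CPU_SYSFS"
}

report_cpufreq

# To switch every CPU to "ondemand" until the next reboot (as root), one option:
#   for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do echo ondemand > "$f"; done
```

A governor written through sysfs lasts only until the next reboot, which is why installing raspi-config, whose boot script applies it automatically, makes the fix stick.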