AndySchroder opened this issue 1 year ago.
Hey Andy, we are very aware of this issue and have no explanation for it. Sync times for the 8GB Pi and other hardware are in line with expectations. Only the 4GB Pi seems to be affected, and we have no idea why.
Like I mentioned, I don't have the problem with bitcoind manually installed on ubuntu or with Umbrel on the same hardware, so it is something with your custom operating system config.
Your filesystem setup seems very complex. I suggest you start unwinding the complexity and then re-layer up until you can locate the problem.
Also, my swap file is using 0 bytes. RAM usage is only about 30% and the rest of the RAM is operating as disk cache.
Have you ever tried syncing on similar hardware with LUKS enabled?
Also, my swap file is using 0 bytes. RAM usage is only about 30% and the rest of the RAM is operating as disk cache.
Are you using zram? Otherwise there is no swap.
My mistake, the swap file is using 0 bytes because it is turned off.
Per recent testing, zram/swap will not help this issue.
Have you ever tried syncing on similar hardware with LUKS enabled?
No.
I never authenticate anything at boot, so you must be storing an unencrypted secret somewhere outside of LUKS. So what's the point of using LUKS?
We plan to add a disk encryption feature in the future, and trivially encrypting with LUKS now allows us to do so without rewriting the entire block device, simply by changing the key used in the header.
man cryptsetup (https://manpages.ubuntu.com/manpages/jammy/en/man8/cryptsetup.8.html#luks%20extension) says:
luksHeaderBackup <device> --header-backup-file <file>
Stores a binary backup of the LUKS header and keyslot area. Note: Using '-' as filename writes the header backup to a file named '-'. WARNING: This backup file and a passphrase valid at the time of backup allows decryption of the LUKS data area, even if the passphrase was later changed or removed from the LUKS device. Also note that with a header backup you lose the ability to securely wipe the LUKS device by just overwriting the header and key-slots. You either need to securely erase all header backups in addition or overwrite the encrypted data area as well. The second option is less secure, as some sectors can survive, e.g. due to defect management.
which leads me to believe that you need to run cryptsetup-reencrypt to be secure.
However, I understand your motivation to encrypt it now, because issues like this one will pop up and it's better to roll things out gradually. If no one actually makes a header backup, it will still be secure when you finally roll out a version where the passphrase must be provided in some way on boot (it's better than nothing and simpler than using cryptsetup-reencrypt, but not perfect).
In the long run, I don't understand how you are going to be able to do this securely on a raspi. You need to have some tamper protection of the operating system like https://puri.sm/posts/the-librem-key-makes-tamper-detection-easy/ , but I don't see how you are ever going to be able to do that on the raspi, so why try when it is going to dramatically hinder performance? You already have tons of non-free libraries that are enabled on the raspi, so it seems to be a custom build anyway. I guess without tamper evidence, you still will be protecting against a simple theft, so maybe there is some value.
So, trying to find a workaround: will changing the size of the bitcoind database cache (dbcache) help at all? Did the zram/swap test not help because the disk cache is storing the data encrypted in RAM? If so, since bitcoind's own cache is held unencrypted in RAM, maybe increasing it would help.
Another option is to create another unencrypted partition and set blocksdir to use that partition for block storage only. Block data is public information, so why encrypt it? There would need to be some kind of tamper detection on the block data, though.
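If the separate-partition idea is tried, it may be worth confirming that the directory blocksdir points at really sits outside LUKS. Here is a minimal sketch, assuming util-linux findmnt is installed and that the mapping was set up by cryptsetup (whose device-mapper UUIDs start with CRYPT-). The mount point /mnt/blocks is hypothetical.

```sh
#!/bin/sh
# Sketch: does the filesystem holding a directory sit on a dm-crypt mapping?
# Assumptions: util-linux findmnt is installed, and the mapping was created by
# cryptsetup, whose device-mapper UUIDs start with "CRYPT-". A stacked setup
# (e.g. LVM on top of LUKS) is NOT detected by this check.
is_on_dm_crypt() {
    src=$(findmnt -n -o SOURCE --target "$1") || return 2
    src=${src%%\[*}                              # drop a btrfs "[/subvol]" suffix
    dev=$(basename "$(readlink -f "$src")")      # e.g. /dev/mapper/x -> dm-0
    uuid_file="/sys/block/$dev/dm/uuid"
    [ -r "$uuid_file" ] && grep -q '^CRYPT-' "$uuid_file"
}

# Usage (the mount point is hypothetical):
#   if is_on_dm_crypt /mnt/blocks; then
#       echo "/mnt/blocks is still inside LUKS"
#   else
#       echo "ok to use: blocksdir=/mnt/blocks"
#   fi
```

Exit status 0 means "on dm-crypt", 1 means "not detected", and 2 means findmnt could not resolve the path.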
So are you fairly certain the issue is related to the LUKS setup then? https://github.com/Start9Labs/documentation/issues/407 was initially submitted because of this slow IBD, but from my investigations today, it seems like bitcoind is trying to make connections to IPv4 peers over IPv4 and not Tor. Is this correct?
StartOS automatically backs up the LUKS headers when making encrypted backups, in which case the header file is encrypted with the user's master password. It is also duplicated into the OS config partition, which remains local to the device and would also be replaced if we were to update the key. You do make a good point, though, that we will need to take steps to protect the original file against forensic recovery.
If the performance tradeoff is significant, we may consider removing the encryption entirely, until enabled by the user after we roll out the feature.
Simple theft is a much bigger concern for us than tampering. Tampering is a much more difficult attack to pull off, and protecting against it would either trade off uptime, due to the requirement of local attestation, or require remote attestation, which is an area of active research.
So, one other thing I'm thinking could be causing problems here. If I look at the system load, it is about 6, and there are 4 cores. By default, bitcoind tries to use all available cores because it assumes there are no other substantial loads on the system, but LUKS seems to be fairly dominant on this processor. That means bitcoind is doing unnecessary parallelization, which slows it down. I also think there is a time delay between when the data is decrypted by LUKS and when it reaches bitcoind: when I watch the CPU utilization, it alternates between the kernel threads and bitcoind, and sometimes the CPU sits idle for periods even though the system load is 6 (which also explains why the temperature is lower). I want to lower the number of threads used by bitcoind, but start9 seems to keep overwriting the bitcoin.conf file, so I can't figure out how to set par=2 or par=3 to test this theory.
Manual changes to the config will be honored in 0.4.0, but right now the config we expose overwrites bitcoin.conf on every restart.
In the meantime, we can quickly add whatever options you want, though I'm not seeing the par option in bitcoin.conf.
You actually can edit bitcoin.conf.template in the assets of bitcoind at /embassy-data/package-data/volumes/bitcoind/assets/<version>/bitcoin.conf.template
That will allow you to make changes.
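Since the template gets re-rendered, an edit that can be re-run safely seems handy. A minimal sketch, assuming plain key=value lines pass through the StartOS template unchanged (not confirmed here): set_conf_option is a hypothetical helper that replaces an existing key= line or appends one.

```sh
#!/bin/sh
# Sketch: set KEY=VALUE in a bitcoin.conf-style file. An existing KEY= line is
# replaced (duplicates are collapsed into one), otherwise the line is appended.
# The file is rewritten in place with cat > so ownership and permissions stay.
# Assumption: plain key=value lines pass through the StartOS template as-is.
set_conf_option() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 { if (!seen) print k "=" v; seen = 1; next }
        { print }
        END { if (!seen) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Usage (as root; keep the <version> directory from your install):
#   set_conf_option <path to bitcoin.conf.template> par -2
```

Running it twice with the same arguments leaves the file unchanged, so it is safe to put in a script.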
From bitcoind -h
-par=<n>
Set the number of script verification threads (-2 to 15, 0 = auto, <0 =
leave that many cores free, default: 0)
We might want to use par=-2 as a starting point (leaving 2 cores free for LUKS and the rest of the OS) and then, with more testing, see whether it should be -1 or -3. I'd hope that LUKS doesn't require 3 CPUs, but we would need to test it.
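To see what each par value means on a given box, here is a small sketch of how I understand bitcoind maps -par onto script verification threads: non-positive values are added to the core count, the main thread counts toward the total, and extra worker threads are capped at 15. This mirrors my reading of Bitcoin Core's init code and may differ between versions; par_threads is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: how many threads verify scripts for a given -par value.
# Assumption: mirrors Bitcoin Core's -par handling as I understand it:
#   - par <= 0 is added to the core count (0 = all cores, -2 = leave 2 free)
#   - the main thread counts toward the total, so workers = par - 1
#   - worker threads are clamped to the range 0..15
par_threads() {
    cores=$1
    par=$2
    if [ "$par" -le 0 ]; then
        par=$((par + cores))
    fi
    workers=$((par - 1))
    if [ "$workers" -lt 0 ]; then workers=0; fi
    if [ "$workers" -gt 15 ]; then workers=15; fi
    echo $((workers + 1))
}

cores=$(nproc 2>/dev/null || echo 4)
for p in 0 -1 -2 -3; do
    echo "par=$p -> $(par_threads "$cores" "$p") verification thread(s) on $cores cores"
done
```

On the 4-core Pi this gives 4 threads for the default par=0, 2 for par=-2 and 1 for par=-3.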
It's still unclear to me why the raspi4 4GB model is having problems but the 8GB raspi4 model isn't. Have you tested on a non raspi4 machine with 4GB RAM?
Also wondering, have you tried cryptsetup benchmark on the systems that work well and compared the results to the raspi4 with 4GB RAM?
More related information on LUKS performance: https://www.usenix.org/sites/default/files/conference/protected-files/vault20_slides_korchagin.pdf .
Also, https://raspberrypi.stackexchange.com/questions/102064/performance-of-raspberry-pi-4-in-luks suggests that the raspi CPU does not support AES in hardware, so it has to be done in software, and that switching to another algorithm may solve the issue. However, this doesn't explain why you are getting okay performance on the raspi4 8GB model.
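One quick way to check the hardware AES point on any box is to look for the word aes among the CPU features the kernel reports: ARM kernels list them on "Features" lines of /proc/cpuinfo, x86 kernels on "flags" lines. A minimal sketch (has_hw_aes is a hypothetical helper; the optional argument lets it read a saved cpuinfo dump):

```sh
#!/bin/sh
# Sketch: report whether the CPU advertises AES instructions.
# ARM kernels list CPU features on "Features" lines of /proc/cpuinfo, x86
# kernels on "flags" lines; in both cases the AES extension appears as the
# whole word "aes" (grep -w keeps e.g. "vaes" from matching).
has_hw_aes() {
    cpuinfo=${1:-/proc/cpuinfo}
    grep -E '^(Features|flags)[[:space:]]*:' "$cpuinfo" | grep -qw aes
}

if has_hw_aes; then
    echo "CPU advertises AES instructions"
else
    echo "no AES instructions advertised"
fi
```

The raspi4 cpuinfo dumps posted further down ("fp asimd evtstrm crc32 cpuid") have no aes entry, while the i7-10710U flags do.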
write test:
sync
echo 3 > /proc/sys/vm/drop_caches
dd bs=700M count=10 if=/dev/zero of=./testfile
yields 236 MB/s without LUKS and 72.5 MB/s with LUKS (without LUKS, writes are about 3.3x faster)
Note: while monitoring htop during the test, it appears that with LUKS it is limited by the 4 kcryptd kernel worker threads whereas without LUKS it is limited by the single dd command thread.
Doing another write test, we see that if the block size is decreased, LUKS gets even worse.
sync
echo 3 > /proc/sys/vm/drop_caches
dd bs=70M count=100 if=/dev/zero of=./testfile
yields 238 MB/s without LUKS and 64.9 MB/s with LUKS
Now looking at reads:
sync
echo 3 > /proc/sys/vm/drop_caches
dd bs=700M count=10 of=/dev/null if=./testfile
Yields 412 MB/s without LUKS and 43.9 MB/s with LUKS
Compared with the write test, read throughput goes up without LUKS and goes down with LUKS. Without LUKS, reads are about 9.4x faster.
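The speed-up factors quoted here are just the ratio of the two dd figures. A tiny sketch for reproducing them (speedup is a hypothetical helper):

```sh
#!/bin/sh
# Sketch: speed-up factor between two throughput figures in the same units,
# rounded to one decimal place.
speedup() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", fast / slow }'
}

speedup 236 72.5   # write, bs=700M: prints 3.3
speedup 412 43.9   # read,  bs=700M: prints 9.4
```

By the same arithmetic, the bs=70M runs work out to about 3.7x for writes (238/64.9) and 9.1x for reads (414/45.5).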
Doing another read test, we see that if the block size is decreased, there isn't much of a change.
sync
echo 3 > /proc/sys/vm/drop_caches
dd bs=70M count=100 of=/dev/null if=./testfile
Yields 414 MB/s without LUKS and 45.5 MB/s with LUKS.
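For repeating these runs on other boxes, the write/read sequence can be wrapped in one function. This is a sketch with two deliberate changes from the commands above (my assumptions, adjust freely): conv=fsync makes dd flush to disk before it reports the write speed, and cache dropping is skipped with a warning when it isn't permitted. dd_test and drop_caches are hypothetical helper names.

```sh
#!/bin/sh
# Sketch of the dd write/read test as one function. Differences from the
# commands in the thread (assumptions):
#   - conv=fsync makes dd flush to disk before it reports the write speed,
#     so data still sitting in the page cache cannot inflate the figure;
#   - dropping caches is skipped with a warning when it is not permitted.
drop_caches() {
    sync
    { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null ||
        echo "warning: could not drop caches; read speed may come from RAM" >&2
}

dd_test() {
    dir=$1 bs=$2 count=$3
    file="$dir/testfile"
    drop_caches
    echo "write bs=$bs count=$count:"
    dd bs="$bs" count="$count" if=/dev/zero of="$file" conv=fsync 2>&1 | tail -n 1
    drop_caches
    echo "read bs=$bs count=$count:"
    dd bs="$bs" count="$count" if="$file" of=/dev/null 2>&1 | tail -n 1
}

# Usage (as root, in a directory on the volume under test):
#   dd_test . 700M 10     # writes about 7.3 GB
#   dd_test . 70M 100
```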
Okay, I did some rough testing changing the par variable and watching things in htop. It seems like the best value is -3 (1 thread).
As mentioned above, I strongly suspect the raspi4 8GB has this same problem. Would appreciate it if you could confirm. If we are dealing with a 530GB dataset, I don't see how the extra 4GB is going to help that much.
I suggest you move the block storage outside of LUKS or find a different algorithm for LUKS that is better compatible with the raspi4 CPU.
@AndySchroder can you paste the output of cryptsetup benchmark from your system?
That command does not seem to work on start9, but I've run it on ubuntu on the same hardware. Here is the result:
$ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 403919 iterations per second for 256-bit key
PBKDF2-sha256 638596 iterations per second for 256-bit key
PBKDF2-sha512 506069 iterations per second for 256-bit key
PBKDF2-ripemd160 332670 iterations per second for 256-bit key
PBKDF2-whirlpool 123419 iterations per second for 256-bit key
argon2i 4 iterations, 344794 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 4 iterations, 339659 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
# Algorithm | Key | Encryption | Decryption
aes-cbc 128b 21.9 MiB/s 76.5 MiB/s
serpent-cbc 128b 34.6 MiB/s 36.0 MiB/s
twofish-cbc 128b 55.8 MiB/s 57.5 MiB/s
aes-cbc 256b 17.3 MiB/s 58.0 MiB/s
serpent-cbc 256b 35.5 MiB/s 36.0 MiB/s
twofish-cbc 256b 56.9 MiB/s 57.2 MiB/s
aes-xts 256b 83.2 MiB/s 74.3 MiB/s
serpent-xts 256b 35.9 MiB/s 37.0 MiB/s
twofish-xts 256b 59.2 MiB/s 59.8 MiB/s
aes-xts 512b 65.0 MiB/s 56.9 MiB/s
serpent-xts 512b 37.1 MiB/s 37.0 MiB/s
twofish-xts 512b 61.3 MiB/s 60.2 MiB/s
Seems like these results are on the same order of magnitude as my actual I/O tests. I guess because the NVMe disk is fast enough, it's not the bottleneck.
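One detail when comparing the two: cryptsetup reports MiB/s (2^20 bytes/s), while GNU dd's summary line uses MB/s (10^6 bytes/s). A small conversion sketch, assuming for illustration that the volume uses aes-xts with a 256-bit key (cryptsetup luksDump shows the actual cipher and key size); mib_to_mb is a hypothetical helper:

```sh
#!/bin/sh
# Sketch: convert cryptsetup's MiB/s (2^20 bytes/s) to the MB/s (10^6 bytes/s)
# that GNU dd prints, so the two sets of numbers can be compared directly.
mib_to_mb() {
    awk -v mib="$1" 'BEGIN { printf "%.1f\n", mib * 1048576 / 1000000 }'
}

mib_to_mb 83.2   # aes-xts 256b encryption: prints 87.2 (dd write was 72.5 MB/s)
mib_to_mb 74.3   # aes-xts 256b decryption: prints 77.9 (dd read was 43.9 MB/s)
```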
What do you get on the raspi4 with 8GB RAM?
Also, monitoring bitcoind on ubuntu without LUKS with htop, it was doing about 90MB/s to 110MB/s data transfer with all the cores dedicated to bitcoind instead of 3 of the cores dedicated to LUKS (which is what start9 does).
Said another way, without LUKS bitcoind itself seems to limit the transfer rate, and with LUKS, LUKS limits the transfer rate rather than bitcoind.
Comparing to an old Intel(R) Core(TM)2 Duo CPU P8800 @ 2.66GHz
$ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 705636 iterations per second for 256-bit key
PBKDF2-sha256 910222 iterations per second for 256-bit key
PBKDF2-sha512 695342 iterations per second for 256-bit key
PBKDF2-ripemd160 456696 iterations per second for 256-bit key
PBKDF2-whirlpool 316981 iterations per second for 256-bit key
argon2i 4 iterations, 336139 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 4 iterations, 340218 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
# Algorithm | Key | Encryption | Decryption
aes-cbc 128b 124.3 MiB/s 146.5 MiB/s
serpent-cbc 128b 50.0 MiB/s 190.9 MiB/s
twofish-cbc 128b 123.7 MiB/s 157.4 MiB/s
aes-cbc 256b 99.1 MiB/s 114.5 MiB/s
serpent-cbc 256b 50.0 MiB/s 192.2 MiB/s
twofish-cbc 256b 123.7 MiB/s 157.3 MiB/s
aes-xts 256b 147.1 MiB/s 148.0 MiB/s
serpent-xts 256b 180.1 MiB/s 183.8 MiB/s
twofish-xts 256b 153.6 MiB/s 155.7 MiB/s
aes-xts 512b 114.0 MiB/s 114.5 MiB/s
serpent-xts 512b 180.2 MiB/s 183.5 MiB/s
twofish-xts 512b 153.8 MiB/s 156.0 MiB/s
By comparison, the modern raspi4 seems pretty slow.
If cryptsetup benchmark doesn't work on StartOS, make sure you use sudo:
sudo cryptsetup benchmark
Here is my RPi4 8GB (zram enabled + custom vm settings):
sudo cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 139290 iterations per second for 256-bit key
PBKDF2-sha256 228348 iterations per second for 256-bit key
PBKDF2-sha512 186181 iterations per second for 256-bit key
PBKDF2-ripemd160 119156 iterations per second for 256-bit key
PBKDF2-whirlpool 47627 iterations per second for 256-bit key
argon2i 4 iterations, 102904 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 4 iterations, 116417 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
# Algorithm | Key | Encryption | Decryption
aes-cbc 128b 24.4 MiB/s 27.3 MiB/s
serpent-cbc 128b N/A N/A
twofish-cbc 128b 21.2 MiB/s 21.6 MiB/s
aes-cbc 256b 19.9 MiB/s 21.6 MiB/s
serpent-cbc 256b N/A N/A
twofish-cbc 256b 22.4 MiB/s 23.0 MiB/s
aes-xts 256b 32.7 MiB/s 32.7 MiB/s
serpent-xts 256b N/A N/A
twofish-xts 256b 24.2 MiB/s 24.3 MiB/s
aes-xts 512b 26.1 MiB/s 26.5 MiB/s
serpent-xts 512b N/A N/A
twofish-xts 512b 24.4 MiB/s 22.7 MiB/s
Here is my M1 qemu devBOX (8GB with 6 cores) for comparison:
sudo cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 1716163 iterations per second for 256-bit key
PBKDF2-sha256 3615779 iterations per second for 256-bit key
PBKDF2-sha512 1846084 iterations per second for 256-bit key
PBKDF2-ripemd160 723155 iterations per second for 256-bit key
PBKDF2-whirlpool 468114 iterations per second for 256-bit key
argon2i 4 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 4 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
# Algorithm | Key | Encryption | Decryption
aes-cbc 128b 730.4 MiB/s 3755.5 MiB/s
serpent-cbc 128b 56.1 MiB/s 60.8 MiB/s
twofish-cbc 128b 101.1 MiB/s 107.2 MiB/s
aes-cbc 256b 587.6 MiB/s 2289.6 MiB/s
serpent-cbc 256b 41.0 MiB/s 60.2 MiB/s
twofish-cbc 256b 104.1 MiB/s 134.4 MiB/s
aes-xts 256b 2019.4 MiB/s 2196.2 MiB/s
serpent-xts 256b 60.2 MiB/s 62.1 MiB/s
twofish-xts 256b 133.7 MiB/s 135.1 MiB/s
aes-xts 512b 2171.2 MiB/s 1957.6 MiB/s
serpent-xts 512b 60.4 MiB/s 63.2 MiB/s
twofish-xts 512b 133.0 MiB/s 129.4 MiB/s
You seem to be getting significantly better results for aes-xts on ubuntu. This is the same hardware as your RPi4 8GB, yes? What kernel is it running? What board revision is it? (cat /proc/cpuinfo)
I don't have a raspi4 with 8GB RAM, I only have a raspi4 with 4GB ram.
Taking a look at https://raspberrypi.stackexchange.com/questions/102064/performance-of-raspberry-pi-4-in-luks referenced above, it leads you to https://rr-developer.github.io/LUKS-on-Raspberry-Pi/ , so I'll now repeat my tests above with that in mind.
See also:
$ cat /etc/issue
Ubuntu 20.04.1 LTS \n \l
$
$ uname -a
Linux xxxxxxxxxxxxxxxxxxx 5.4.0-1045-raspi #49-Ubuntu SMP PREEMPT Wed Sep 29 17:49:16 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
$
$ cat /proc/cpuinfo
processor : 0
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
processor : 1
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
processor : 2
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
processor : 3
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
Hardware : BCM2835
Revision : c03140
Serial : 10000000c2b9c5a8
Model : Raspberry Pi ? Rev 1.0
$
$ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 405168 iterations per second for 256-bit key
PBKDF2-sha256 649675 iterations per second for 256-bit key
PBKDF2-sha512 515017 iterations per second for 256-bit key
PBKDF2-ripemd160 335651 iterations per second for 256-bit key
PBKDF2-whirlpool 107612 iterations per second for 256-bit key
argon2i 4 iterations, 374573 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 4 iterations, 376944 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
# Algorithm | Key | Encryption | Decryption
aes-cbc 128b 23.8 MiB/s 77.0 MiB/s
serpent-cbc 128b 35.6 MiB/s 36.2 MiB/s
twofish-cbc 128b 57.6 MiB/s 57.8 MiB/s
aes-cbc 256b 17.3 MiB/s 58.4 MiB/s
serpent-cbc 256b 35.7 MiB/s 36.2 MiB/s
twofish-cbc 256b 57.6 MiB/s 57.3 MiB/s
aes-xts 256b 84.8 MiB/s 74.8 MiB/s
serpent-xts 256b 37.4 MiB/s 37.4 MiB/s
twofish-xts 256b 62.0 MiB/s 60.7 MiB/s
aes-xts 512b 65.2 MiB/s 57.2 MiB/s
serpent-xts 512b 37.4 MiB/s 37.4 MiB/s
twofish-xts 512b 62.0 MiB/s 60.8 MiB/s
$
$ cryptsetup benchmark -c xchacha20,aes-adiantum-plain64
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
xchacha20,aes-adiantum 256b 143.8 MiB/s 144.5 MiB/s
$
$ for cipher in "xchacha12,aes-adiantum-plain64" "xchacha20,aes-adiantum-plain64" "aes-xts-plain64"; do cryptsetup benchmark --cipher="${cipher}"; done
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
xchacha12,aes-adiantum 256b 171.6 MiB/s 172.5 MiB/s
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
xchacha20,aes-adiantum 256b 141.8 MiB/s 142.2 MiB/s
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
aes-xts 256b 84.9 MiB/s 75.0 MiB/s
$
root@kinky-honor:/home/start9# cat /etc/issue
Debian GNU/Linux 11 \n \l
root@kinky-honor:/home/start9#
root@kinky-honor:/home/start9# uname -a
Linux kinky-honor 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux
root@kinky-honor:/home/start9#
root@kinky-honor:/home/start9# cat /proc/cpuinfo
processor : 0
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
processor : 1
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
processor : 2
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
processor : 3
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3
Hardware : BCM2835
Revision : c03140
Serial : 10000000782e0c04
Model : Raspberry Pi Compute Module 4 Rev 1.0
root@kinky-honor:/home/start9#
root@kinky-honor:/home/start9# cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 143404 iterations per second for 256-bit key
PBKDF2-sha256 234057 iterations per second for 256-bit key
PBKDF2-sha512 189137 iterations per second for 256-bit key
PBKDF2-ripemd160 121362 iterations per second for 256-bit key
PBKDF2-whirlpool 38916 iterations per second for 256-bit key
argon2i 4 iterations, 146848 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 4 iterations, 190143 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
# Algorithm | Key | Encryption | Decryption
aes-cbc 128b 33.9 MiB/s 35.9 MiB/s
serpent-cbc 128b N/A N/A
twofish-cbc 128b 25.4 MiB/s 25.5 MiB/s
aes-cbc 256b 26.8 MiB/s 27.9 MiB/s
serpent-cbc 256b N/A N/A
twofish-cbc 256b 25.1 MiB/s 23.0 MiB/s
aes-xts 256b 34.4 MiB/s 35.2 MiB/s
serpent-xts 256b N/A N/A
twofish-xts 256b 26.3 MiB/s 26.1 MiB/s
aes-xts 512b 28.2 MiB/s 28.9 MiB/s
serpent-xts 512b N/A N/A
twofish-xts 512b 26.4 MiB/s 26.0 MiB/s
root@kinky-honor:/home/start9#
root@kinky-honor:/home/start9# cryptsetup benchmark -c xchacha20,aes-adiantum-plain64
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
xchacha20,aes-adiantum 256b 57.1 MiB/s 58.0 MiB/s
root@kinky-honor:/home/start9#
root@kinky-honor:/home/start9# for cipher in "xchacha12,aes-adiantum-plain64" "xchacha20,aes-adiantum-plain64" "aes-xts-plain64"; do cryptsetup benchmark --cipher="${cipher}"; done
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
xchacha12,aes-adiantum 256b 67.3 MiB/s 67.7 MiB/s
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
xchacha20,aes-adiantum 256b 56.5 MiB/s 57.6 MiB/s
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
aes-xts 256b 36.2 MiB/s 37.2 MiB/s
root@kinky-honor:/home/start9#
$ cat /etc/issue
Ubuntu 20.04.6 LTS \n \l
$
$ uname -a
Linux xxxxxxxxxxxxxxxxxxx 5.4.0-153-generic #170-Ubuntu SMP Fri Jun 16 13:43:31 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU P8800 @ 2.66GHz
stepping : 10
microcode : 0xa0b
cpu MHz : 2412.673
cache size : 3072 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm pti tpr_shadow vnmi flexpriority dtherm
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips : 5309.25
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU P8800 @ 2.66GHz
stepping : 10
microcode : 0xa0b
cpu MHz : 2469.047
cache size : 3072 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm pti tpr_shadow vnmi flexpriority dtherm
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips : 5309.25
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
$
$
$ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 712347 iterations per second for 256-bit key
PBKDF2-sha256 913393 iterations per second for 256-bit key
PBKDF2-sha512 696265 iterations per second for 256-bit key
PBKDF2-ripemd160 459901 iterations per second for 256-bit key
PBKDF2-whirlpool 317365 iterations per second for 256-bit key
argon2i 4 iterations, 332038 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 4 iterations, 341568 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
# Algorithm | Key | Encryption | Decryption
aes-cbc 128b 123.2 MiB/s 145.4 MiB/s
serpent-cbc 128b 49.6 MiB/s 191.3 MiB/s
twofish-cbc 128b 123.1 MiB/s 156.3 MiB/s
aes-cbc 256b 99.6 MiB/s 114.5 MiB/s
serpent-cbc 256b 50.0 MiB/s 192.2 MiB/s
twofish-cbc 256b 123.0 MiB/s 155.8 MiB/s
aes-xts 256b 145.9 MiB/s 147.3 MiB/s
serpent-xts 256b 178.2 MiB/s 184.9 MiB/s
twofish-xts 256b 153.6 MiB/s 154.9 MiB/s
aes-xts 512b 113.6 MiB/s 110.8 MiB/s
serpent-xts 512b 176.0 MiB/s 186.2 MiB/s
twofish-xts 512b 153.7 MiB/s 156.0 MiB/s
$
$ cryptsetup benchmark -c xchacha20,aes-adiantum-plain64
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
xchacha20,aes-adiantum 256b 315.4 MiB/s 345.1 MiB/s
$
$ for cipher in "xchacha12,aes-adiantum-plain64" "xchacha20,aes-adiantum-plain64" "aes-xts-plain64"; do cryptsetup benchmark --cipher="${cipher}"; done
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
xchacha12,aes-adiantum 256b 427.2 MiB/s 429.2 MiB/s
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
xchacha20,aes-adiantum 256b 373.2 MiB/s 371.8 MiB/s
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
aes-xts 256b 146.4 MiB/s 147.7 MiB/s
$
$
$ cat /etc/issue
Ubuntu 22.04.2 LTS \n \l
$
$
$ uname -a
Linux xxxxxxxxxxxxxxxxxxx 5.15.0-69-generic #76-Ubuntu SMP Fri Mar 17 17:19:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$
$
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 166
model name : Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
stepping : 0
microcode : 0xf4
cpu MHz : 500.044
cache size : 12288 KB
physical id : 0
siblings : 12
core id : 0
cpu cores : 6
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple pml ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit mmio_stale_data retbleed eibrs_pbrsb
bogomips : 3199.92
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
(output for the remaining 11 processors truncated)
$
$ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 1444319 iterations per second for 256-bit key
PBKDF2-sha256 1985939 iterations per second for 256-bit key
PBKDF2-sha512 1438375 iterations per second for 256-bit key
PBKDF2-ripemd160 834853 iterations per second for 256-bit key
PBKDF2-whirlpool 655360 iterations per second for 256-bit key
argon2i 6 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 6 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
# Algorithm | Key | Encryption | Decryption
aes-cbc 128b 1194.2 MiB/s 3584.3 MiB/s
serpent-cbc 128b 102.5 MiB/s 782.0 MiB/s
twofish-cbc 128b 233.2 MiB/s 417.5 MiB/s
aes-cbc 256b 909.6 MiB/s 2885.8 MiB/s
serpent-cbc 256b 101.9 MiB/s 787.1 MiB/s
twofish-cbc 256b 232.4 MiB/s 417.9 MiB/s
aes-xts 256b 3567.4 MiB/s 3576.0 MiB/s
serpent-xts 256b 672.8 MiB/s 678.0 MiB/s
twofish-xts 256b 390.6 MiB/s 389.0 MiB/s
aes-xts 512b 2885.3 MiB/s 2890.4 MiB/s
serpent-xts 512b 675.2 MiB/s 680.0 MiB/s
twofish-xts 512b 387.1 MiB/s 386.7 MiB/s
$
$
$ cryptsetup benchmark -c xchacha20,aes-adiantum-plain64
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
xchacha20,aes-adiantum 256b 1522.8 MiB/s 1542.1 MiB/s
$
$
$ for cipher in "xchacha12,aes-adiantum-plain64" "xchacha20,aes-adiantum-plain64" "aes-xts-plain64"; do cryptsetup benchmark --cipher="${cipher}"; done
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
xchacha12,aes-adiantum 256b 1822.4 MiB/s 1883.8 MiB/s
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
xchacha20,aes-adiantum 256b 1575.4 MiB/s 1574.7 MiB/s
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
aes-xts 256b 3516.2 MiB/s 3511.7 MiB/s
$
It seems like start9 is performing much worse than ubuntu on the same raspi4 hardware (for example, aes-xts 256b gives 84.8/74.8 MiB/s on ubuntu but 34.4/35.2 MiB/s on start9), and xchacha12,aes-adiantum-plain64 seems to be the fastest algorithm I've found for the raspi4 hardware (but I have no idea of the security tradeoffs for it).
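To pick the fastest cipher out of a pile of benchmark output, the cipher rows can be ranked by the slower of their encryption and decryption speeds, since a disk sees both. A minimal sketch, assuming rows keep the "name key-size speed MiB/s speed MiB/s" layout shown above (N/A rows are skipped); rank_ciphers is a hypothetical helper that reads the benchmark text on stdin.

```sh
#!/bin/sh
# Sketch: rank the cipher rows of `cryptsetup benchmark` output by the slower
# of their encryption/decryption speeds (MiB/s), fastest first.
# Assumption: data rows look like "aes-xts  256b  84.8 MiB/s  74.8 MiB/s";
# rows reported as N/A and the PBKDF/argon2 lines do not match and are skipped.
rank_ciphers() {
    awk '
        $2 ~ /^[0-9]+b$/ && $4 == "MiB/s" && $6 == "MiB/s" {
            a = $3 + 0; b = $5 + 0
            slow = (a < b) ? a : b
            printf "%.1f %s %s\n", slow, $1, $2
        }
    ' | sort -rn
}

# Usage (as root):
#   for c in xchacha12,aes-adiantum-plain64 xchacha20,aes-adiantum-plain64 aes-xts-plain64; do
#       cryptsetup benchmark --cipher="$c"
#   done | rank_ciphers
```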
Okay, so I'm approaching 3 weeks now and I'm only at block 568,790.
Wondering, can I run cryptsetup-reencrypt --decrypt and decrypt it for now? Or can I run cryptsetup-reencrypt -c xchacha12,aes-adiantum (that looked like the fastest for the raspi based on my tests shown above)? It might make sense for me to just completely erase the disk and do that after first boot. However, I'm not sure of the exact way to execute the command because your operating system has such a special setup.
Also, I'd consider removing the raspi download links for any version that has this slow encryption enabled because it's just not usable.
@AndySchroder because I don't have access to a 4GB RPi4, please try tweaking your memory management subsystem a bit and see if it improves the IBD.
Edit /etc/sysctl.conf and add:
vm.vfs_cache_pressure=432
vm.swappiness=96
vm.dirty_background_ratio=1
vm.dirty_ratio=42
Then apply the settings with:
sudo sysctl --system
Let it sit for some time and observe whether it makes any difference.
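A variant of the above, as a sketch: putting the same four settings in their own sysctl drop-in file keeps them easy to find and remove, and reading /proc/sys/vm afterwards shows whether the kernel actually picked them up. The file name 99-ibd-tuning.conf is hypothetical, and since files under /etc on StartOS live in an overlay that does not persist across reboots, this file may not survive a reboot either.

```sh
#!/bin/sh
# Sketch: write the suggested vm.* settings to a sysctl drop-in file and show
# the values the kernel is actually using. The drop-in name is hypothetical;
# on StartOS the file may not survive a reboot (overlay filesystem).
write_vm_tuning() {
    cat > "$1" <<'EOF'
vm.vfs_cache_pressure=432
vm.swappiness=96
vm.dirty_background_ratio=1
vm.dirty_ratio=42
EOF
}

show_vm_tuning() {
    for key in vfs_cache_pressure swappiness dirty_background_ratio dirty_ratio; do
        printf 'vm.%s=%s\n' "$key" "$(cat "/proc/sys/vm/$key")"
    done
}

# Usage (as root):
#   write_vm_tuning /etc/sysctl.d/99-ibd-tuning.conf
#   sysctl -p /etc/sysctl.d/99-ibd-tuning.conf
#   show_vm_tuning
```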
We are pretty confident this is a result of disk caching. We are preparing to release a version of StartOS that disables LUKS for Raspberry Pi. Hopefully that will take care of it.
Okay, I got busy and also had the system powered down for a while. I've just made the changes you requested. Will these changes persist if I reboot? Also, I'm a bit confused how this is going to help because I thought zram used more CPU to compress the data in RAM?
We are at block 588,913.
Will report back in a week on the block height.
By increasing vm.vfs_cache_pressure and vm.swappiness, the kernel becomes more aggressive about reclaiming the dentry/inode caches and about swapping, respectively, which can aid performance during blockchain synchronization on low-memory systems like the Raspberry Pi 4 with 4GB RAM.
Lowering vm.dirty_background_ratio (default 10) makes the kernel start writing dirty pages to disk in the background much sooner, while raising vm.dirty_ratio (default 20) lets processes keep writing longer before they are throttled; together this should prevent large bursts of dirty memory during synchronization and smooth out performance. Additionally, enabling zram adds compressed swap in RAM, further improving resource efficiency.
Those changes will revert back to system defaults if you restart.
Okay, it has been over a week and I am now at block 634,370. I don't think your suggestion helped.
Have you looked at my test results for cryptsetup benchmark? It seems as though much could be learned from them.
In the meantime, have you released the version for the raspi without LUKS?
We have tested without LUKS and unfortunately saw no improvement in performance. You can test this yourself on 0.3.4.4 by setting disable-encryption: true in /etc/embassy/config.yaml before a fresh setup. Make sure not to reboot between setting that option and running setup, because that file is in the overlay and changes will not persist across reboots.
FYI startd needs to be restarted after the config file is updated in order for the config changes to be picked up
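Putting those two steps together, a minimal sketch: set_disable_encryption is a hypothetical helper, and it assumes the key sits at the top level of config.yaml and that startd runs as a systemd unit named startd (neither is confirmed here).

```sh
#!/bin/sh
# Sketch: set "disable-encryption: true" in the StartOS config file, replacing
# an existing value instead of adding a second key. Assumptions: the key lives
# at the top level of the YAML file, and startd is a systemd unit named startd.
set_disable_encryption() {
    cfg=${1:-/etc/embassy/config.yaml}
    if grep -q '^disable-encryption:' "$cfg" 2>/dev/null; then
        tmp=$(mktemp) || return 1
        sed 's/^disable-encryption:.*/disable-encryption: true/' "$cfg" > "$tmp" &&
            cat "$tmp" > "$cfg"
        rc=$?
        rm -f "$tmp"
        return "$rc"
    fi
    # Make sure the new key starts on its own line.
    if [ -s "$cfg" ] && [ -n "$(tail -c 1 "$cfg")" ]; then
        echo >> "$cfg"
    fi
    echo 'disable-encryption: true' >> "$cfg"
}

# Usage (as root, on a fresh install, before setup and without rebooting):
#   set_disable_encryption
#   systemctl restart startd
```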
You might want to check the clock speed and governor used.
I followed the DIY guide, installed start9 on an RPI-4, and noticed that it was always running at 600 MHz. The CPU governors were all set to 'powersave'.
On RPI-OS, this is changed to 'ondemand' on boot by the raspi-config systemd script.
This is missing from start9, so it was stuck at 600 MHz, even under load.
I fixed it by going into the chroot environment, installing raspi-config through apt, and exiting/rebooting.
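To check whether a given box is stuck at a low clock like this, here is a sketch using the standard Linux cpufreq sysfs files (scaling_governor, scaling_available_governors, and scaling_cur_freq, the last in kHz). show_governors and set_governor are hypothetical helpers. Writing the governor by hand is only a quick test, not a replacement for the raspi-config fix, since sysfs writes do not survive a reboot.

```sh
#!/bin/sh
# Sketch: list and set the cpufreq governor on every core via sysfs.
# The optional base-directory argument lets the functions be pointed at a fake
# tree for testing; it defaults to the real sysfs location.
show_governors() {
    base=${1:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        cpu=$(basename "$(dirname "$dir")")
        gov=$(cat "$dir/scaling_governor")
        khz=$(cat "$dir/scaling_cur_freq" 2>/dev/null || echo 0)
        echo "$cpu $gov $((khz / 1000)) MHz"
    done
}

set_governor() {
    gov=$1
    base=${2:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        if grep -qw "$gov" "$dir/scaling_available_governors"; then
            echo "$gov" > "$dir/scaling_governor"
        else
            echo "governor $gov not available in $dir" >&2
            return 1
        fi
    done
}

# Usage (setting needs root and does not persist across reboots):
#   show_governors
#   set_governor ondemand
#   show_governors
```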
Prerequisites
Server Hardware
raspi CM4, 4GB RAM, 1TB nvme SSD
StartOS Version
0.3.4.3
Client OS
Linux
Client OS Version
firefox
Browser
Firefox
Browser Version
n/a
Current Behavior
Very slow IBD. Been running for several weeks and only at block 512895.
Two issues I've noticed so far:
Other notes:
Expected Behavior
On this hardware I'd expect IBD to complete in about 5 days (tested on the same hardware with a manual install of bitcoind on ubuntu and also with Umbrel).
Steps to Reproduce
Anything else?
No response