MariaDB / mariadb-docker

Docker Official Image packaging for MariaDB
https://mariadb.org
GNU General Public License v2.0

mysqld failed while attempting to check config on arm 64 #338

Closed jmburges closed 2 years ago

jmburges commented 3 years ago

Hello,

I ran sudo docker run --name some-mariadb -e MYSQL_ROOT_PASSWORD=my-secret-pw arm64v8/mariadb:latest to test out MariaDB on my ODroid-C2 and received the following error:

2020-12-28 01:47:19+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 1:10.5.8+maria~focal started.
2020-12-28 01:47:20+00:00 [ERROR] [Entrypoint]: mysqld failed while attempting to check config
        command was: mysqld --verbose --help --log-bin-index=/tmp/tmp.55u0o1Cg8L

It's under sudo so I don't think there should be any permissions problems. Any ideas?

tianon commented 3 years ago

Interesting, sounds like the mysqld binary is failing for some reason -- can you try something like the following to try to help narrow this down?

$ docker pull arm64v8/mariadb:latest
$ docker run -it --rm --entrypoint bash arm64v8/mariadb:latest
root@b7403e3d732e:/# mysqld --version
root@b7403e3d732e:/# mysqld --verbose --help > /dev/null
jmburges commented 3 years ago

Thanks for the fast response! Here is the output. Looks like it's not starting at all. Weird.

root@30c94cb0cb16:/# mysqld --version
Illegal instruction (core dumped)
root@30c94cb0cb16:/# mysqld --verbose --help > /dev/null
Illegal instruction (core dumped)
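
For reference, a shell reports a process killed by a signal as exit status 128 plus the signal number, so the "Illegal instruction" above (SIGILL, signal 4 on Linux) shows up as status 132. A minimal sketch for decoding such a status; describe_exit is a hypothetical helper name, not part of the image or of Docker:

```shell
#!/bin/sh
# describe_exit is a hypothetical helper, not something shipped in the image.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
describe_exit() (
  status=$1
  if [ "$status" -gt 128 ]; then
    # kill -l maps a signal number to its name without the SIG prefix.
    echo "killed by SIG$(kill -l "$((status - 128))")"
  else
    echo "exited with status $status"
  fi
)

describe_exit 132   # prints: killed by SIGILL
```
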
tianon commented 3 years ago

Bizarre - sounds like something is either wrong with the mysqld binary published by MariaDB or the ODroid-C2 chip/kernel. :confused:

grooverdan commented 3 years ago

Can you include some hardware information: the output of LD_SHOW_AUXV=1 /bin/true and of lscpu?

If you look at dmesg output what address did this occur at? Is this for the mariadb-10.5.8 version?

Writing an upstream bug report on https://jira.mariadb.org would be appreciated.

grooverdan commented 3 years ago

Note https://jira.mariadb.org/browse/MDEV-23495: there was a bug in 10.5 that is fixed from 10.5.7 onward. If your version predates 10.5.7, try an update.

fauust commented 3 years ago

Hi, I am not able to reproduce this on:

@jmburges, here is for comparison the information requested by @grooverdan.

@jmburges you could also try to build the container and use it directly. Something like this should do it:

git clone https://github.com/docker-library/mariadb && cd mariadb/10.5
docker build . -t mariadb-test
docker run -it -e MYSQL_ROOT_PASSWORD=my-secret-pw mariadb-test:latest
grooverdan commented 3 years ago

@jmburges. We'd really like to solve this on your hardware. Can you please include the hardware information?

Also, can you run the following in a container:

# add-apt-repository 'deb [arch=amd64,arm64,ppc64el] https://download.nus.edu.sg/mirror/mariadb/repo/10.5/ubuntu focal main/debug'
# apt-get update -y
# apt-get install mariadb-server-10.5-dbg-sym
# gdb --args mysqld --verbose
gdb> bt full

This will help us find where in the codebase the problem could be.

grooverdan commented 3 years ago

Cannot proceed without information.

stoinov commented 3 years ago

I can confirm the same issue with an Odroid-C2. Here are my hardware specs:

❯ LD_SHOW_AUXV=1 /bin/true
AT_SYSINFO_EHDR: 0x7f911f0000
AT_HWCAP:        83
AT_PAGESZ:       4096
AT_CLKTCK:       100
AT_PHDR:         0x5581496040
AT_PHENT:        56
AT_PHNUM:        8
AT_BASE:         0x7f911c7000
AT_FLAGS:        0x0
AT_ENTRY:        0x5581497a2c
AT_UID:          0
AT_EUID:         0
AT_GID:          0
AT_EGID:         0
AT_SECURE:       0
AT_RANDOM:       0x7fe3652678
AT_EXECFN:       /bin/true
AT_PLATFORM:     aarch64

❯ lscpu
Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
Model:                 4
CPU max MHz:           1536.0000
CPU min MHz:           100.0000
BogoMIPS:              2.00
Flags:                 fp asimd crc32
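
The AT_HWCAP value printed above is hexadecimal, and its bits line up with the lscpu flags. A rough sketch of decoding it, assuming the arm64 HWCAP bit order from the Linux uapi header asm/hwcap.h (only bits 0 to 8 are covered); decode_hwcap is a hypothetical helper name:

```shell
#!/bin/sh
# decode_hwcap is a hypothetical helper. It takes the AT_HWCAP value in hex
# (with or without a 0x prefix) and prints the names of the low feature bits.
decode_hwcap() (
  value=$((0x${1#0x}))
  bit=0
  out=
  # Assumed bit order (bit 0 first), per arch/arm64/include/uapi/asm/hwcap.h.
  for name in fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics; do
    if [ $(( ($value >> $bit) & 1 )) -eq 1 ]; then
      out="${out:+$out }$name"
    fi
    bit=$((bit + 1))
  done
  printf '%s\n' "$out"
)

decode_hwcap 83   # prints: fp asimd crc32
```

For 83 this gives fp asimd crc32, matching the lscpu Flags line; the aes, pmull, sha1 and sha2 bits are not set on this board.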

I also tried building from source and got the same error when running the custom image.

To run add-apt-repository on the latest Ubuntu image I followed this article. Unfortunately, I got an error when I ran apt-get install mariadb-server-10.5-dbg-sym: E: Unable to locate package mariadb-server-10.5-dbg-sym

grooverdan commented 3 years ago

@stoinov I've created some prebuilt images with debug symbols installed - https://quay.io/repository/mariadb-foundation/mariadb-debug?tab=tags

Use, for example, quay.io/mariadb-foundation/mariadb-debug:10.5 as the image name.

Can you try one of those?

stoinov commented 3 years ago

I've used the image you provided and I still get the same error:

[ERROR] [Entrypoint]: mysqld failed while attempting to check config
        command was: mysqld --verbose --help --log-bin-index=/tmp/tmp.vJJK0kpbmZ

When I enter the container with docker run --name test -it --entrypoint /bin/bash quay.io/mariadb-foundation/mariadb-debug:10.5 I still get E: Unable to locate package mariadb-server-10.5-dbg-sym, and when I try gdb --args mysqld --verbose the command is not found.

grooverdan commented 3 years ago

Debug symbols are already installed:

$ podman  run --name test -it --entrypoint /bin/bash --rm  quay.io/mariadb-foundation/mariadb-debug:10.5
root@9b22202a5296:/# dpkg -l | grep sym
ii  libmariadb3-dbgsym:amd64        1:10.5.13+maria~focal             amd64        debug symbols for libmariadb3
ii  libnettle7:amd64                3.5.1+really3.5.1-2ubuntu0.2      amd64        low level cryptographic library (symmetric and one-way cryptos)
ii  mariadb-backup-dbgsym           1:10.5.13+maria~focal             amd64        debug symbols for mariadb-backup
ii  mariadb-client-10.5-dbgsym      1:10.5.13+maria~focal             amd64        debug symbols for mariadb-client-10.5
ii  mariadb-client-core-10.5-dbgsym 1:10.5.13+maria~focal             amd64        debug symbols for mariadb-client-core-10.5
ii  mariadb-server-10.5-dbgsym      1:10.5.13+maria~focal             amd64        debug symbols for mariadb-server-10.5
ii  mariadb-server-core-10.5-dbgsym 1:10.5.13+maria~focal             amd64        debug symbols for mariadb-server-core-10.5

So first option, see if running without specifying an entrypoint gives a decent backtrace.

Option 2:

$ podman  run --name test -it  --rm  -p 2345:2345 --cap-add CAP_SYS_PTRACE --security-opt seccomp=unconfined  quay.io/mariadb-foundation/mariadb-debug:10.5 gosu mysql gdbserver :2345 mariadbd
Process mariadbd created; pid = 11
Listening on port 2345

And use gdb compiled with this patch https://sourceware.org/pipermail/gdb-patches/2021-August/181718.html to get the debug symbols to connect - https://jira.mariadb.org/browse/MDEV-26727

Option 3:

FROM  quay.io/mariadb-foundation/mariadb-debug:10.5
RUN apt-get update && apt-get install -y gdb

Then, after building this, use docker run (as above) new_image gosu mysql gdb --args mariadbd

Both of the above options (2 and 3) assume that it crashes early, without a datadir. If it doesn't, provide /var/lib/mysql as a volume containing a created datadir (even if copied from elsewhere).

grooverdan commented 3 years ago

Options 2 and 3: while gdb is waiting to run the server, you can docker exec {container} sh -c "chown -R mysql: && mariadb-install -u mysql" to install the instance (assuming it doesn't crash during the install, which seems likely for an architecture bug).

stoinov commented 3 years ago
  1. Running the container gives the initial one line error.
  2. Was not sure how to do it
  3. Built an image and started it as shown. Got this prompt waiting for input:
    Reading symbols from mariadbd...
    Reading symbols from /usr/lib/debug/.build-id/27/53d3e03d00ed21daecfe736a4aaf2689218226.debug...
    (gdb)

    I tried running the chown command inside the docker but got: chown: missing operand after 'mysql:'

Running just docker exec test sh -c "mariadb-install -u mysql" returned mariadb-install: not found

grooverdan commented 3 years ago

Sorry, I was assuming too much. Let me give a more complete example of a modified option 3, without typos:

Running (with podman; substituting docker should be equivalent). Here we install gdb at the prompt.

$    podman run -ti  --cap-add CAP_SYS_PTRACE --security-opt seccomp=unconfined  quay.io/mariadb-foundation/mariadb-debug:10.5 bash
root@fe15c31b5a31:/# apt-get update && apt-get install -y gdb
Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB] 
...

Change permissions on volume:

root@fe15c31b5a31:/# chown -R mysql: /var/lib/mysql

Install basic datadir:

root@fe15c31b5a31:/# gosu mysql mariadb-install-db
Installing MariaDB/MySQL system tables in '/var/lib/mysql' ...
OK

This may crash for you; if so, just ignore it and go on.

Start mariadbd under gdb:

root@fe15c31b5a31:/# gosu mysql gdb --args  mariadbd
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from mariadbd...
Reading symbols from /usr/lib/debug/.build-id/6e/0a874dca5a7ff831396ddc0785d939a192efe3.debug...
(gdb) 

At the gdb prompt, set a few options, and then enter r and press Return to run.

(gdb) set pagination off
(gdb) set print frame-arguments all
(gdb) r
Starting program: /usr/sbin/mariadbd 
[Thread debugging using libthread_db enabled]
...
2021-10-07 22:48:08 0 [Note] Reading of all Master_info entries succeeded
2021-10-07 22:48:08 0 [Note] Added new Master_info '' to hash table
2021-10-07 22:48:08 0 [Note] /usr/sbin/mariadbd: ready for connections.
Version: '10.5.13-MariaDB-1:10.5.13+maria~focal' as '10.5.13-MariaDB-4eb7217ec33fef8d23f2dda0c97b442508c81b1d'  socket: '/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution

I'm assuming it's crashing at startup, but if you need to apply a workload, do this now.

After it gets SIGILL or some stopping signal, enter thread apply all bt full:

Thread 1 "mariadbd" received signal SIGILL, Illegal instruction.
0x00007ffff7664aff in __GI___poll (fds=fds@entry=0x7fffffffe2a0, nfds=nfds@entry=2, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29  ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) thread apply all bt full

Then capture that output and attach it to a new https://jira.mariadb.org/ ticket.

stoinov commented 3 years ago

Thanks for the detailed breakdown, and the install basic datadir step crashes as predicted:

Error log
```
root@8c72417e2d36:/# gosu mysql mariadb-install-db
Installing MariaDB/MySQL system tables in '/var/lib/mysql' ...
Illegal instruction

Installation of system tables failed! Examine the logs in
/var/lib/mysql for more information.

The problem could be conflicting information in an external
my.cnf files. You can ignore these by doing:

    shell> /usr/bin/mariadb-install-db --defaults-file=~/.my.cnf

You can also try to start the mysqld daemon with:

    shell> /usr/sbin/mysqld --skip-grant-tables --general-log &

and use the command line tool /usr/bin/mysql
to connect to the mysql database and look at the grant tables:

    shell> /usr/bin/mysql -u root mysql
    mysql> show tables;

Try 'mysqld --help' if you have problems with paths. Using
--general-log gives you a log in /var/lib/mysql that may be helpful.

The latest information about mysql_install_db is available at
https://mariadb.com/kb/en/installing-system-tables-mysql_install_db
You can find the latest source at https://downloads.mariadb.org and
the maria-discuss email list at https://launchpad.net/~maria-discuss

Please check all of the above before submitting a bug report
at https://mariadb.org/jira
```

Side note - this exact error message I also get from the linuxserver image. I have a separate issue with them about it. Just as there, there are no logs in the mentioned /var/lib/mysql folder. Not sure if both are related but it seems like something worth checking too.

Continuing with your steps.

gdb log:
```
Reading symbols from mariadbd...
Reading symbols from /usr/lib/debug/.build-id/27/53d3e03d00ed21daecfe736a4aaf2689218226.debug...
(gdb) set pagination off
(gdb) set print frame-arguments all
(gdb) r
Starting program: /usr/sbin/mariadbd
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0x0000007fb7b253a8 in ?? () from /lib/aarch64-linux-gnu/libcrypto.so.1.1
(gdb) thread apply all bt full

Thread 1 (Thread 0x7fb7fef870 (LWP 444)):
#0 0x0000007fb7b253a8 in ?? () from /lib/aarch64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#1 0x0000007fb7d065f0 in ?? () from /lib/aarch64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
```

This seems rather underwhelming so let me know if it's enough for a jira ticket or we can expand on it somehow to get more data.

grooverdan commented 3 years ago

Thanks @stoinov. From the above we see that the SIGILL is in /lib/aarch64-linux-gnu/libcrypto.so.1.1. This is the OpenSSL crypto library from Ubuntu (20.04) that we base our image on.

From https://github.com/openssl/openssl/issues/14897 it looks like openssl uses SIGILL to determine features available.

When you get to this step in gdb, enter c at the gdb prompt to continue. The next fault (I'm hoping) is the one I'm more interested in. This will reference some MariaDB code.

Additionally include the debug symbols from libssl1.1 (note command is two lines)

root@627b02963823:/# echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-updates main restricted universe multiverse" |  tee  -a /etc/apt/sources.list.d/ddebs.list
root@627b02963823:/# apt-key adv --keyserver keyserver.ubuntu.com --recv-keys F2EDC64DC5AEE1F6B9C621F0C8CAB6595FDFF622
root@627b02963823:/# apt-get update && apt-get install -y  libssl1.1-dbgsym

Looks like a bt full will be sufficient for a bug report rather than for all threads.

stoinov commented 3 years ago

So I redid your previous instructions, adding the libssl1.1 step before running mariadbd. After executing the gdb commands, here is what I get now:

MariaDB log
```
Starting program: /usr/sbin/mariadbd
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
_armv7_tick () at crypto/arm64cpuid.S:20
20 crypto/arm64cpuid.S: No such file or directory.
(gdb) thread apply all bt full

Thread 1 (Thread 0x7fb7fef870 (LWP 1099)):
#0 _armv7_tick () at crypto/arm64cpuid.S:20
No locals.
#1 0x0000007fb7b211ac in OPENSSL_cpuid_setup () at ../crypto/armcap.c:207
        e =
        ill_oact = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0, 20, 0, 1, 0, 545460846593, 0, 545460846593, 548547814656, 549755810976, 548542746740, 548544734704, 548544734704, 1, 549755811624, 549755811640}}, sa_flags = 0, sa_restorer = 0x0}
        ill_act = {__sigaction_handler = {sa_handler = 0x7fb7b254a8 , sa_sigaction = 0x7fb7b254a8 }, sa_mask = {__val = {18446744067267099431, 18446744073709551615 }}, sa_flags = 0, sa_restorer = 0x0}
        oset = {__val = {0, 548539255392, 4294967295, 548547788656, 4294967295, 548547814656, 4294967295, 548547801088, 548537830448, 548547801088, 548547788960, 549755811424, 548547727316, 549755811440, 548547727316, 549755811456}}
        trigger = 1
#2 0x0000007fb7fdb83c in ?? () from /lib/ld-linux-aarch64.so.1
No symbol table info available.
#3 0x0000007fb7fdb93c in ?? () from /lib/ld-linux-aarch64.so.1
No symbol table info available.
#4 0x0000007fb7fce144 in ?? () from /lib/ld-linux-aarch64.so.1
No symbol table info available.
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) c
Continuing.

Program received signal SIGILL, Illegal instruction.
my_timer_init (mti=0x5556c2c260 ) at ./mysys/my_rdtsc.c:393
393 ./mysys/my_rdtsc.c: No such file or directory.
```

Is this expected? Or should we change something?

grooverdan commented 3 years ago

That's good. With this we can focus on how my_timer_init is implemented on ARM64 and how that maps to the capabilities of the Odroid-C2. It may not be the last one, but it's a good start.

stoinov commented 3 years ago

Just a side note that I'm using kernel 3.16.85+ which might be of interest in troubleshooting this. Not sure if I can update to a newer one on my current distro - DietPi.

grooverdan commented 3 years ago

I think it's highly unlikely that it's a kernel problem. Line 393 corresponds to the my_timer_cycles inline. This was changed in https://github.com/MariaDB/server/commit/c76b45a5242f50d00c16f0bbf1dbecd4e359e02c to use the CNTVCT_EL0 register, which I suspect isn't on the A53. Further documentation checks welcome.

If you could test a base docker.io/library/mariadb:10.5.4 image to see if it starts (no gdb required), that would confirm it's this code. Alternatively, test the sample test code in https://jira.mariadb.org/browse/MDEV-23249?focusedCommentId=160673&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-160673

stoinov commented 3 years ago

Great catch! 10.5.4 does indeed start up and create files in the folder, but now I have a different error:

mysql_error.log:
```
211010 22:32:06 [ERROR] mysqld got signal 4 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.5.4-MariaDB-1:10.5.4+maria~focal
key_buffer_size=33554432
read_buffer_size=3145728
max_used_connections=0
max_threads=12
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 119059 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x30)[0x55730b1310]
Printing to addr2line failed
/usr/sbin/mysqld(handle_fatal_signal+0x45c)[0x5572b5c9bc]
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0x7fb6000510]
/usr/sbin/mysqld(crc32c_aarch64+0x4b4)[0x55730c8adc]
/usr/sbin/mysqld(+0xcf68ec)[0x5572f908ec]
/usr/sbin/mysqld(+0xcf7274)[0x5572f91274]
/usr/sbin/mysqld(+0xcf8fa8)[0x5572f92fa8]
/usr/sbin/mysqld(+0xcf9f50)[0x5572f93f50]
/usr/sbin/mysqld(+0xcfb198)[0x5572f95198]
/usr/sbin/mysqld(+0x6090f0)[0x55728a30f0]
/usr/sbin/mysqld(+0xb67cc8)[0x5572e01cc8]
/usr/sbin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x6c)[0x5572b5f7f4]
/usr/sbin/mysqld(+0x7057dc)[0x557299f7dc]
/usr/sbin/mysqld(_Z11plugin_initPiPPci+0x864)[0x55729a08b4]
/usr/sbin/mysqld(+0x63f270)[0x55728d9270]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x40c)[0x55728deadc]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe8)[0x7fb568f090]
/usr/sbin/mysqld(+0x639d90)[0x55728d3d90]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             unlimited            unlimited            processes
Max open files            1048576              1048576              files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       7875                 7875                 signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: core

2021-10-10 22:32:11 0 [ERROR] InnoDB: Corrupted page [page id: space=0, page number=0] of datafile './ibdata1' could not be found in the doublewrite buffer.
2021-10-10 22:32:11 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption
2021-10-10 22:32:11 0 [ERROR] Plugin 'InnoDB' init function returned error.
2021-10-10 22:32:11 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2021-10-10 22:32:11 0 [ERROR] Unknown/unsupported storage engine: InnoDB
2021-10-10 22:32:11 0 [ERROR] Aborting
```

then it continues repeating it.

I found that this has already been solved in 10.5.5, which is very unfortunate 😄

I tried running with 10.4.21 but I get:

2021-10-10 22:44:12+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.4.21+maria~focal started.
2021-10-10 22:44:12+00:00 [ERROR] [Entrypoint]: mysqld failed while attempting to check config
    command was: mysqld --verbose --help --log-bin-index=/tmp/tmp.0RiFQm5vEb
grooverdan commented 3 years ago

10.4.14+ include the same patch. We can't revert MDEV-23249 as it still would be a bug, so what other counter registers are available?

FYI @tsahee, @mysqlonarm, @dr-m

stoinov commented 3 years ago

I can confirm that 10.4.13 starts correctly:

2021-10-11 0:00:14 0 [Note] mysqld: ready for connections.
Version: '10.4.13-MariaDB-1:10.4.13+maria~focal' socket: '/var/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution

mysqlonarm commented 3 years ago

@stoinov given the different hardware, can you try the following code/example on it? https://jira.mariadb.org/browse/MDEV-23249?focusedCommentId=160673&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-160673

tsahee commented 3 years ago

CNTVCT_EL0 register should be available in A53, and on any processor implementing armv8-a:

https://developer.arm.com/documentation/ddi0595/2020-12/AArch64-Registers/CNTVCT-EL0--Counter-timer-Virtual-Count-register?lang=en

A kernel error in 3.16.85 sounds quite reasonable (trapping access to that register and handling it wrong). Tagging @geoffreyblake

stoinov commented 3 years ago

@mysqlonarm I tried installing clang++ on my device OS (not in a container) and could only find clang 11.0. Trying to compile using clang -std=c++11 -stdlib=libc++ timer.cc returned timer.cc:1:10: fatal error: 'iostream' file not found, which seems to be an issue of using C instead of C++.

I haven't done any compiling before so excuse me if this is something obvious.

grooverdan commented 3 years ago

@tsahee , @geoffreyblake, any comments on if this is a kernel error? Any workarounds?

geoffreyblake commented 2 years ago

@stoinov @grooverdan , the compiler error above is likely from not having clang installed properly. As for a workaround to the CNTVCT_EL0 register, the best I can supply is writing a small kernel module to check the contents of CNTKCTL_EL1 and look to see if bit 1 is set to 1, and if not, setting it to 1 on all cores. https://developer.arm.com/documentation/ddi0595/2021-03/AArch64-Registers/CNTKCTL-EL1--Counter-timer-Kernel-Control-register

If that bit is set to 0, then access to CNTVCT_EL0 from user space traps into the kernel.
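
As a sketch of the check described above: given a CNTKCTL_EL1 value as read on one core, bit 1 decides whether user space can read CNTVCT_EL0 directly. The helper name el0_vct_enabled is hypothetical, and the sample values below are made up for illustration, not taken from the Odroid-C2:

```shell
#!/bin/sh
# el0_vct_enabled is a hypothetical helper. It succeeds (exit status 0) when
# bit 1 of the given CNTKCTL_EL1 value is set. Accepts decimal or 0x-prefixed hex.
el0_vct_enabled() (
  value=$(($1))
  [ $(( ($value >> 1) & 1 )) -eq 1 ]
)

# Illustrative values only: 0x3 has bit 1 set, 0x1 does not.
for v in 0x3 0x1; do
  if el0_vct_enabled "$v"; then
    echo "$v: CNTVCT_EL0 readable from user space"
  else
    echo "$v: CNTVCT_EL0 access traps into the kernel"
  fi
done
```
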

grooverdan commented 2 years ago

Makes sense. @geoffreyblake given the DietPi seems to use the HardKernel fork for its odroidc2 should that be the place to patch?

Is setting to 1 here absurdly crude?

geoffreyblake commented 2 years ago

The link above is looking at code for the KVM hypervisor, touching code there will not have any impact on the host OS itself.

You can write a small driver like below to print out the value of the reg by simply insmod'ing it, just have the kernel headers on hand:

#include <asm/io.h>
#include <linux/module.h>

void test_each(void *info)
{
  u64 cntkctl_el1;
  u64 cpu = smp_processor_id();
  asm volatile("mrs %0,  s3_0_c14_c1_0" : "=r" (actlr2));
  printk("%lld: cntkclt_el1=%#llx\n", cpu, cntkctl_el1);
}

int __init start(void)
{
  on_each_cpu(test_each, NULL, 1);

  return 0;
}

void __exit end(void)
{}

module_init(start);
module_exit(end);
MODULE_LICENSE("GPL v2");

Sample Makefile:

obj-m += print-cntkctl_el1.o

BUILD_KERNEL ?= $(shell uname -r)

all:
	make -C /lib/modules/$(BUILD_KERNEL)/build M=$(CURDIR) modules

clean:
	make -C /lib/modules/$(BUILD_KERNEL)/build M=$(CURDIR) clean
grooverdan commented 2 years ago

@stoinov are you ok to build and load the module code that @geoffreyblake (many thanks) has provided?

stoinov commented 2 years ago

Sure. I just need detailed instructions on how to do this, as I am not familiar enough with Linux to do it on my own.

grooverdan commented 2 years ago

Kernel docs.

You should have a gcc compiler and make installed.

In a new directory, put the Makefile as Makefile, and the C source code as print-cntkctl_el1.c.

Download the kernel source. I don't know if DietPi has a source package, but failing that:

git clone --single-branch --branch odroidc2-v3.16.y --depth 10  https://github.com/hardkernel/linux.git

If the path /lib/modules/$(uname -r) exists:

  1. ensure /lib/modules/$(uname -r)/build is a symlink to your source
  2. from the module directory, run make -C /lib/modules/$(uname -r)/build M=$PWD

otherwise:

  1. cd linux
  2. make modules_prepare
  3. In your module directory: make -C /path/to/linux M=$PWD

From your module directory:

make -C {same as before} modules_install
modprobe print-cntkctl_el1

This should have the module loaded. Look at dmesg output to see the printed output.
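
The "if the path exists" branch above can be sketched as a small check before invoking make. kernel_build_dir is a hypothetical helper name, and the snippet only prints the make command it would run rather than building anything:

```shell
#!/bin/sh
# kernel_build_dir is a hypothetical helper. It prints the build directory for
# a kernel release and fails if there is none.
# Arguments: release (default: running kernel), modules root (default: /lib/modules).
kernel_build_dir() (
  release=${1:-$(uname -r)}
  root=${2:-/lib/modules}
  dir="$root/$release/build"
  # -d follows symlinks, so a build -> kernel source symlink also counts.
  [ -d "$dir" ] && printf '%s\n' "$dir"
)

if build=$(kernel_build_dir); then
  echo "would run: make -C $build M=$PWD modules"
else
  echo "no build directory for $(uname -r); prepare a kernel source tree first"
fi
```
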

stoinov commented 2 years ago

So I ran apt install for make and gcc, getting the latest versions available. I do have /lib/modules/3.16.85+/, so I cloned the provided repo with the specified branch and then ran ln -s /root/maria/linux/ /lib/modules/3.16.85+/build. Then in the maria folder I tried make -C /lib/modules/$(uname -r)/build M=$PWD and got this error:

make: Entering directory '/root/maria/linux'

  ERROR: Kernel configuration is invalid.
         include/generated/autoconf.h or include/config/auto.conf are missing.
         Run 'make oldconfig && make prepare' on kernel src to fix it.

  WARNING: Symbol version dump ./Module.symvers
           is missing; modules will have no dependencies and modversions.

  CC [M]  /root/maria/print-cntkctl_el1.o
In file included from <command-line>:
././include/linux/kconfig.h:4:10: fatal error: generated/autoconf.h: No such file or directory
    4 | #include <generated/autoconf.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[1]: *** [scripts/Makefile.build:264: /root/maria/print-cntkctl_el1.o] Error 1
make: *** [Makefile:1363: _module_/root/maria] Error 2
make: Leaving directory '/root/maria/linux'

Reading up on this error, I saw a lot of variability in fixes, the most obvious being apt install --reinstall linux-headers-$(uname -r). After running make again I got:

make: Entering directory '/usr/src/linux-headers-3.16.85+'
  CC [M]  /root/maria/print-cntkctl_el1.o
/root/maria/print-cntkctl_el1.c: In function ‘test_each’:
/root/maria/print-cntkctl_el1.c:8:49: error: ‘actlr2’ undeclared (first use in this function)
    8 |   asm volatile("mrs %0,  s3_0_c14_c1_0" : "=r" (actlr2));
      |                                                 ^~~~~~
/root/maria/print-cntkctl_el1.c:8:49: note: each undeclared identifier is reported only once for each function it appears in
In file included from include/linux/printk.h:5,
                 from include/linux/kernel.h:13,
                 from include/asm-generic/bug.h:13,
                 from arch/arm64/include/generated/asm/bug.h:1,
                 from include/linux/bug.h:4,
                 from include/linux/thread_info.h:11,
                 from include/asm-generic/preempt.h:4,
                 from arch/arm64/include/generated/asm/preempt.h:1,
                 from include/linux/preempt.h:18,
                 from include/linux/spinlock.h:50,
                 from include/linux/mm_types.h:8,
                 from include/asm-generic/pgtable.h:7,
                 from ./arch/arm64/include/asm/pgtable.h:429,
                 from ./arch/arm64/include/asm/io.h:29,
                 from /root/maria/print-cntkctl_el1.c:1:
/root/maria/print-cntkctl_el1.c: At top level:
include/linux/init.h:337:6: warning: ‘init_module’ specifies less restrictive attribute than its target ‘start’: ‘cold’ [-Wmissing-attributes]
  337 |  int init_module(void) __attribute__((alias(#initfn)));
      |      ^~~~~~~~~~~
/root/maria/print-cntkctl_el1.c:23:1: note: in expansion of macro ‘module_init’
   23 | module_init(start);
      | ^~~~~~~~~~~
/root/maria/print-cntkctl_el1.c:12:12: note: ‘init_module’ target declared here
   12 | int __init start(void)
      |            ^~~~~
In file included from include/linux/printk.h:5,
                 from include/linux/kernel.h:13,
                 from include/asm-generic/bug.h:13,
                 from arch/arm64/include/generated/asm/bug.h:1,
                 from include/linux/bug.h:4,
                 from include/linux/thread_info.h:11,
                 from include/asm-generic/preempt.h:4,
                 from arch/arm64/include/generated/asm/preempt.h:1,
                 from include/linux/preempt.h:18,
                 from include/linux/spinlock.h:50,
                 from include/linux/mm_types.h:8,
                 from include/asm-generic/pgtable.h:7,
                 from ./arch/arm64/include/asm/pgtable.h:429,
                 from ./arch/arm64/include/asm/io.h:29,
                 from /root/maria/print-cntkctl_el1.c:1:
include/linux/init.h:343:7: warning: ‘cleanup_module’ specifies less restrictive attribute than its target ‘end’: ‘cold’ [-Wmissing-attributes]
  343 |  void cleanup_module(void) __attribute__((alias(#exitfn)));
      |       ^~~~~~~~~~~~~~
/root/maria/print-cntkctl_el1.c:24:1: note: in expansion of macro ‘module_exit’
   24 | module_exit(end);
      | ^~~~~~~~~~~
/root/maria/print-cntkctl_el1.c:20:13: note: ‘cleanup_module’ target declared here
   20 | void __exit end(void)
      |             ^~~
/root/maria/print-cntkctl_el1.c: In function ‘test_each’:
/root/maria/print-cntkctl_el1.c:8:3: error: invalid lvalue in ‘asm’ output 0
    8 |   asm volatile("mrs %0,  s3_0_c14_c1_0" : "=r" (actlr2));
      |   ^~~
make[1]: *** [scripts/Makefile.build:264: /root/maria/print-cntkctl_el1.o] Error 1
make: *** [Makefile:1363: _module_/root/maria] Error 2
make: Leaving directory '/usr/src/linux-headers-3.16.85+'
grooverdan commented 2 years ago

This looks like a @geoffreyblake typo. Replace actlr2 with cntkctl_el1 in the code.

stoinov commented 2 years ago

After the fix, the build succeeded:

make: Entering directory '/usr/src/linux-headers-3.16.85+'
  CC [M]  /root/maria/print-cntkctl_el1.o
In file included from include/linux/printk.h:5,
                 from include/linux/kernel.h:13,
                 from include/asm-generic/bug.h:13,
                 from arch/arm64/include/generated/asm/bug.h:1,
                 from include/linux/bug.h:4,
                 from include/linux/thread_info.h:11,
                 from include/asm-generic/preempt.h:4,
                 from arch/arm64/include/generated/asm/preempt.h:1,
                 from include/linux/preempt.h:18,
                 from include/linux/spinlock.h:50,
                 from include/linux/mm_types.h:8,
                 from include/asm-generic/pgtable.h:7,
                 from ./arch/arm64/include/asm/pgtable.h:429,
                 from ./arch/arm64/include/asm/io.h:29,
                 from /root/maria/print-cntkctl_el1.c:1:
include/linux/init.h:337:6: warning: ‘init_module’ specifies less restrictive attribute than its target ‘start’: ‘cold’ [-Wmissing-attributes]
  337 |  int init_module(void) __attribute__((alias(#initfn)));
      |      ^~~~~~~~~~~
/root/maria/print-cntkctl_el1.c:23:1: note: in expansion of macro ‘module_init’
   23 | module_init(start);
      | ^~~~~~~~~~~
/root/maria/print-cntkctl_el1.c:12:12: note: ‘init_module’ target declared here
   12 | int __init start(void)
      |            ^~~~~
In file included from include/linux/printk.h:5,
                 from include/linux/kernel.h:13,
                 from include/asm-generic/bug.h:13,
                 from arch/arm64/include/generated/asm/bug.h:1,
                 from include/linux/bug.h:4,
                 from include/linux/thread_info.h:11,
                 from include/asm-generic/preempt.h:4,
                 from arch/arm64/include/generated/asm/preempt.h:1,
                 from include/linux/preempt.h:18,
                 from include/linux/spinlock.h:50,
                 from include/linux/mm_types.h:8,
                 from include/asm-generic/pgtable.h:7,
                 from ./arch/arm64/include/asm/pgtable.h:429,
                 from ./arch/arm64/include/asm/io.h:29,
                 from /root/maria/print-cntkctl_el1.c:1:
include/linux/init.h:343:7: warning: ‘cleanup_module’ specifies less restrictive attribute than its target ‘end’: ‘cold’ [-Wmissing-attributes]
  343 |  void cleanup_module(void) __attribute__((alias(#exitfn)));
      |       ^~~~~~~~~~~~~~
/root/maria/print-cntkctl_el1.c:24:1: note: in expansion of macro ‘module_exit’
   24 | module_exit(end);
      | ^~~~~~~~~~~
/root/maria/print-cntkctl_el1.c:20:13: note: ‘cleanup_module’ target declared here
   20 | void __exit end(void)
      |             ^~~
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /root/maria/print-cntkctl_el1.mod.o
  LD [M]  /root/maria/print-cntkctl_el1.ko
make: Leaving directory '/usr/src/linux-headers-3.16.85+'

On the next step I got an error, though:

make -C /lib/modules/3.16.85+/build modules_install
make: Entering directory '/usr/src/linux-headers-3.16.85+'
cp: cannot stat './modules.order': No such file or directory
make: *** [Makefile:1112: _modinst_] Error 1
make: Leaving directory '/usr/src/linux-headers-3.16.85+'

I can see there is a modules.order in the /lib/modules/3.16.85+ folder, but not in /usr/src/linux-headers-3.16.85+. The symlink is still in place and works properly, but it redirects me to this other headers folder.

geoffreyblake commented 2 years ago

@stoinov, you already have the built module. Since it has no dependencies, you can load it with sudo insmod print-cntkctl_el1.ko; there is no need for modules_install.

stoinov commented 2 years ago

Thanks @geoffreyblake, here's the resulting output from dmesg:

[1031565.459946] 0: cntkclt_el1=0x0
[1031565.459952] 3: cntkclt_el1=0x0
[1031565.459957] 1: cntkclt_el1=0x0
[1031565.460030] 2: cntkclt_el1=0x0
geoffreyblake commented 2 years ago

CNTKCTL_EL1 is 0, which would explain the unhandled trap, @stoinov. You can try to modify your kernel, or just modify this driver code to execute the following when it's loaded:

cntkctl_el1 = 0x2;
asm volatile("msr s3_0_c14_c1_0, %0" : : "r" (cntkctl_el1));

At that point, I would assume things will work.

grooverdan commented 2 years ago

Thank you very much @geoffreyblake for assisting @stoinov.

peiandsky commented 1 year ago

Set the maximum memory allocated to the database to be less than 8GB

grooverdan commented 1 year ago

> Set the maximum memory allocated to the database to be less than 8GB

What is this? A request for help? Instructions related to arm64?

See https://github.com/MariaDB/mariadb-docker#getting-help if this is a request for help.