estraier / tkrzw

a set of implementations of DBM
Apache License 2.0
164 stars 20 forks source link

ppc64le check error #24

Closed tieugene closed 2 years ago

tieugene commented 2 years ago

Up to 1.0.10 everything was ok, but starting from 1.0.11 build ok for all architectures excepting ppc64le (PowerPC). I proposed that this is because of 073bdc9d1a745766874d0516e9017dde46eaf37d that fixed i686 bug. But reverting this commit's changes back has no effect - ppc64le builds (more precisely make check) still fails: https://koji.fedoraproject.org/koji/taskinfo?taskID=76258864 (By the way i686 build ok without 073bdc9d1a745766874d0516e9017dde46eaf37d). Tail of build (make check-light) log (https://kojipkgs.fedoraproject.org//work/tasks/8959/76258959/build.log):

LD_LIBRARY_PATH=.:/usr/local/lib::. ./tkrzw_dbm_perf sequence --dbm poly --path casket.tkt \
  --iter 1000 --threads 2 --params "dbm=tree,num_buckets=500"
Setting: impl=poly num_iterations=1000 value_size=8 num_threads=2
.................................................. (00000050)
.................................................. (00000100)
.................................................. (00000150)
.................................................. (00000200)
.................................................. (00000250)
.................................................. (00000300)
.................................................. (00000350)
.................................................. (00000400)
.................................................. (00000450)
.................................................. (00000500)
.................................................. (00000550)
.................................................. (00000600)
.................................................. (00000650)
.................................................. (00000700)
.................................................. (00000750)
.................................................. (00000800)
.................................................. (00000850)
.................................................. (00000900)
.................................................. (00000950)
.................................................. (00001000)
Synchronizing: ... done (elapsed=0.000922)
Setting done: elapsed_time=0.006026 num_records=2000 qps=331893 mem=4288000
  file_size=43008
Getting: impl=poly num_iterations=1000 value_size=8 num_threads=2
.................................................. (00000050)
.................................................. (00000100)
.................................................. (00000150)
............................................. (00001000)
Getting done: elapsed_time=0.001653 num_records=2000 qps=1209954 mem=4480000
  file_size=43008
Iterating: impl=poly num_iterations=1000 value_size=8 num_threads=2
Iterating done: elapsed_time=0.000172 num_records=2000 qps=11618571 mem=4480000
  file_size=43008
Removing: impl=poly num_iterations=1000 value_size=8 num_threads=2
Synchronizing: ... done (elapsed=0.000399)
Removing done: elapsed_time=0.000607 num_records=2000 qps=3294819 mem=4544000
  file_size=43008
make[1]: Leaving directory '/builddir/build/BUILD/tkrzw-1.0.13'
Get failed: BROKEN_DATA_ERROR: invalid magic checksum
make[1]: *** [Makefile:758: check-perf-sampler] Error 1

Something wrong starting from 1.0.11

estraier commented 2 years ago

Thanks for reporting this. It seems that the checksum algorithm has overflow or endian issue, though I'm not sure. Does "ppc64le" represents Power PC, 64-bit, little endian? Can I see the result of "tkrzw_build_util config" on the system?

tieugene commented 2 years ago

Does "ppc64le" represents Power PC, 64-bit, little endian?

Exactly.

Can I see the result of "tkrzw_build_util config" on the system?

I have no this system in my hands. Packages are building at remote virtual hosts. So I just can insert some command in build sequence and see log. Now part of .spec file is:

%prep
%autosetup

%build
autoreconf -vif
%configure
%make_build apidoc all
for bin in \
  tkrzw_build_util tkrzw_str_perf tkrzw_file_perf tkrzw_dbm_perf tkrzw_dbm_util
do
  LD_LIBRARY_PATH=$PWD help2man --no-info --no-discard-stderr \
    --version-string='%{version}' --output="${bin}.1" \
    "./${bin}"
done

%install
%make_install
# Remove static .a file
rm -f %{buildroot}%{_libdir}/lib%{name}.a
# mans
install -d %{buildroot}%{_mandir}/man1
install -t %{buildroot}%{_mandir}/man1 -m 0644 -p tkrzw_*.1

%check
%make_build check-light

Where I have to insert tkrzw_build_util config?

estraier commented 2 years ago

I released 1.0.14. Now, "tkrzw_build_util config" is run at the beginning of "make check" and "make check-light" so we can confirm the result in the log.

tieugene commented 2 years ago

https://kojipkgs.fedoraproject.org//work/tasks/5013/76265013/build.log PS. you can notify me if/when bug will be fixed and before flushing new tag - I will test it on Fedora build farm.

estraier commented 2 years ago

Thanks. Could you check the following version, where the checksum returns 0 always. https://dbmx.net/tkrzw/pkg/tkrzw-1.0.14-nochecksum.tar.gz

If it works, the checksum is the culprit. If not, I have to investigate other parts.

On Sat, Sep 25, 2021 at 8:02 PM Eugene @.***> wrote:

https://kojipkgs.fedoraproject.org//work/tasks/5013/76265013/build.log

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/estraier/tkrzw/issues/24#issuecomment-927104713, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQGJVRA4KU7G6MW6L6XDU4TUDWT4DANCNFSM5EXKICIQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

tieugene commented 2 years ago

Not: https://kojipkgs.fedoraproject.org//work/tasks/9498/76269498/build.log Other archs ok: https://koji.fedoraproject.org/koji/taskinfo?taskID=76269402

estraier commented 2 years ago

Thanks. The problem doesn't seem so simple. I have to obtain a PowerPC environment to investigate it more.

estraier commented 2 years ago

Another notable difference between 1.0.10 and 1.0.11 is this change. https://github.com/estraier/tkrzw/commit/073bdc9d1a745766874d0516e9017dde46eaf37d

So, could you check this package where the change is reverted? https://dbmx.net/tkrzw/pkg/tkrzw-1.0.14-nommapfix.tar.gz

tieugene commented 2 years ago

As I wrote "I proposed that this is because of @073bdc9 that fixed i686 bug. But reverting this commit's changes back has no effect". But I'll try

estraier commented 2 years ago

Oops, I was confused. Perhaps, the bug had already been in 1.0.10 but not found because check-light was not there.

tieugene commented 2 years ago

I tried to rebuild original 1.0.13 with the patch:

diff -dur tkrzw-1.0.13.orig/tkrzw_dbm_hash.cc tkrzw-1.0.13/tkrzw_dbm_hash.cc
--- tkrzw-1.0.13.orig/tkrzw_dbm_hash.cc 2021-09-22 21:08:26.000000000 +0300
+++ tkrzw-1.0.13/tkrzw_dbm_hash.cc  2021-09-25 10:14:30.466455004 +0300
@@ -1508,8 +1508,7 @@
           }
         } else {
           if (old_is_set) {
-            eff_data_size_.fetch_add(
-                static_cast<int32_t>(new_value.size()) - static_cast<int32_t>(old_value.size()));
+            eff_data_size_.fetch_add(new_value.size() - old_value.size());
           } else {
             num_records_.fetch_add(1);
             eff_data_size_.fetch_add(key.size() + new_value.size());
diff -dur tkrzw-1.0.13.orig/tkrzw_dbm_tree.cc tkrzw-1.0.13/tkrzw_dbm_tree.cc
--- tkrzw-1.0.13.orig/tkrzw_dbm_tree.cc 2021-09-22 21:08:26.000000000 +0300
+++ tkrzw-1.0.13/tkrzw_dbm_tree.cc  2021-09-25 10:17:19.167476222 +0300
@@ -1904,8 +1904,7 @@
         node->page_size +=
             static_cast<int32_t>(new_rec_size) - static_cast<int32_t>(old_rec_size);
         node->dirty = true;
-        eff_data_size_.fetch_add(
-            static_cast<int32_t>(new_value.size()) - static_cast<int32_t>(old_value_size));
+        eff_data_size_.fetch_add(new_value.size() - old_value_size);
         if (static_cast<int32_t>(new_value.size()) > old_value_size) {
           if (CheckLeafNodeToDivide(node)) {
             reorg_ids_.Insert(std::make_pair(node->id, std::string(records.front()->GetKey())));
tieugene commented 2 years ago

Oops... Your last source ('nommapfix') works: https://koji.fedoraproject.org/koji/taskinfo?taskID=76273881 If this is the solution it is not required to make new tag. Now Fedora has 0.9.16 in stable repo and 1.0.13 in testing. I can just add required patch for 1.0.14 on next build.

sharkcz commented 2 years ago

@estraier , if you need access to a ppc64le hw, please send me an email to sharkcz at fedoraproject dot org, I can arrange you an account on our public machine.

estraier commented 2 years ago

I'm glad if I can do some experiments on the ppc64le environment. We shouldn't be complacent without knowing the cause of the failure.

On Sun, Sep 26, 2021 at 2:45 AM Dan Horák @.***> wrote:

@estraier https://github.com/estraier , if you need access to a ppc64le hw, please send me an email to sharkcz at fedoraproject dot org, I can arrange you an account on our public machine.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/estraier/tkrzw/issues/24#issuecomment-927157931, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQGJVRHC3JBCTIRNOBZS3PDUDYDFJANCNFSM5EXKICIQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

tieugene commented 2 years ago

I'm glad if I can do some experiments on the ppc64le environment.

Seems sharkcz propose you direct ssh access to virtual ppc64le host. But this requires some moves from your side (e.g. you have to get Fedoraproject's signed certificate)

sharkcz commented 2 years ago

Right, I am offering a remote (ssh) access to a ppc64le virtual machine. Fedora contributors can access the Fedora public machines (https://fedoraproject.org/wiki/Test_Machine_Resources_For_Package_Maintainers), but we have also a solution for upstream projects/authors, who are not Fedora contributors. All I need is an email with your public ssh key.

tieugene commented 2 years ago

1.0.15: bug still exists

estraier commented 2 years ago

Thanks Dan and Eugene. My public key is the following.

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAtWY5XD4t3uYIOTZ0WUH2BGnFAdIdkVUZplUQB34CW/YOpWoeOJy/CKfXibZ+xubH7cSPnwLgg26ZMJWaxxzF1R3jXzJCTKe/LoqdNkqjSCqSiqyXLFOf2K4PdQ5MKA80VIMO+PDmtd3noOeEN/Mu667S2IYMYu5BD9XW+haiOwvwiQd8+9y4FKKBnPjZle9u5XjgKs3QVG1bDa0phWQu95q7/7QrVQp4AFGsBNDNfOltPp/zjG+tU9MSPa5jscX0YcWo4QD+TnQaJNv9wCXdZAJvy/6Bs+k/g9CpAPbmNYuBDXwrVUXNGQaPYEEg4isMCgposDHP91oo7ZkXnU8dkQ== mikio@kyoto

sharkcz commented 2 years ago

Please send me an email (sharkcz at fedoraproject.org), I will need to send back you the login details :-) Thanks.

tieugene commented 2 years ago

Please send me an email (sharkcz at fedoraproject.org), I will need to send back you the login details :-) Thanks.

As the author published his pubkey - you can publish login detail here. Why not? PS. Another problem is in private key, but let's wait.

sharkcz commented 2 years ago

It's a private system, thus I won't share details here. It's outside the official Fedora infra.

sharkcz commented 2 years ago

Login info sent, please check your spam box too. Gmail doesn't like me sometimes :-)

estraier commented 2 years ago

I can login on it. Thanks!

estraier commented 2 years ago

"make check" and "make check-light" of 1.0.15 work well on the ppc64le system. So, the platform doesn't seem the cause of the error. Now I'm suspecting that some resource limitation on the build system might be the cause.

@tieugene could you run "ulimit -a" on the build system? I'd like to simulate it. And, is the GCC version same as "gcc version 11.2.1 20210728 (Red Hat 11.2.1-1) (GCC)"

tieugene commented 2 years ago

@tieugene could you run "ulimit -a" on the build system? I'd like to simulate it. And, is the GCC version same as "gcc version 11.2.1 20210728 (Red Hat 11.2.1-1) (GCC)"

According to log (search ==== ULIMIT/ ====):

real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 97324
max locked memory           (kbytes, -l) 64
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 97324
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

PS. You can compare with other archs, where builds ok. Click required build and search build.log link.

tieugene commented 2 years ago

1.0.16: still failed

estraier commented 2 years ago

I'm still struggling to reproduce the bug.

@tieugene Could you check the build of this package on the ppc64le system? It adds "madvice" system call after "mremap" system call though it is not necessary on the POSIX specification. https://dbmx.net/tkrzw/pkg/tkrzw-1.0.16-madvise.tar.gz

tieugene commented 2 years ago

@tieugene Could you check the build of this package on the ppc64le system? It adds "madvice" system call after "mremap" system call though it is not necessary on the POSIX specification. https://dbmx.net/tkrzw/pkg/tkrzw-1.0.16-madvise.tar.gz

Failed If you have an access to vhost with PPC64le (as @sharkcz promised) you can try to build the package in-place. Just:

  1. Download srpm
  2. sudo dnf install rpm-build (if not installed)
  3. rpmbuild --rebuild tkrzw-...src.rpm

PS. I asked vhost chief to give me same access but no response yet

estraier commented 2 years ago

Thanks. The problem seems related to the behavior of mmap so I'll investigate more. Reproduction is the key for debugging so I appreciate that I can use the build system myself.

estraier commented 2 years ago

rpmbuild doesn't fail on my account on the ppc64le. So, maybe I should consider another way to reproduce it.

sharkcz commented 2 years ago

I did rebuilds using mock on my local machine and on the public machine. It fails locally, but builds OK on the public one. Both used F-34 buildroot, so the software build environment was (exactly) the same. Isn't there multi-threading active in the software, for example? The public machine has only 2 vCPUs.

estraier commented 2 years ago

The command which dies is the following. It uses two active threads for test tasks and one therad waiting for them. tkrzw_dbm_perf sequence --dbm poly --path casket.tkt --iter 1000 --threads 2 --params "dbm=tree,num_buckets=500"

What happens with the following command which uses 1 active thread? tkrzw_dbm_perf sequence --dbm poly --path casket.tkt --iter 2000 --threads 1 --params "dbm=tree,num_buckets=500"

sharkcz commented 2 years ago

still fails

...
LD_LIBRARY_PATH=.:/usr/local/lib::. ./tkrzw_dbm_perf sequence --dbm poly --path casket.tkt \
  --iter 2000 --threads 1 --params "dbm=tree,num_buckets=500"
Setting: impl=poly num_iterations=2000 value_size=8 num_threads=1
.................................................. (00000100)
.................................................. (00000200)
.................................................. (00000300)
.................................................. (00000400)
.................................................. (00000500)
.................................................. (00000600)
.................................................. (00000700)
.................................................. (00000800)
.................................................. (00000900)
.................................................. (00001000)
.................................................. (00001100)
.................................................. (00001200)
.................................................. (00001300)
.................................................. (00001400)
.................................................. (00001500)
.................................................. (00001600)
.................................................. (00001700)
.................................................. (00001800)
.................................................. (00001900)
.................................................. (00002000)
Synchronizing: ... done (elapsed=0.000696)
Setting done: elapsed_time=0.010143 num_records=2000 qps=197180 mem=4160000
  file_size=41984

Getting: impl=poly num_iterations=2000 value_size=8 num_threads=1
.................................................. (00000100)
.................................................. (00000200)
.................................................. (00000300)
.................................................. (00000400)
.................................................. (00000500)
.................................................. (00000600)
.................................................. (00000700)
.................................................. (00000800)
.................................................. (00000900)
.................................................. (00001000)
.................................................. (00001100)
.................................................. (00001200)
.................................................. (00001300)
.................................................. (00001400)
.................................................. (00001500)
.........................................Get failed: BROKEN_DATA_ERROR: invalid magic checksum
 (00002000)
Getting done: elapsed_time=0.007697 num_records=2000 qps=259846 mem=4224000
  file_size=41984

Iterating: impl=poly num_iterations=2000 value_size=8 num_threads=1
Iterating done: elapsed_time=0.000082 num_records=2000 qps=24385488 mem=4224000
  file_size=41984

Removing: impl=poly num_iterations=2000 value_size=8 num_threads=1
Synchronizing: ... done (elapsed=0.000263)
Removing done: elapsed_time=0.000350 num_records=2000 qps=5714311 mem=4288000
  file_size=41984

make[1]: *** [Makefile:774: check-perf-sampler] Error 1

I will look closer what the difference between the machines could be. It's weird ...

estraier commented 2 years ago

Thanks. Reproducing the problem with one thread is a good news because the deterministic behavior is easy to analyze.

tieugene commented 2 years ago

Thanks. Reproducing the problem with one thread is a good news because the deterministic behavior is easy to analyze.

By the way - one from my home host has 1 CPU with 1 (one) core (P4 3.0GHz).

sharkcz commented 2 years ago

@estraier, I think I have a public environment where I can reproduce the failure using @tieugene 's rpm, expect a new email :-)

estraier commented 2 years ago

Thanks Dan, Now I can reproduce it. I think that the bug is related to the mmap or mremap system call. When the error happens, the region after the offset where mremap was called for the first time has been filled with zero code. I'll investigate further to know why this happens.

On Fri, Oct 8, 2021 at 6:31 PM Dan Horák @.***> wrote:

@estraier https://github.com/estraier, I think I have a public environment where I can reproduce the failure using @tieugene https://github.com/tieugene 's rpm, expect a new email :-)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/estraier/tkrzw/issues/24#issuecomment-938494763, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQGJVRGDLAOUGZCNT45XYADUF23A5ANCNFSM5EXKICIQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

estraier commented 2 years ago

I found the cause of the error. On the PowerPC system, memory-mapped region managed by "mmap" beyond the file size metadata managed by "ftruncate" is filled with zero. On other Linux systems, data in such regin is retained as long as the file size metadata is set to the size of mapped region before calling "msync" or "munmap". Let me illustrate this by pseudo-code.

// #1: Initialize mmap(size=10) ftruncate(size=10)

// #2: write something write(offset=0, size=5, data="01234")

// #3 truncate mmap(size=10) ftruncate(size=5) // This is smaller than the map size

// #4 write something // This is beyond the file size metadata set by fturncate write(offset=5, data="56789")

// #5 synchronize ftruncate(size=10) munmap(size=10)

The problem is, at the #4 step, the data is written beyond the size 5 set by ftruncte. On many system, the data writtin in the map is retained and finally synchronized to the file at #5 step because the metadata is set 10 before calling munmap.

Though I'm not sure, the difference of the behavior seems related to Linux version. Anyway, now that I know the cause, I can fix it.

tieugene commented 2 years ago

// #3 truncate mmap(size=10) ftruncate(size=5) // This is smaller than the map size // #4 write something write(offset=5, data="56789") // #5 synchronize ftruncate(size=10) munmap(size=10) The problem is, at the #4 step, the data is written beyond the size 5 set by ftruncte. On many system, the data writtin in the map is retained and finally synchronized to the file at #5 step because the metadata is set 10 before calling munmap.

I'm not familiar in system programming at all but as for me if you try to write data beyond reserved array - nobody can warranty nothing. Is not it?

estraier commented 2 years ago

Basically, yes. The region is actually allocaed by the mmap call. However, some data is beyond the file size metadata. What happens in that case is not specified by the specification AFAIK. I've been assuming that data beyond the metadata is retained and written back to the file as long as the metadata is set before calling "munmap". It is the case in all systems where I've tested my code. But, it's not the case on the PPC system at least.

Anyway, the practical solution is to set the file size metadata aligned to the map size. I'll modify the code so.

estraier commented 2 years ago

I released 1.0.17. make check and make check-light passes on the both PPC systems. I also tested it on Ubuntu 20.10 (x64), MacOS 11.3 (M1) and Windows 11. Thanks Eugene and Dan. This fix couldn't be done without your help.

tieugene commented 2 years ago

Nice: ✓F34…✓F36. If python-0.1.23 wrapper is compatible with it I will push new version into Fedora repos. Closed in f2a08e6d038fef998a2230ad215ef07c2223f942

tieugene commented 2 years ago

oops... Issue autoclosing not works. PR only.

estraier commented 2 years ago

As the latest version of Tkrzw (C++) abolishes the memory locking feature, test cases of olf packages of language bindings fail. Thus, I released new versions for them. As for Python, 0.1.24 is the new version.