bitwiseworks / libc

LIBC Next (kLIBC fork)

git clone issue #34

Closed SilvanScherrer closed 5 years ago

SilvanScherrer commented 5 years ago

While cloning the tcl-os2 git repo with:

git clone https://github.com/bitwiseworks/tcl-os2 .

we get the following error:

Cloning into '.\tcl-os2'...
remote: Enumerating objects: 1115, done.
remote: Counting objects: 100% (1115/1115), done.
remote: Compressing objects: 100% (536/536), done.
remote: Total 175344 (delta 810), reused 774 (delta 579), pack-reused 174229
Receiving objects: 100% (175344/175344), 148.29 MiB | 789.00 KiB/s, done.
fatal: cannot pread pack file: Bad file number
fatal: index-pack failed

Whereas when we use libc, it looks like the following:

Cloning into '.'...
remote: Enumerating objects: 1115, done.
remote: Counting objects: 100% (1115/1115), done.
remote: Compressing objects: 100% (536/536), done.
remote: Total 175344 (delta 810), reused 774 (delta 579), pack-reused 174229
Receiving objects: 100% (175344/175344), 148.29 MiB | 799.00 KiB/s, done.
Resolving deltas: 100% (145845/145845), done.
Fri 15.03.2019 17.36.18

SilvanScherrer commented 5 years ago

The errors may vary. We also saw:

Cloning into '.\tcl-os2'...
remote: Enumerating objects: 1115, done.
remote: Counting objects: 100% (1115/1115), done.
remote: Compressing objects: 100% (536/536), done.
remote: Total 175344 (delta 810), reused 774 (delta 579), pack-reused 174229
Receiving objects: 100% (175344/175344), 148.29 MiB | 789.00 KiB/s, done.
Resolving deltas:   7% (11135/145845), completed with 0 local objects.
error: inflate: data stream error (incorrect header check)
error: inflate: data stream error (incorrect header check)
fatal: recursion detected in die handler
fatal: index-pack failed

or

Cloning into '.'...
remote: Enumerating objects: 1115, done.
remote: Counting objects: 100% (1115/1115), done.
remote: Compressing objects: 100% (536/536), done.
remote: Total 175344 (delta 810), reused 774 (delta 579), pack-reused 174229
Receiving objects: 100% (175344/175344), 148.29 MiB | 543.00 KiB/s, done.
error: object file D:/Coding/tcl/master/.git/objects/a7/6d6719a2a30f3edcc44c7fe15069e5f03f855d is empty
Resolving deltas:   5% (7812/145845), completed with 0 local objects.
fatal: recursion detected in die handler
fatal: SHA1 COLLISION FOUND WITH 72c1718ff73bbb68e19f7dd400bec88dd54fc6cf !
LIBC fatal error - streams: fmutex_request failed

Killed by SIGABRT
pid=0x66e5 ppid=0x66e4 tid=0x0003 slot=0x0078 pri=0x0200 mc=0x0001 ps=0x0016
C:\USR\LIBEXEC\GIT-CORE\GIT.EXE
Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.
fatal: index-pack failed

SilvanScherrer commented 5 years ago

It might be an issue with the repo size, as smaller repos can be cloned without problems.

dmik commented 5 years ago

I downgraded LIBC to 0.6.6-40 (i.e. the last pre-LIBCn build) and I still get the crash:

Cloning into '.'...
remote: Enumerating objects: 1115, done.
remote: Counting objects: 100% (1115/1115), done.
remote: Compressing objects: 100% (536/536), done.
remote: Total 175344 (delta 810), reused 774 (delta 579), pack-reused 174229
Receiving objects: 100% (175344/175344), 148.29 MiB | 506.00 KiB/s, done.
Resolving deltas: 100% (145845/145845), done.
Assertion info: 6
Assertion failed: arc == NO_ERROR, file D:/Users/dmik/rpmbuild/BUILD/libcx-0.6.4/src/shared.c, line 564

Killed by SIGABRT
pid=0x0869 ppid=0x080b tid=0x0001 slot=0x008f pri=0x0200 mc=0x0001 ps=0x0010
C:\USR\BIN\GIT.EXE
Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.
error: rev-list died of signal 6
fatal: remote did not send all necessary objects

So I wonder how it could work for you with older LIBC. I will also try to downgrade LIBCx.

dmik commented 5 years ago

Disregard the previous comment: with 0.6.6-40 (and the latest LIBCx 0.6.4-1) it works. Updating LIBC to LIBCn 0.1.0-1 leads to:

Cloning into '.'...
remote: Enumerating objects: 1115, done.
remote: Counting objects: 100% (1115/1115), done.
remote: Compressing objects: 100% (536/536), done.
remote: Total 175344 (delta 810), reused 774 (delta 579), pack-reused 174229
Receiving objects: 100% (175344/175344), 148.29 MiB | 547.00 KiB/s, done.
fatal: cannot pread pack file: Bad file number
fatal: index-pack failed

dmik commented 5 years ago

So it seems like a GCC4 regression — as this is the most significant difference in builds.

dmik commented 5 years ago

BTW, cloning Mozilla fails too (expectedly):

D:>git clone https://github.com/bitwiseworks/mozilla-os2.git .
Cloning into '.'...
remote: Enumerating objects: 334200, done.
remote: Total 334200 (delta 0), reused 0 (delta 0), pack-reused 334200
Receiving objects: 100% (334200/334200), 463.84 MiB | 479.00 KiB/s, done.
Resolving deltas:   4% (8627/174306), completed with 0 local objects.
error: inflate: data stream error (incorrect header check)
fatal: serious inflate inconsistency
fatal: index-pack failed

This will require some hardcore debugging.

dmik commented 5 years ago

Further debugging shows that the problem disappears if stderr of git is redirected to a file, e.g. like this:

git clone https://github.com/bitwiseworks/tcl-os2 master2 2>stderr

My guess is that some error from the wrong place pops up and confuses git when stderr is not redirected.

dmik commented 5 years ago

Reading LIBC logs indicates that under libc066 git clone works in single-threaded mode, while under LIBCn it creates as many threads as there are cores. This indicates that it's a regression from cf7a5dac76b0bf96474bcf6625340faae712dbef, which was never part of any libc066 but is in the first LIBCn release.

Also, I found that giving --silent instead of 2>stderr (i.e. disabling any console output instead of redirecting it) also fixes the problem. This suggests that the failure happens only when several git threads try to write to the console (stderr in this case), which in turn suggests that true console output (as opposed to file output, in case of console redirection) is not thread-safe somewhere in LIBC or maybe even in OS/2.
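For illustration, a minimal sketch of how a program can derive its worker-thread count from the number of online cores via sysconf(_SC_NPROCESSORS_ONLN). The helper name pick_thread_count is hypothetical, and the fallback to a single thread when the value is unavailable is an assumption about typical behaviour, not a quote from git or LIBC sources.

```c
#include <unistd.h>

/* Hypothetical helper: number of worker threads a program might start.
 * Assumption: when the C library cannot report the online core count
 * (macro missing, or sysconf() returning -1 or 0), fall back to 1,
 * i.e. effectively single-threaded operation. */
static int pick_thread_count(void)
{
    long n = -1;
#ifdef _SC_NPROCESSORS_ONLN
    n = sysconf(_SC_NPROCESSORS_ONLN);
#endif
    return n > 0 ? (int)n : 1;
}
```

Under this assumption, a C library that starts reporting the real core count would switch such a program from one thread to several without any change in the program itself.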

dmik commented 5 years ago

No, it's not stderr output per se (it's fine). The problem is somewhat deeper. Git uses setitimer to implement periodic progress updates, which causes SIGALRM signals on thread 1. Git also uses pthread_create on thread 1 to create worker threads (a result of implementing _SC_NPROCESSORS_ONLN in LIBC in the commit above) and then waits on them with pthread_join. However, delivering SIGALRM signals to thread 1 somehow causes pthread_join to return early while the worker threads are still running. Since thread 1 thinks they are finished, it closes the file descriptors of the pack files it supplied to those threads, and pread (and other I/O calls) on these threads then expectedly fail with Bad file number.

Now I need to debug pthread_join to see why it returns prematurely.
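The scenario above can be outlined with a small POSIX sketch: a periodic ITIMER_REAL timer fires SIGALRM on the main thread while it sits in pthread_join on a worker. All names here (on_alarm, worker, join_survives_sigalrm) are hypothetical and the timings are arbitrary; this is not git's code. On a conforming pthread implementation the join only returns once the worker has finished, so the function returns 1.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t ticks;        /* SIGALRM deliveries seen */
static volatile sig_atomic_t worker_done;  /* set right before the worker exits */

static void on_alarm(int sig) { (void)sig; ticks++; }

static void *worker(void *arg)
{
    struct timespec ts = { 0, 300L * 1000 * 1000 };  /* about 300 ms of "work" */
    (void)arg;
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;                                            /* resume if interrupted */
    worker_done = 1;
    return NULL;
}

/* Returns 1 if pthread_join returned only after the worker finished,
 * 0 if it returned early, -1 on setup failure. */
static int join_survives_sigalrm(void)
{
    struct sigaction sa;
    struct itimerval tv;
    sigset_t alrm;
    pthread_t t;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* Block SIGALRM while creating the worker so that the worker inherits
     * the blocked mask and the signal is delivered to this thread only. */
    sigemptyset(&alrm);
    sigaddset(&alrm, SIGALRM);
    pthread_sigmask(SIG_BLOCK, &alrm, NULL);
    worker_done = 0;
    ticks = 0;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    pthread_sigmask(SIG_UNBLOCK, &alrm, NULL);

    /* Periodic timer, every 20 ms, similar in spirit to a progress timer. */
    memset(&tv, 0, sizeof tv);
    tv.it_interval.tv_usec = 20000;
    tv.it_value.tv_usec = 20000;
    setitimer(ITIMER_REAL, &tv, NULL);

    rc = pthread_join(t, NULL);   /* must not return before the worker exits */

    memset(&tv, 0, sizeof tv);    /* disarm the timer */
    setitimer(ITIMER_REAL, &tv, NULL);

    return rc == 0 && worker_done == 1;
}
```

A result of 0 from join_survives_sigalrm() would correspond to the symptom described here: the joining thread carries on while the worker is still running.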

dmik commented 5 years ago

This turns out to be a pthread problem indeed. pthread_join returned early because kLIBC calls DosKillThread to deliver SIGALRM to thread 1, which makes DosWaitThread return ERROR_INTERRUPT, and that was not properly handled (DosWaitThread was not retried). Fixed in http://trac.netlabs.org/ports/changeset/2345.
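The retry idea can be sketched as follows. This is not the code from the changeset: the wrapper wait_thread_retry and the stand-in fake_wait are hypothetical, the wait function is passed in so the sketch compiles without the OS/2 Toolkit, the real DosWaitThread also takes an option argument that is omitted here, and the numeric value of ERROR_INTERRUPT is an assumption based on the OS/2 error headers.

```c
/* Assumed OS/2 error codes (hypothetical local definitions). */
#define NO_ERROR         0
#define ERROR_INTERRUPT  95

typedef unsigned long APIRET;
typedef APIRET (*wait_fn)(unsigned long *ptid);

/* Hypothetical wrapper: an interrupted wait is not a finished wait,
 * so keep calling until the result is something other than ERROR_INTERRUPT. */
static APIRET wait_thread_retry(wait_fn wait_once, unsigned long *ptid)
{
    APIRET arc;
    do {
        arc = wait_once(ptid);
    } while (arc == ERROR_INTERRUPT);
    return arc;
}

/* Stand-in for the real wait call: reports an interrupt three times
 * (as if three signals arrived), then reports that the thread ended. */
static int fake_calls;
static APIRET fake_wait(unsigned long *ptid)
{
    (void)ptid;
    return ++fake_calls <= 3 ? ERROR_INTERRUPT : NO_ERROR;
}
```

With the stand-in above, wait_thread_retry(fake_wait, &tid) makes four calls and returns NO_ERROR, whereas a single unretried call would have reported ERROR_INTERRUPT and let the caller proceed too early.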

I've just cloned both tcl and mozilla repositories — works like a charm now. Closing this. A new pthread RPM will appear shortly.