Clozure / ccl

Clozure Common Lisp
http://ccl.clozure.com
Apache License 2.0

Can't build a working 1.12.1 (or latest master) on Windows 10 #425

Closed kyanha closed 1 year ago

kyanha commented 1 year ago

Toolchain: msys2 via its installer, with the mingw-w64-x86_64-toolchain group installed.

I get a bunch of warnings about getpagesize, wopen, gettimeofday, _open_osfhandle, and debug_show_registers not being defined, a warning of a read of 4096 bytes from a region of size 1, and a note of source object spjump_start of size 1.

But that's not what concerns me the most.

The output executable can't be executed. I presume it's from the trailing output of the following command (executed by make):

/mingw64/bin/x86_64-w64-mingw32-gcc -Wl,--image-base=0x10000 -Wl,-script=pei-x86-64.x -m64 -g    -o ../../wx86cl64.exe  pad.o x86-spjump64.o x86-spentry64.o x86-subprims64.o pmcl-kernel.o gc-common.o x86-gc.o bits.o  x86-exceptions.o x86-utils.o image.o thread_manager.o lisp-debug.o memory.o windows-calls.o x86-asmutils64.o  imports.o lispdcmd.o plprint.o plsym.o xlbt.o x86_print.o -lpsapi -lws2_32 -static -lpthread
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../../wx86cl64.exe:/4: section below image base
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../../wx86cl64.exe:/20: section below image base
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ../../wx86cl64.exe:/36: section below image base

When I attempt to run the output wx86cl64.exe, under msys2 bash I get bash: ./wx86cl64.exe: cannot execute binary file: Exec format error; when I try to run it from cmd, I just get a blue system modal dialog that says "This app cannot run on your PC".

This happens with the distributed 1.12.1 .zip file, as well as the 'v1.12.1' tag and 'master' branch.

Is there anything I might do differently to perhaps get it working? I would like some of the fixes committed to master in my lisp kernel.

Thanks for your help!

(note: this is not the same error or output as #408, so I'm filing it as a separate issue.)

xrme commented 1 year ago

I built the release binaries with cygwin. I don't think I've used msys2, at least not in a long time.

$ cc -v
Using built-in specs.
COLLECT_GCC=cc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/4.9.3/lto-wrapper.exe
Target: x86_64-pc-cygwin
Configured with: /cygdrive/i/szsz/tmpp/gcc/gcc-4.9.3-1.x86_64/src/gcc-4.9.3/configure --srcdir=/cygdrive/i/szsz/tmpp/gcc/gcc-4.9.3-1.x86_64/src/gcc-4.9.3 --prefix=/usr --exec-prefix=/usr --localstatedir=/var --sysconfdir=/etc --docdir=/usr/share/doc/gcc --htmldir=/usr/share/doc/gcc/html -C --build=x86_64-pc-cygwin --host=x86_64-pc-cygwin --target=x86_64-pc-cygwin --without-libiconv-prefix --without-libintl-prefix --libexecdir=/usr/lib --enable-shared --enable-shared-libgcc --enable-static --enable-version-specific-runtime-libs --enable-bootstrap --enable-__cxa_atexit --with-dwarf2 --with-tune=generic --enable-languages=ada,c,c++,fortran,lto,objc,obj-c++ --enable-graphite --enable-threads=posix --enable-libatomic --enable-libgomp --disable-libitm --enable-libquadmath --enable-libquadmath-support --enable-libssp --enable-libada --enable-libgcj-sublibs --disable-java-awt --disable-symvers --with-ecj-jar=/usr/share/java/ecj.jar --with-gnu-ld --with-gnu-as --with-cloog-include=/usr/include/cloog-isl --without-libiconv-prefix --without-libintl-prefix --with-system-zlib --enable-linker-build-id
Thread model: posix
gcc version 4.9.3 (GCC)

kyanha commented 1 year ago

Okay. I don't know anything at all about GNU ld scripting, unfortunately. Can you perhaps help me understand what's going on that can cause those error messages?

bshetty commented 1 year ago

If you look at Matthew's output, he is using gcc 4.9.3, which is pretty old, whereas you are presumably using a recent version. I had the same issue with gcc 11.2 (the latest on Cygwin). Adding directives for the debug sections solved this issue. However, the executable still crashes on startup. This happens in the remap_spjump function, before the call to memmove. The VirtualProtect function reports invalid access for SPJUMP_TARGET_ADDRESS + 0x3000. I don't know enough about this yet to fix it.
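
As a rough illustration of the "section below image base" message: as far as I understand it, ld prints it when a section's address is lower than the image base, so the relative virtual address (RVA) stored in the PE section header would come out negative. Names such as /4 and /20 are how COFF writes section names longer than eight characters (an offset into the string table), which fits the debug sections mentioned above. In the sketch below only the image base comes from the link command in this issue; the section addresses are hypothetical.

```python
IMAGE_BASE = 0x10000  # from -Wl,--image-base=0x10000 in the link command

# Hypothetical section addresses, for illustration only.
SECTIONS = {
    ".text": 0x11000,
    ".data": 0x60000,
    "/4": 0x0,
    "/20": 0x0,
    "/36": 0x0,
}


def rva(vma, image_base):
    """Relative virtual address: the section address minus the image base."""
    return vma - image_base


def sections_below_image_base(sections, image_base):
    """Return the names of sections whose address is lower than the image base."""
    return [name for name, vma in sections.items() if vma < image_base]


if __name__ == "__main__":
    for name in sections_below_image_base(SECTIONS, IMAGE_BASE):
        offset = rva(SECTIONS[name], IMAGE_BASE)
        print(f"{name}: section below image base (RVA would be {offset})")
```

A linker script that discards such sections, or assigns them addresses at or above the image base, would leave this list empty.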

There are also issues about multiple definitions: stuff that is defined in x86-spjump64.s and not declared as extern in gc.h (mainly). All of these can be fixed for good with minimal changes. Another function that can be improved is reserve_tls_slots() in windows-calls.c. The comments imply that we block TLS indexes #30 through #63. This must be based on older versions of Windows (the comment says the TLS index on a fresh lisp is 11 on XP and 23 on Windows 7). On Windows 10 it returns 27. I don't know what the limit on TLS indexes was in XP/Win7. According to the Windows Thread Local Storage documentation: "The constant TLS_MINIMUM_AVAILABLE defines the minimum number of TLS indexes available in each process. This minimum is guaranteed to be at least 64 for all systems. The maximum number of indexes per process is 1,088." This means we should change this code.
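
To put numbers on the TLS point, here is a small sketch using only the figures quoted above. It assumes the comment in reserve_tls_slots() means that indexes 30 through 63 are set aside, and simply counts how many indexes lie between the first one handed out and that block. The names are made up for the illustration; this is not the kernel's code.

```python
TLS_MINIMUM_AVAILABLE = 64  # guaranteed minimum, per the documentation quoted above
TLS_MAXIMUM = 1088          # maximum indexes per process, same source
RESERVED_FIRST = 30         # start of the range the code comment says is blocked
RESERVED_LAST = 63          # end of that range

# First TLS index seen on a fresh lisp, as reported in this thread.
FIRST_INDEX_SEEN = {"Windows XP": 11, "Windows 7": 23, "Windows 10": 27}


def indexes_below_reserved(first_index, reserved_first=RESERVED_FIRST):
    """Count the indexes from first_index up to, but not including, the reserved block."""
    return max(0, reserved_first - first_index)


if __name__ == "__main__":
    for system, first in FIRST_INDEX_SEEN.items():
        print(f"{system}: {indexes_below_reserved(first)} indexes below #{RESERVED_FIRST}")
```

The gap shrinks with each Windows release mentioned here, which is one way to read the remark that the code should change.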

Another change in ld is that PIE (position-independent executable) code is generated by default. Adding -no-pie to $(CC) fixed the issue on Linux x86_64, but on Windows the wx86cl64.exe generated this way crashes on startup.

bshetty commented 1 year ago

I was using gdb and had not realised this actually starts lisp!

With the -no-pie flag, wx86cl64.exe enters the start_lisp function and then displays the following. Can someone tell me what this means and what should be done to fix it?

$ ./wx86cl64.exe
%rax = 0x0000000000000000      %r8  = 0x0000000000000000
%rcx = 0x0000000000000000      %r9  = 0xffffffdf00000018
%rdx = 0x0000000000000000      %r10 = 0x000000001c000000
%rbx = 0x000000000478eb60      %r11 = 0x0000007f3f800000
%rsp = 0x0000000024c3ed30      %r12 = 0x000000000478eb60
%rbp = 0x0000000002fbbfd0      %r13 = 0x0000000000000000
%rsi = 0x0000000000000000      %r14 = 0x0000000002fbbfd0
%rdi = 0x0000000002fb23a0      %r15 = 0x000000000001302e
%rip = 0x000000000002e563   %rflags = 0x00010247
Exception on foreign stack
Exception occurred while executing foreign code
? for help
[15344] Clozure CL kernel debugger:

bshetty commented 1 year ago

When I disassemble the address from %rip in gdb, it is in x86-gc.c:calculate_relocation():

disass /mr 0x2e563
Dump of assembler code for function calculate_relocation:
1537    {
   0x000000000002e490 <+0>:     53                      push   %rbx

1538      LispObj *relocptr = GCrelocptr;
   0x000000000002e498 <+8>:     48 8b 0d 59 1f 03 00    mov    0x31f59(%rip),%rcx        # 0x603f8 <managed_static_refbits+336>

1539      LispObj current = GCareadynamiclow;
   0x000000000002e49f <+15>:    4c 8b 05 1a 1f 03 00    mov    0x31f1a(%rip),%r8        # 0x603c0 <managed_static_refbits+280>

1540      bitvector
1541        markbits = GCdynamic_markbits;
   0x000000000002e4a6 <+22>:    48 8b 15 23 1f 03 00    mov    0x31f23(%rip),%rdx        # 0x603d0 <managed_static_refbits+296>

1542      qnode *q = (qnode *) markbits;
1543      natural npagelets = ((GCndynamic_dnodes_in_area+(nbits_in_word-1))>>bitmap_shift);
   0x000000000002e491 <+1>:     48 8b 05 18 1f 03 00    mov    0x31f18(%rip),%rax        # 0x603b0 <managed_static_refbits+264>
   0x000000000002e4ad <+29>:    48 83 c0 3f             add    $0x3f,%rax

1544      natural thesebits;
1545      LispObj first = 0;
   0x000000000002e4c6 <+54>:    45 31 c9                xor    %r9d,%r9d
   0x000000000002e4c9 <+57>:    eb 41                   jmp    0x2e50c <calculate_relocation+124>
   0x000000000002e4cb <+59>:    0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

1546
1547      if (npagelets) {
   0x000000000002e4b1 <+33>:    48 c1 e8 06             shr    $0x6,%rax
   0x000000000002e4b5 <+37>:    0f 84 a5 00 00 00       je     0x2e560 <calculate_relocation+208>

1548        do {
1549          *relocptr++ = current;
   0x000000000002e510 <+128>:   4c 89 01                mov    %r8,(%rcx)
   0x000000000002e513 <+131>:   48 83 c1 08             add    $0x8,%rcx

1550          thesebits = *markbits++;
   0x000000000002e50c <+124>:   48 83 c2 08             add    $0x8,%rdx
   0x000000000002e517 <+135>:   48 8b 42 f8             mov    -0x8(%rdx),%rax

1551          if (thesebits == ALL_ONES) {
   0x000000000002e51b <+139>:   48 83 f8 ff             cmp    $0xffffffffffffffff,%rax
   0x000000000002e51f <+143>:   75 af                   jne    0x2e4d0 <calculate_relocation+64>

1552            current += nbits_in_word*dnode_size;
   0x000000000002e521 <+145>:   49 81 c0 00 04 00 00    add    $0x400,%r8

1553            q += 4; /* sic */
1554          } else {
1555            if (!first) {
   0x000000000002e4d0 <+64>:    4d 85 c9                test   %r9,%r9
   0x000000000002e4d3 <+67>:    74 6b                   je     0x2e540 <calculate_relocation+176>

1556              first = current;
   0x000000000002e540 <+176>:   4d 89 c1                mov    %r8,%r9

1557              while (thesebits & BIT0_MASK) {
   0x000000000002e543 <+179>:   48 85 c0                test   %rax,%rax
   0x000000000002e546 <+182>:   79 8d                   jns    0x2e4d5 <calculate_relocation+69>
   0x000000000002e548 <+184>:   0f 1f 84 00 00 00 00 00 nopl   0x0(%rax,%rax,1)
   0x000000000002e554 <+196>:   48 01 c0                add    %rax,%rax
   0x000000000002e557 <+199>:   78 f7                   js     0x2e550 <calculate_relocation+192>
   0x000000000002e559 <+201>:   e9 77 ff ff ff          jmp    0x2e4d5 <calculate_relocation+69>
   0x000000000002e55e <+206>:   66 90                   xchg   %ax,%ax

1558                first += dnode_size;
   0x000000000002e550 <+192>:   49 83 c1 10             add    $0x10,%r9

1559                thesebits += thesebits;
1560              }
1561            }
1562            /* We're counting bits in qnodes in the wrong order here, but
1563               that's OK. I think ... */
1564            current += one_bits(*q++);
   0x000000000002e4bb <+43>:    4c 8b 15 be 1e 03 00    mov    0x31ebe(%rip),%r10        # 0x60380 <managed_static_refbits+216>
   0x000000000002e4c2 <+50>:    4c 8d 1c c1             lea    (%rcx,%rax,8),%r11
   0x000000000002e4d5 <+69>:    0f b7 42 f8             movzwl -0x8(%rdx),%eax
   0x000000000002e4dd <+77>:    41 0f b7 04 42          movzwl (%r10,%rax,2),%eax

1565            current += one_bits(*q++);
   0x000000000002e4d9 <+73>:    0f b7 5a fa             movzwl -0x6(%rdx),%ebx
   0x000000000002e4e2 <+82>:    41 0f b7 1c 5a          movzwl (%r10,%rbx,2),%ebx
   0x000000000002e4e7 <+87>:    48 01 d8                add    %rbx,%rax
   0x000000000002e4ea <+90>:    4c 01 c0                add    %r8,%rax

1566            current += one_bits(*q++);
   0x000000000002e4ed <+93>:    44 0f b7 42 fc          movzwl -0x4(%rdx),%r8d
   0x000000000002e4f2 <+98>:    47 0f b7 04 42          movzwl (%r10,%r8,2),%r8d
   0x000000000002e4f7 <+103>:   4c 01 c0                add    %r8,%rax

1567            current += one_bits(*q++);
   0x000000000002e4fa <+106>:   44 0f b7 42 fe          movzwl -0x2(%rdx),%r8d
   0x000000000002e4ff <+111>:   47 0f b7 04 42          movzwl (%r10,%r8,2),%r8d
   0x000000000002e504 <+116>:   49 01 c0                add    %rax,%r8

1568          }
1569        } while(--npagelets);
   0x000000000002e507 <+119>:   4c 39 d9                cmp    %r11,%rcx
   0x000000000002e50a <+122>:   74 21                   je     0x2e52d <calculate_relocation+157>
   0x000000000002e528 <+152>:   4c 39 d9                cmp    %r11,%rcx
   0x000000000002e52b <+155>:   75 df                   jne    0x2e50c <calculate_relocation+124>

1570      }
1571      *relocptr++ = current;
   0x000000000002e530 <+160>:   4c 89 01                mov    %r8,(%rcx)
   0x000000000002e563 <+211>:   4c 89 01                mov    %r8,(%rcx)

1572      return first ? first : current;
   0x000000000002e52d <+157>:   4d 85 c9                test   %r9,%r9
   0x000000000002e533 <+163>:   4d 0f 45 c1             cmovne %r9,%r8

1573    }
   0x000000000002e537 <+167>:   4c 89 c0                mov    %r8,%rax
   0x000000000002e53a <+170>:   5b                      pop    %rbx
   0x000000000002e53b <+171>:   c3                      ret
   0x000000000002e53c <+172>:   0f 1f 40 00             nopl   0x0(%rax)
   0x000000000002e560 <+208>:   4c 89 c0                mov    %r8,%rax
   0x000000000002e566 <+214>:   5b                      pop    %rbx
   0x000000000002e567 <+215>:   c3                      ret
   0x000000000002e568 <+216>:   0f 1f 84 00 00 00 00 00 nopl   0x0(%rax,%rax,1)

End of assembler dump.
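
Reading the listing against the register dump above: %rip = 0x2e563 is the store at <+211>, one of the two copies of line 1571, and it is reached through the je at <+37> when npagelets is 0. %rcx, loaded from GCrelocptr at <+8>, is 0 in the dump, so that store goes to address 0. For anyone following the arithmetic, here is a rough Python model of the C loop. It is not kernel code: it assumes one_bits() returns the number of set bits in a 16-bit quarter-word multiplied by dnode_size (the listing only shows a table lookup), and the function and parameter names are invented for the sketch.

```python
NBITS_IN_WORD = 64
DNODE_SIZE = 16            # matches "add $0x10,%r9" in the listing
ALL_ONES = (1 << 64) - 1
BIT0_MASK = 1 << 63        # bit 0 is the most significant bit (tested via the sign flag)


def one_bits(quarter):
    """Assumed behaviour: set bits in a 16-bit value, times the dnode size."""
    return DNODE_SIZE * bin(quarter).count("1")


def calculate_relocation(markbits, area_low, ndnodes):
    """Model of the loop: return (relocation table, first non-full address or end)."""
    reloctab = []
    current = area_low
    first = 0
    npagelets = (ndnodes + NBITS_IN_WORD - 1) >> 6
    for i in range(npagelets):
        reloctab.append(current)                 # *relocptr++ = current
        word = markbits[i]
        if word == ALL_ONES:
            current += NBITS_IN_WORD * DNODE_SIZE  # 0x400, as in the listing
        else:
            if not first:
                first = current
                bits = word
                while bits & BIT0_MASK:          # skip the leading marked dnodes
                    first += DNODE_SIZE
                    bits = (bits << 1) & ALL_ONES
            for shift in (0, 16, 32, 48):        # four 16-bit quarters of the word
                current += one_bits((word >> shift) & 0xFFFF)
    reloctab.append(current)                     # line 1571: runs even if npagelets == 0
    return reloctab, (first if first else current)
```

With ndnodes equal to 0 the loop body never runs and only the final store happens, which in the C code is the write through relocptr that faults here.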

xrme commented 1 year ago

#408 is another issue reporting a problem building with msys on Windows.

bshetty commented 1 year ago

An easy way to get things running is to use an old gcc version (v4.9.3, like Matthew). I recently used gcc v2.7.1 with binutils v2.23. The linker script had to be updated (download it from pei-x86-64_x.txt; to use it, rename pei-x86-64_x.txt to pei-x86-64.x and move it to the ../lisp-kernel/win64 directory before building). Also, the older AS cannot process x86-spjump64.s; it fails as below:

m4 -DWIN_64 -DWINDOWS -DX86 -DX8664 -DHAVE_TLS -DEMUTLS -DTCR_IN_GPR -I../ ../x86-spjump64.s | /cygdrive/c/_<pathto>_/MinGW64/bin/as  -g --64 -o x86-spjump64.o

C:\otools\_<pathto>_\MinGW64\bin\as.exe: out of memory allocating 4294967280 bytes
make: *** [Makefile:102: x86-spjump64.o] Error 1

So I had to make the following changes to the Makefile. Comment out the following:

#ifeq ($(MSYSTEM),)
#CC = x86_64-w64-mingw32-gcc
#AS = x86_64-w64-mingw32-as
#LD = x86_64-w64-mingw32-ld
#else
#CC = /mingw64/bin/x86_64-w64-mingw32-gcc
#AS = /mingw64/bin/as
#LD = /mingw64/bin/ld
#endif

replace it with

CC=/cygdrive/c/<pathto>/MinGW64/bin/gcc
AS=x86_64-w64-mingw32-as
LD=/cygdrive/c/<pathto>/MinGW64/bin/ld

AS is set to the Cygwin version shipped with binutils v2.39 (it gets installed with gcc 11.3).

Run make and you have a lisp built.

bshetty commented 1 year ago

About building with the latest gcc: we have a few issues.

Around 2016/18, Windows 10 (the version I use) started supporting ASLR. gcc also changed somewhere in between and now generates PIE (position-independent executable) code by default.

The linker script has to be updated (you can use the version in the previous post), and these options need to be added to the linker (via the compiler driver): -no-pie -Wl,--disable-reloc-section -Wl,--allow-multiple-definition

This will build wx86cl64.exe, and you should be able to start it. However, it still crashes in calculate_relocation() at line 1571 of x86-gc.c. This happens because, between the time pmcl-kernel calls start_lisp and the time control gets back to C code (windows_exception_handler() at ../x86-exceptions.c:2150), I have seen at least two global variables, global_reloctab and GCndynamic_dnodes_in_area, reset to 0. I do not think anything in the lisp-kernel code (assembly or C) changes these.

My guess is that Windows does this. But it does not happen when we compile with the old compiler (actually, the linker/bfd should be blamed). The newer tools do something that triggers the resets. I have been trying to figure this out, with no success.

Windows does many things behind our backs. For example, it added extra heap space (something to do with the Fault Tolerant Heap, FTH) when it noticed wx86cl64.exe crashing many times.

bshetty commented 1 year ago

@xrme I guess you are following my posts on the developer mailing list as well. We should get ccl to build as a position-independent executable. Can you give suggestions as to how to proceed with this?

kyanha commented 1 year ago

> About building with the latest gcc: we have a few issues.
>
> Around 2016/18, Windows 10 (the version I use) started supporting ASLR. gcc also changed somewhere in between and now generates PIE (position-independent executable) code by default.
>
> The linker script has to be updated (you can use the version in the previous post), and these options need to be added to the linker (via the compiler driver): -no-pie -Wl,--disable-reloc-section -Wl,--allow-multiple-definition

With these changes (and the extraction of win64-headers/ and win32-headers/ from the Windows distribution zip file), msys64's UCRT64 environment can rebuild a working lisp that can successfully execute (ccl:rebuild-ccl :full t).
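
As a sketch of what the option change amounts to, the snippet below takes the leading options of the link command from the top of this issue (object files and libraries left out) and adds the three flags quoted above right after the compiler driver. Where they belong in the Makefile is not stated in this thread, so the placement, the helper name, and the use of the MINGW64 compiler path (rather than the UCRT64 one) are assumptions for illustration.

```python
import shlex

# Leading options of the link line shown earlier in this issue.
BASE_LINK = [
    "/mingw64/bin/x86_64-w64-mingw32-gcc",
    "-Wl,--image-base=0x10000",
    "-Wl,-script=pei-x86-64.x",
    "-m64",
    "-g",
    "-o",
    "../../wx86cl64.exe",
]

# The extra options quoted above.
EXTRA_FLAGS = [
    "-no-pie",
    "-Wl,--disable-reloc-section",
    "-Wl,--allow-multiple-definition",
]


def add_link_flags(command, extra):
    """Insert extra flags right after the compiler driver, skipping any already present."""
    missing = [flag for flag in extra if flag not in command]
    return [command[0], *missing, *command[1:]]


if __name__ == "__main__":
    print(shlex.join(add_link_flags(BASE_LINK, EXTRA_FLAGS)))
```

Because flags already present are skipped, applying the helper twice gives the same command line as applying it once.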

gcc -v output:

Using built-in specs.
COLLECT_GCC=C:\msys64\ucrt64\bin\gcc.exe
COLLECT_LTO_WRAPPER=C:/msys64/ucrt64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../gcc-12.2.0/configure --prefix=/ucrt64 --with-local-prefix=/ucrt64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/ucrt64/include --libexecdir=/ucrt64/lib --enable-bootstrap --enable-checking=release --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,fortran,ada,objc,obj-c++,jit --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-filesystem-ts --enable-libstdcxx-time --disable-libstdcxx-pch --enable-lto --enable-libgomp --disable-multilib --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --with-libiconv --with-system-zlib --with-gmp=/ucrt64 --with-mpfr=/ucrt64 --with-mpc=/ucrt64 --with-isl=/ucrt64 --with-pkgversion='Rev6, Built by MSYS2 project' --with-bugurl=https://github.com/msys2/MINGW-packages/issues --with-gnu-as --with-gnu-ld --disable-libstdcxx-debug --with-boot-ldflags=-static-libstdc++ --with-stage1-ldflags=-static-libstdc++
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Rev6, Built by MSYS2 project)

I'm going to try running the Common Lisp conformance test suite (from GNU Lisp's repo) against it to see how many things break.

(Personally, I try to avoid Cygwin, because it breaks binary compatibility with other binaries on Windows. Yes, for a long time Cygwin was the only way to get a functioning POSIX layer on Windows, but that doesn't seem to be the case anymore. I intend to compile code for the foreign function interface, and Cygwin-compiled code cannot be mixed with non-Cygwin-compiled code. That said, it's not my place to fight for or against Cygwin as the official distribution platform. It's easy enough to compile a new version with a new toolchain, once you get past the linker script and position-independent executable issues.)

xrme commented 1 year ago

Try using https://github.com/Clozure/ccl-tests

kyanha commented 1 year ago

Here's the output from the official Cygwin binary:

Doing 21914 pending tests of 21914 tests total.
Invoking restart: #<RESTART CL-TEST::FOO #x22DC112D>
Invoking restart: #<RESTART CL-TEST::FOO #x22DC112D>
Invoking restart: #<RESTART CL-TEST::FOO #x22DC112D>
Invoking restart: #<RESTART CL-TEST::FOO #x22DC112D>
Invoking restart: #<RESTART CL-TEST::FOO #x22DC112D>

Test CL-TEST::CCL.BUG\#1068.A failed
Form: (LET* ((CL-TEST::NAME "a\\*x") (CL-TEST::NAME/ "a\\*x/") (CL-TEST::NAME/* "a\\*x/*.*") (CL-TEST::FILE "a\\*x/temp.dat")) (WHEN (PROBE-FILE CL-TEST::NAME) (IF (DIRECTORYP CL-TEST::NAME) (DELETE-DIRECTORY CL-TEST::NAME/) (DELETE-FILE CL-TEST::NAME))) (ENSURE-DIRECTORIES-EXIST CL-TEST::NAME/) (CLOSE (OPEN CL-TEST::FILE :DIRECTION :OUTPUT :IF-EXISTS :ERROR)) (CLOSE (OPEN CL-TEST::FILE :DIRECTION :OUTPUT :IF-EXISTS :SUPERSEDE)) (LENGTH (DIRECTORY "a\\*x/*.*")))
Expected value: 1
Actual value: #<FILE-ERROR #x2104519DBD>.

================ Test suite failed ================

1 out of 21914 total tests failed:
   CL-TEST::CCL.BUG\#1068.A.
(FUNCALL DO-TESTS :COMPILE COMPILE :VERBOSE VERBOSE :CATCH-ERRORS T)
took 205,115,000 microseconds (205.115000 seconds) to run.
       4,250,006 microseconds (  4.250006 seconds, 2.07%) of which was spent in GC.
During that period, and with 8 available CPU cores,
     197,000,000 microseconds (197.000000 seconds) were spent in user mode
       7,156,250 microseconds (  7.156250 seconds) were spent in system mode
 7,718,825,984 bytes of memory allocated.
(CL-TEST::CCL.BUG\#1068.A)

Here's the output from the MSYS2 UCRT64 build I made:

Doing 21914 pending tests of 21914 tests total.
Invoking restart: #<RESTART CL-TEST::FOO #x22DC112D>
Invoking restart: #<RESTART CL-TEST::FOO #x22DC112D>
Invoking restart: #<RESTART CL-TEST::FOO #x22DC112D>
Invoking restart: #<RESTART CL-TEST::FOO #x22DC112D>
Invoking restart: #<RESTART CL-TEST::FOO #x22DC112D>

Test CL-TEST::CCL.BUG\#1068.A failed
Form: (LET* ((CL-TEST::NAME "a\\*x") (CL-TEST::NAME/ "a\\*x/") (CL-TEST::NAME/* "a\\*x/*.*") (CL-TEST::FILE "a\\*x/temp.dat")) (WHEN (PROBE-FILE CL-TEST::NAME) (IF (DIRECTORYP CL-TEST::NAME) (DELETE-DIRECTORY CL-TEST::NAME/) (DELETE-FILE CL-TEST::NAME))) (ENSURE-DIRECTORIES-EXIST CL-TEST::NAME/) (CLOSE (OPEN CL-TEST::FILE :DIRECTION :OUTPUT :IF-EXISTS :ERROR)) (CLOSE (OPEN CL-TEST::FILE :DIRECTION :OUTPUT :IF-EXISTS :SUPERSEDE)) (LENGTH (DIRECTORY "a\\*x/*.*")))
Expected value: 1
Actual value: #<FILE-ERROR #x21044E0B8D>.

================ Test suite failed ================

1 out of 21914 total tests failed:
   CL-TEST::CCL.BUG\#1068.A.
(FUNCALL DO-TESTS :COMPILE COMPILE :VERBOSE VERBOSE :CATCH-ERRORS T)
took 215,898,000 microseconds (215.898000 seconds) to run.
       4,524,733 microseconds (  4.524733 seconds, 2.10%) of which was spent in GC.
During that period, and with 8 available CPU cores,
     206,437,500 microseconds (206.437500 seconds) were spent in user mode
       7,750,000 microseconds (  7.750000 seconds) were spent in system mode
 7,718,402,016 bytes of memory allocated.
(CL-TEST::CCL.BUG\#1068.A)

These were both run from the gcl-tests directory with different paths to wx86cl64.exe -n -l load.lisp, then executing the (run-tests) form.

I am currently running Windows 10 22H2, 19045.2365.

(edit to copy the same line length of output, rather than just the output from the bad test.)

kyanha commented 1 year ago

Following on my prior comment:

It looks as though the MSYS2 UCRT64 version that I compiled is at the very least no worse than the officially-compiled Cygwin version.

I think the "Bug 1068" issue should be tracked as a separate issue, since it isn't related to compilation on Windows.

(It would help, I think, if the internal Clozure Inc issue #1068 could be copied into that new issue, so the parameters of what's being tested can be understood. From a naive reading, it looks like there's at least one platform where the creation of files and/or directories got hung up in the OS's cache, and either didn't actually happen before a file was created in a newly-created directory, or the fresh read for DIRECTORY didn't actually include the newly-created file?)

In any case, I'm leaving this issue #425 open, because while I managed to get a working lisp using msys2's UCRT64 environment it doesn't follow that a working lisp using Cygwin is happening, and I am not going to ask that the project change its distribution ABI.

xrme commented 1 year ago

I thought that Cygwin produced vanilla Windows binaries.

It certainly was once the case that you don't need Cygwin to run a compiled CCL.

kyanha commented 1 year ago

Cygwin puts a dependency on cygwin1.dll into every Windows binary it compiles and links. You don't necessarily need the full cygwin installation, but you do need the DLL to perform the POSIX translations. (And you need the full installation to do any changes of the mounts, such as to turn /cygdrive/c into /c and such, unless you know the registry parameters to change.)

xrme commented 1 year ago

So you're saying the release at https://github.com/Clozure/ccl/releases/download/v1.12.1/ccl-1.12.1-windowsx86.zip won't run on a system without Cygwin installed? This is (unpleasant) news to me.

kyanha commented 1 year ago

> So you're saying the release at https://github.com/Clozure/ccl/releases/download/v1.12.1/ccl-1.12.1-windowsx86.zip won't run on a system without Cygwin installed? This is (unpleasant) news to me.

https://cygwin.com/faq.html#faq.programming.static-linking (FAQ 6.14).

xrme commented 1 year ago

We use the Cygwin environment to host the mingw toolchain, but we don't use the Cygwin API, AFAIK.

kyanha commented 1 year ago

MinGW has its own tools for hosting its own toolchain, and to my knowledge has never needed Cygwin as its host environment. Please forgive me for saying this, but whatever Frankensteined mess you've got going on uses a 15+ year outdated gcc, has led to confusion about which provider (and thus library linkage) is used to create your binary, and is incapable of being replicated by other people and installations, so is simply all-around unhelpful for ongoing development. (Particularly, it does not lend itself to being used to create reproducible builds -- which may or may not be a consideration in this project, but is currently considered an industry best-practice for open-source privacy, security, and system-level projects. You can read more at https://reproducible-builds.org/, if you have the time and inclination.)

Please consider upgrading your gcc and ld to something that supports position-independent executable code under Windows, so ccl can use modern OS innovations like ASLR to prevent malware that targets FFI-loaded modules from exploiting the fixed addresses where ccl loads its libraries in order to spread.

I created pull request #432 to make the output of modern GNU C and LD versions work when compiled with the UCRT64-using toolchain from msys2. (That's the Windows Universal C Runtime, as opposed to the MSVC versioned runtimes; the UCRT didn't exist for at least 6 years after the version of the mingw toolchain you use was released.) You may find it helpful as a base for building with a more recent mingw. (Again, my goal isn't to change your ABI.)

Thank you for considering this.


xrme commented 1 year ago

In a6aeda7d7 I made changes so that the Windows lisp kernels, both 32-bit and 64-bit, will use the MSYS2 platform to be built.

I need to mention that I used the MINGW32 and MINGW64 environments, which use the MSVCRT C library. This is because the interface databases in the current CCL distribution were built from MSVCRT header files.

The newer compiler is producing several new warnings, but I'll address those in a separate change.

Regarding ASLR, CCL as currently implemented expects to have control of its address space. In particular, it expects to put some stuff in fixed locations in low-ish memory. Thus, CCL can't be loaded at some random location.
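
One way to check whether a given kernel binary opts in to ASLR at all is to read the DllCharacteristics field of its PE optional header and test the DYNAMIC_BASE bit. The sketch below does that on raw bytes. The field offset and flag values are from the PE format as I understand it; the helper names are made up, and nothing here is taken from the CCL sources or Makefiles.

```python
import struct

DYNAMIC_BASE = 0x0040     # IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE
HIGH_ENTROPY_VA = 0x0020  # IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA

COFF_HEADER_SIZE = 20
DLL_CHARACTERISTICS_OFFSET = 70  # within the optional header, PE32 and PE32+ alike


def dll_characteristics(image):
    """Return the DllCharacteristics word from the bytes of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)  # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    optional_header = pe_offset + 4 + COFF_HEADER_SIZE
    (flags,) = struct.unpack_from(
        "<H", image, optional_header + DLL_CHARACTERISTICS_OFFSET
    )
    return flags


def opts_into_aslr(image):
    """True if the image is marked as relocatable at load time."""
    return bool(dll_characteristics(image) & DYNAMIC_BASE)
```

For example, opts_into_aslr(open("wx86cl64.exe", "rb").read()) would report how a locally built kernel is marked; given the fixed-address requirement described above, one would expect the bit to be clear.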

Anyway, these changes should enable anyone to build the Windows lisp kernels with up-to-date MSYS2-hosted toolchains. I left comments in win32/Makefile and win64/Makefile that will, I hope, be helpful.

I don't know if you're still interested in CCL after all this time, but still, I thank you both for your investigative work that helped me understand what to do to use an up-to-date Windows development toolchain.