Interlisp / medley

The main repo for the Medley Interlisp project. Wiki, Issues are here. Other repositories include maiko (the VM implementation) and Interlisp.github.io (web site sources)
https://Interlisp.org
MIT License
369 stars 19 forks source link

Running under cygwin #103

Closed masinter closed 1 year ago

masinter commented 3 years ago

Currently in maiko/cygwin2 branch:

now it compiles, but dies when run:

$ ./run-medley -full
sysout is /cygdrive/c/Users/Larry/home/ilisp/medley/loadups/xfull35.sysout
running /cygdrive/c/Users/Larry/home/ilisp/maiko/cygwin.x86_64/lde -g 1440x900 -sc 1440x900
start /cygdrive/c/Users/Larry/home/ilisp/medley/greetfiles/SIMPLE-INIT
Failed to find UNIXCOMM file handles; no processes
./run-medley: line 159:  2091 Segmentation fault      (core dumped) $prog $geometry $screensize -t "Medley Interlisp" $passthrough_args "$LDESRCESYSOUT"
waywardmonkeys commented 3 years ago

Submitted PR interlisp/maiko#128 to address the d_reclen issue.

masinter commented 3 years ago

i spent some time with gdb documentation and stepping through main in gdb but couldn't get by the execvp Changes to compile in cygwin3 branch

nbriggs commented 3 years ago

You appear to have merged your cygwin3 branch into master without telling anyone?

nbriggs commented 3 years ago

See https://jeffkreeftmeijer.com/git-rebase/ for an explanation of what you probably wanted to do (git rebase, to re-apply your local branch changes over a copy of the updated master branch, without merging your experiment back into master)

masinter commented 3 years ago

time for me to go back through the git tutorials

waywardmonkeys commented 3 years ago

The current state of this is that it crashes on master, right?

Do you have a stack trace?

masinter commented 3 years ago

cygwin-gdb.txt

waywardmonkeys commented 3 years ago

So, that's stack corruption it seems ...

Looking at the code, we see something that seems dangerous:

* sys_size is sysout size in megabytes */
int sysout_loader(char * sysout_file_name, int sys_size)
{
  int sysout; /* SysoutFile descriptor */

  IFPAGE ifpage; /* IFPAGE */

  char *fptovp_scratch; /* scratch area for FPTOVP */
#ifdef BIGVM
  /* 1 "cell" per page for 256MB in 512 byte pages */
  unsigned int fptovp[0x80000]; /* FPTOVP */
#else
  DLword fptovp[0x10000]; /* FPTOVP */
#endif                /* BIGVM */
  long fptovp_offset; /* FPTOVP start offset */

That's quite a decently sized stack allocation there for FPTOVP.

You could try telling Cygwin to give it a bigger stack size via adding this to the gcc command line:

-Wl,--stack,33554432

Alternatively, you could malloc that and make sure it gets freed in the exit path (or just turn it into a malloc and leak it to see if that makes the crash go away...).

masinter commented 3 years ago

Now it compiles ok and doesn't crash and runs the opcode tests in test.vm, but doesn't recognize keypress or mouse clicks (same problem as with running under WSL1)

waywardmonkeys commented 3 years ago

Which fix did you apply to prevent the crash?

masinter commented 3 years ago

I take back what I said:

I think I just compiled it with gcc instead of clang. I wasn't sure now to apply either.

I also added -Wl,--stack,33554432 to the gcc line

masinter commented 3 years ago

yes, fixes segv on startup. Now it's just hanging because it doesn't get mouse clicks or keystrokes

nbriggs commented 3 years ago

Good news. Presumably that's after recompiling without the 32MB stack option. I'm trying to come up with a principled way of getting the X events processed.

masinter commented 3 years ago

Steve reported having problems with running on WSL2 in his configuration -- screen comes up but no characters recognized. Is it possible to fork a little process to act as a keystroke forwarder that waits for the keyboard and sends events whenever it gets a keystroke or mouse move?

nbriggs commented 3 years ago

We could use some documentation on what WSL2 (or cygwin) supports for polled or interrupt driven I/O, and then we can try to fix it. I don't have access to a WSL2 system (or one with cygwin), so there's not much I can do, or suggestions I can make to those who do, without some input...

devhawala commented 1 year ago

Hello,

cygwin apparently does not support asynchronous i/o or a virtual timer, as the callback routines int_io_service resp. int_timer_service in timer.c are never called, so for example when pressing the right mouse button, the menu appears but then the system is stuck. (networking does not work either, as incoming packets are not noticed)

As work-around in my dodo-nethub-support branch of Maiko, i added simple count-down mechanism to the instruction dispatcher (dispatch in xc.c): the compile variable CYGWIN_TIMER_ASYNC_EMULATION_INSNS_COUNTDOWN defines the interval as number of instructions for simulating an external i/o interrupt, so the interrupt checking machinery is fired in more or less regular intervals.

This variable is set in makefile-cygwin.x86_64-x (and only there) to 20000. To prevent compile errors, enabling async i/o in ether.c for the Nethub socket is made dependent of the presence of O_ASYNC (which is missing in Cygwin)

So i have now a working version of Maiko running natively on Windows10 as Cygwin executable which can do networking with Dodo XNS services and uses a local X11 server (VcXsrv 1.20.14.0). The Maiko process behaves normally in my understanding, the overall CPU load of the system is around 1-3% when the Medley environment is idle and goes up to ~15% (100% on about 1 out of 8 logical CPUs) when the right mouse button is pressed in Medley. So even if i/o is checked all 20.000 instructions, only minimal CPU resources are consumed when the Medley system is idle and only one CPU goes to 100% while a context menu is shown.

However networking is a bit "lethargic" compared to async i/o on Linux, but it is usable. Maybe using a lower value for CYGWIN_TIMER_ASYNC_EMULATION_INSNS_COUNTDOWN could make it more snappy, but this value must be adjusted to the local hardware anyway...

Greetings, Hans

nbriggs commented 1 year ago

The configuration information that is specific to all instances of an OS like cygwin should be incorporated in the inc/maiko/platform.h -- we have carefully removed most of the system specific defines from the makefile fragments and I'd rather not move backwards there, if you don't mind.

Since there are likely to be other systems where simulating the async timer would be useful, I would suggest that the #define not be CYGWIN_... but rather MAIKO_EMULATE_TIMER_INTERRUPTS (or similar) and look at how to integrate it should other systems need it.

Oddly, cygwin defines FASYNC rather than O_ASYNC -- not sure what's going on there.

nbriggs commented 1 year ago

Can you check if current cygwin supports ITIMER_REAL rather than ITIMER_VIRTUAL — I don’t have any Windows boxes to check this on, but chatter from about 17 years ago suggests that starting with cygwin version 1.5.16 it supported the realtime timer.

On Oct 1, 2022, at 2:13 PM, devhawala @.***> wrote:

Hello,

cygwin apparently does not support asynchronous i/o or a virtual timer, as the callback routines int_io_service resp. int_timer_service in timer.c are never called, so for example when pressing the right mouse button, the menu appears but then the system is stuck. (networking does not work either, as incoming packets are not noticed)

As work-around in my dodo-nethub-support https://github.com/devhawala/maiko/tree/dodo-nethub-support branch of Maiko, i added simple count-down mechanism to the instruction dispatcher (dispatch in xc.c): the compile variable CYGWIN_TIMER_ASYNC_EMULATION_INSNS_COUNTDOWN defines the interval as number of instructions for simulating an external i/o interrupt, so the interrupt checking machinery is fired in more or less regular intervals.

This variable is set in makefile-cygwin.x86_64-x (and only there) to 20000. To prevent compile errors, enabling async i/o in ether.c for the Nethub socket is made dependent of the presence of O_ASYNC (which is missing in Cygwin)

So i have now a working version of Maiko running natively on Windows10 as Cygwin executable which can do networking with Dodo XNS services and uses a local X11 server (VcXsrv 1.20.14.0). The Maiko process behaves normally in my understanding, the overall CPU load of the system is around 1-3% when the Medley environment is idle and goes up to ~15% (100% on about 1 out of 8 logical CPUs) when the right mouse button is pressed in Medley. So even if i/o is checked all 20.000 instructions, only minimal CPU resources are consumed when the Medley system is idle and only one CPU goes to 100% while a context menu is shown.

However networking is a bit "lethargic" compared to async i/o on Linux, but it is usable. Maybe using a lower value for CYGWIN_TIMER_ASYNC_EMULATION_INSNS_COUNTDOWN could make it more snappy, but this value must be adjusted to the local hardware anyway...

Greetings, Hans

— Reply to this email directly, view it on GitHub https://github.com/Interlisp/medley/issues/103#issuecomment-1264482821, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB6DAWMY5GHPKW3ARV4FFZLWBCSPDANCNFSM4VF54H7A. You are receiving this because you commented.

devhawala commented 1 year ago

Here some intermediate results:

Meaning the work-around simulating timer and i/o interrupts will be needed for the cygwin version.

So next things will be incorporating the Addr68k/NativeAlignment{24} changes and moving cygwin-specific dependencies to platform.h

nbriggs commented 1 year ago

I suspect that running with ITIMER_REAL may be causing an interrupt before the rest of the system is ready for it. I'll check out the situation with ITIMER_REAL on other systems and see if it's debuggable there.

I also notice that looks as though you're making the ether fd async before the async signal handling is set up -- it can't be made to generate interrupts before int_io_init() is called -- which in the USE_DLPI case of raw ethernet is where the I_SETSIG is applied to the ether fd.

nbriggs commented 1 year ago

OK -- the issue is probably that if you change to ITIMER_REAL you have to have a signal handler for SIGALRM instead of a handler for SIGVTALRM... did you change that too?

nbriggs commented 1 year ago

Yes... with ITIMER_REAL and SIGALRM it works the same on macOS as it does with ITIMER_VIRTUAL/SIGVTALRM.

devhawala commented 1 year ago

well ... not on cygwin: the active section of int_timer_init() is as follows in my local test version, but defining USE_ITIMER_REAL in platform.h for cygwin lets ldex abort the startup (not defining it works ok):

  struct itimerval timert;
  struct sigaction timer_action;

  timer_action.sa_handler = int_timer_service;
  sigemptyset(&timer_action.sa_mask);
  timer_action.sa_flags = SA_RESTART;

#ifdef USE_ITIMER_REAL
  int intAlarm = SIGALRM;
  int intTimer = ITIMER_REAL;
  printf("** using ITIMER_REAL for timer\n");
#else
  int intAlarm = SIGVTALRM;
  int intTimer = ITIMER_VIRTUAL;
#endif

  if (sigaction(intAlarm, &timer_action, NULL) == -1) {
    perror("sigaction: SIGVTALRM");
  }

  /* then attach a timer to it and turn it loose */
  timert.it_interval.tv_sec = timert.it_value.tv_sec = 0;
  timert.it_interval.tv_usec = timert.it_value.tv_usec = TIMER_INTERVAL;
  setitimer(intTimer, &timert, NULL);
devhawala commented 1 year ago

for "making the ether fd async": i removed that from connectToHub() as it also happens in the SUBRs ether_resume() and ether_ctrlr(), which both should be called after the Lisp system is up (and it still works of course)

nbriggs commented 1 year ago

You changed all the instances?

timer.c:  if (sigaction(SIGALRM, &timer_action, NULL) == -1) {
timer.c:    perror("sigaction: SIGALRM");
timer.c:  sigaddset(&signals, SIGALRM);
timer.c:  sigaddset(&signals, SIGALRM);
timer.c:  sigaddset(&signals, SIGALRM);
timer.c:  sigaddset(&signals, SIGALRM);
timer.c:/*  Error handling routine for SIGALRM.  Called when any        */
timer.c:/*  Set up the signal handler for SIGALRM, to catch TIMEOUTs:   */
timer.c:  if (sigaction(SIGALRM, &timer_action, NULL) == -1) {
timer.c:    perror("sigaction: SIGALRM");
unixfork.c:  sigaddset(&signals, SIGALRM);
unixfork.c:  sigaddset(&signals, SIGALRM);
nbriggs commented 1 year ago

Inspecting the code, I believe the timer emulation code is going to fail if you compile with OPDISP.

devhawala commented 1 year ago

no, not all instances, but i think there was no need for:

These were the only instances of the string "ALRM" that i found (except for comments or strings) in those files.

for the OPDISP: is this still used?
Scanning with grep, it seems that is was used for the sunos-variants (Makefiles in bin/legacy) and possibly with DOS, but it is not defined in platform.h or in any Makefiles in bin.

nbriggs commented 1 year ago

OPDISP with the threaded dispatch is new. It improves the performance.

nbriggs commented 1 year ago

OPDISP is not turned on by default yet, when we do turn it on, that could be done for systems other than Cygwin.

BTW -- where do you want bug reports for Dodo services submitted? The ECHO XIP packets that are returned by, for example, the file server implementation, don't match the echo packets it receives (missing the data, 2 bytes shorter than the one original sent)

nbriggs commented 1 year ago

See the commit log for

commit b7e9529322269553d6c7bdd4741677270959cf09
Author: Bruce Mitchener <bruce.mitchener@gmail.com>
Date:   Sat Jan 23 03:28:16 2021 +0700

We replaced the assembler fast dispatch with gcc's (and clang's) computed goto extension.

devhawala commented 1 year ago

bug reports please directly to issues in the Dodo project.

for OPDISP dispatch: if there is no common code segment where all instructions come through (i get lost each time i try to follow the #define and goto maze in xc.c), then timer emulation may be excluded from fast dispathing. No problem, Maiko seems fast enough for me as it is now...

nbriggs commented 1 year ago

Missing include for <netinet/in.h> prevents it compiling on FreeBSD.

clang -m32 -std=gnu99 -fno-strict-aliasing -c -O2 -g -DMAIKO_ENABLE_NETHUB -I/usr/local/include -DXWINDOW -DRELEASE=351 -I../inc/ -I../include/ ../src/ether.c -o ../freebsd.386-x/ether.o
../src/ether.c:260:22: error: variable has incomplete type 'struct sockaddr_in'
  struct sockaddr_in hubaddr;
                     ^
../src/ether.c:260:10: note: forward declaration of 'struct sockaddr_in'
  struct sockaddr_in hubaddr;
         ^
1 error generated.
devhawala commented 1 year ago

added the include and just created the merge request Dodo Nethub support #445

masinter commented 1 year ago

Running with cygwin works fine now. It would probably be useful for running on older Windows systems.

I am closing this issue -- is it worth a mention in documentation or a special build ?