Clozure / ccl

Clozure Common Lisp
http://ccl.clozure.com
Apache License 2.0

sigreturn on macOS 10.14 Beta on functions including (quit) #146

Closed currymj closed 1 month ago

currymj commented 6 years ago

On a beta machine, if I try:

Michaels-MacBook-Pro-6% ccl64
Clozure Common Lisp Version 1.11.5  (DarwinX8664)

For more information about CCL, please see http://ccl.clozure.com.

CCL is free software.  It is distributed under the terms of the Apache
Licence, Version 2.0.
? (quit)
sigreturn returned
? for help
[5280] Clozure CL kernel debugger: b
current thread: tcr = 0x103010, native thread ID = 0x307, interrupts enabled

(#x0000000000647EE0) #x00003000006326A4 : #<Function %NANOSLEEP #x00003000006323AF> + 757
(#x0000000000647F68) #x000030000064BBBC : #<Function HOUSEKEEPING-LOOP #x000030000064B9DF> + 477
(#x0000000000647FB8) #x000030000064C274 : #<Function (:INTERNAL (TOPLEVEL-FUNCTION (LISP-DEVELOPMENT-SYSTEM T))) #x000030000064C16F> + 261
[5280] Clozure CL kernel debugger: 

I get a similar result trying to load quicklisp, although loading my own test file that just defines a simple function works fine.

xrme commented 6 years ago

I think I have a simple fix for the 1.12 development branch. If it seems stable there, I'll back-port it to 1.11.5 shortly.

xrme commented 6 years ago

212c2544 seems to fix 1.12-dev.

I did some light testing on an old 10.6 system, and while it seems to work for the most part (i.e., it works to do (rebuild-ccl :clean t)), when I evaluate (quit), I see errors like this:

Clozure Common Lisp Version 1.12-dev (v1.12-dev.3-18-g212c2544) DarwinX8664
? (quit)
> Error: Fault during read of memory address #x0
> While executing: 0, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
$ 

And for 32-bit:

Clozure Common Lisp Version 1.12-dev (v1.12-dev.3-18-g212c2544) DarwinX8664
? (quit)
> Error: Fault during read of memory address #x0
> While executing: 0, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.

Frankly, I am not much inclined to worry about this, even if I port 212c2544 to 1.11.5.

edoneel commented 6 years ago

Hi,

Thanks!  It works perfectly.

The comment in the change says 10.4, not 10.14. Right now we all know what it means, but it might be confusing at some point :-)

cheers
bruce


xrme commented 6 years ago

It is beginning to look like it isn't safe to get rid of DarwinSigReturn on pre-Mojave systems.

Things seem to work most of the time, but there are definitely issues.

I'm going to quote some mail from openmcl-devel that shows some (fairly heavyweight) steps to reproduce:

Clone the following into quicklisp/local-projects:

https://gitlab.common-lisp.net/dcooper/zacl.git
https://github.com/gendl/aserve.git
https://gitlab.common-lisp.net/gendl/gendl.git

Then:

(ql:quickload :gendl)
(gendl:start-gendl!)

That should print a banner and let you know which port the webserver is running on.

Now go to the following URL in your browser:

http://localhost:9000/tasty (or whatever the port is)

Accept the default robot:assembly.

Hover over the root node in the tree at upper-left and see the "Pencil" icon show up. Click the Pencil icon.

This should result in the reported crash.

The crash is happening some time during the call to the gdlAjax function, which is invoked through an Ajax call when clicking that "pencil" hover-over icon. The gdlAjax function is defined in the file gendl/gwl/ajax/source/ajax.lisp.

My reply:

I think https://github.com/Clozure/ccl/commit/212c25448fb1743c3c51707e69a8b7a604e714a6 is not as problem-free as I originally thought it might be.

If I run a CCL IDE that includes that change on a High Sierra system (like the test ccl.pkg), then I see a crash. But when I used a command-line lisp, I didn't see the crash, oddly enough.

If I take that change out (i.e., revert 212c2544), it works fine.

On the other hand, if I run the Lisp installed from the test ccl.pkg on a macOS Mojave system, your test case appears to work fine.

I wish I remembered why we needed the DarwinSigReturn workaround in the first place. It looks like it is going to be necessary to detect at runtime whether we are on a pre-Mojave macOS, and leave the DarwinSigReturn thing in place if so.
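A minimal sketch of what such a runtime check could look like in C, assuming the kernel release string reported by uname() is used to tell the systems apart. Darwin 18.x is the kernel that ships with macOS 10.14 (Mojave), so a lower major number means a pre-Mojave system. The helper names below are hypothetical and are not taken from the CCL sources.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Darwin 18.x is the kernel shipped with macOS 10.14 (Mojave). */
#define DARWIN_MAJOR_MOJAVE 18

/* Parse the leading major number out of a kernel release string such as
   "17.7.0". Returns -1 if the string does not start with a number. */
int darwin_major_from_release(const char *release)
{
    int major;
    if (release == NULL || sscanf(release, "%d", &major) != 1)
        return -1;
    return major;
}

/* Hypothetical helper: 1 if the release string names a kernel older than
   Mojave, 0 if it names Mojave or newer, -1 if it cannot be parsed. */
int release_is_pre_mojave(const char *release)
{
    int major = darwin_major_from_release(release);
    if (major < 0)
        return -1;
    return major < DARWIN_MAJOR_MOJAVE;
}

/* Ask the running kernel. Only meaningful on Darwin; on other systems the
   release string follows a different numbering scheme. */
int running_on_pre_mojave(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    return release_is_pre_mojave(u.release);
}
```

A result of 1 from running_on_pre_mojave() would then be the condition for leaving the DarwinSigReturn workaround in place. How that decision would be wired into the lisp kernel is not shown here.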

xrme commented 6 years ago

I'm also seeing crashes into the lisp kernel debugger when doing make certify-books-short from acl2-8.1 sources when running on Mojave.

Example:

   | Unhandled exception 4 at 0x7fff77485b53, context->regs at #x7000110c5590
   | ? for help
   | [21182] Clozure CL kernel debugger: Exit code from ACL2 is 137
   | -rw-r--r--  1 rme  staff  1844 Nov 13 19:28 world-theorems.cert

So I'm pretty sure that just getting rid of DarwinSigReturn is not the complete solution for Mojave, which is unfortunate.

xrme commented 6 years ago

It seems that there's a third argument to the sigreturn system call on Mojave. From the _sigtramp disassembly, we see:

    0x7fff77485b43 <+35>: movq   %rbx, %rdi
    0x7fff77485b46 <+38>: movl   $0x1e, %esi
    0x7fff77485b4b <+43>: movq   %r12, %rdx
    0x7fff77485b4e <+46>: callq  0x7fff77488594            ; symbol stub for: __sigreturn

The High Sierra sigtramp doesn't put anything in %rdx.

I have no idea what this extra argument is (and I don't see the Mojave sources on opensource.apple.com yet).

https://trac.clozure.com/ccl/changeset/11565 is a breadcrumb. Other archaeology leads me to believe that we're in this situation because sigaltstack isn't (or at one point wasn't) thread-local on Darwin.

xrme commented 6 years ago

Another test case:

On macOS Mojave (earlier macOS versions work as expected):

  1. build acl2-8.1
  2. start acl2 and then evaluate
    (thm (equal (append (append x y) x y x y x y x y)
           (append x y x y x y x y x y)))
  3. hit C-c and observe that CCL enters the lisp kernel debugger
    Unhandled exception 4 at 0x7fff77485b53, context->regs at #x70000b261590
    ? for help
    [31147] Clozure CL kernel debugger:

xrme commented 6 years ago

This exception (4 is SIGILL) is because the call to sigreturn in the sigtramp routine returned unexpectedly, and there's helpfully an illegal instruction there to catch that unexpected case.
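For reference, a small C sketch of how the numbers in these reports can be decoded. It assumes the "Exit code from ACL2 is 137" line in the earlier report follows the usual shell convention of 128 plus the signal number, which would make it signal 9 (SIGKILL). The function names are hypothetical.

```c
#include <signal.h>

/* Shells conventionally report "killed by signal N" as exit status 128 + N.
   Returns the signal number, or 0 if the status looks like an ordinary
   exit code. */
int signal_from_exit_status(int status)
{
    return (status > 128 && status < 256) ? status - 128 : 0;
}

/* Name a few of the signal numbers that show up in this thread. */
const char *exception_name(int signo)
{
    switch (signo) {
    case SIGILL:  return "SIGILL";   /* illegal instruction */
    case SIGKILL: return "SIGKILL";  /* forced termination */
    case SIGSEGV: return "SIGSEGV";  /* invalid memory reference */
    case SIGBUS:  return "SIGBUS";   /* bus error */
    default:      return "unknown";
    }
}
```

With these, exception_name(4) gives "SIGILL", matching the "Unhandled exception 4" line, and signal_from_exit_status(137) gives 9.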

xrme commented 5 years ago

https://opensource.apple.com/release/macos-1014.html is now available, but the sources for xnu-4903.201.2 are still listed as "coming soon". Those are probably what I really need in order to figure out what the third arg to sigreturn is.

rprimus commented 5 years ago

Tue Dec 11 10:11:28 GMT 2018

@xrme

bsd/dev/i386/unix_signal.c:

688:sigreturn(struct proc *p, struct sigreturn_args *uap, __unused int *retval)

for cross references: http://newosxbook.com/xxr/index.jl?q=sigreturn&ver=xnu-4903.221.2&case=false&def=false

tarball: https://opensource.apple.com/tarballs/xnu/xnu-4903.221.2.tar.gz

almsanac commented 5 years ago

Given the resources available and the complexity of the issue, I'd prioritize getting it to work on Mojave, and deprecate earlier versions of macOS. If there's time to get it running on earlier versions, that's great. But the most important thing is getting it working on Mojave.

xrme commented 5 years ago

I ran a 1.11.5 binary under a Mojave debug kernel. I got, as I feared I would, the following message:

process dx86cl64[405] sigreturn token mismatch: received 0x7ffeefbff120 expected 0xb993f3f80d520774

After this debug message is printed, the sigreturn system call returns with an error code.

The Mojave sources contain code to mitigate a class of attacks ("sigreturn oriented programming") described in, for example, https://dl.acm.org/citation.cfm?id=2650802. It seems that this mitigation breaks a technique that CCL has been using.

update: link to PDF of paper in question: https://www.cs.vu.nl/~herbertb/papers/srop_sp14.pdf
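To illustrate the kind of check that the "token mismatch" message implies, here is a toy model in C. It is not the XNU implementation: the structure, the function names, and the choice of XORing a per-process secret with the address of the saved context are all assumptions made for illustration. The idea is that the kernel hands out a value at signal delivery that user code cannot easily forge, and refuses a sigreturn call that does not present the same value.

```c
#include <stdint.h>

/* Toy model only. Field and function names are hypothetical. */
typedef struct {
    uint64_t sigreturn_secret;  /* assumed per-process random value */
} toy_proc;

/* Token that would be passed to user space when a signal is delivered,
   derived here from the address of the saved user context. */
uint64_t toy_make_token(const toy_proc *p, uint64_t uctx_addr)
{
    return uctx_addr ^ p->sigreturn_secret;
}

/* Check made when user space calls sigreturn(uctx, flavor, token).
   Returns 0 if the token matches, -1 on a mismatch. */
int toy_sigreturn_check(const toy_proc *p, uint64_t uctx_addr, uint64_t token)
{
    return token == toy_make_token(p, uctx_addr) ? 0 : -1;
}
```

Under this model, a caller that builds its own context and calls sigreturn without the token received at signal delivery fails the check, which is consistent with the error return described above.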

xrme commented 5 years ago

Thanks to some help from Apple DTS, I committed dd5622e9, and this really does seem to make CCL compatible with macOS Mojave.

svspire commented 5 years ago

dd5622e9da69edc48dbf97c6caa6f3a7e16f932b seems to still work fine on OS X 10.9.5 in terminal mode and as a GUI app.

stoney commented 5 years ago

Are we going to see the Mac App Store version of Clozure CL updated soon? I'm waiting for that, since my attempts to build it from this repo and the comments here haven't worked.

xrme commented 5 years ago

I just submitted an updated Mac App Store version of Clozure CL. It now has to get through app review. This seems to take about a week. It might take a little longer if the app review team identifies any issues that need to be corrected.

I'll post a note when it is approved (as I hope it will be).