Closed currymj closed 1 month ago
I think I have a simple fix for the 1.12 development branch. If it seems stable there, I'll back-port it to 1.11.5 shortly.
212c2544 seems to fix 1.12-dev.
I did some light testing on an old 10.6 system, and while it seems to work for the most part (i.e., it works to do (rebuild-ccl :clean t)), when I evaluate (quit), I see errors like this:
Clozure Common Lisp Version 1.12-dev (v1.12-dev.3-18-g212c2544) DarwinX8664
? (quit)
> Error: Fault during read of memory address #x0
> While executing: 0, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
$
And for 32-bit:
Clozure Common Lisp Version 1.12-dev (v1.12-dev.3-18-g212c2544) DarwinX8664
? (quit)
> Error: Fault during read of memory address #x0
> While executing: 0, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
Frankly, I am not much inclined to worry about this, even if I port 212c2544 to 1.11.5.
Hi,
Thanks! It works perfectly.
The comment in the change says 10.4 not 10.14. Right now we all know what it means, but, it might be confusing at some point :-)
cheers
bruce
It is beginning to look like it isn't safe to get rid of DarwinSigReturn on pre-Mojave systems.
Things seem to work most of the time, but there are definitely issues.
I'm going to quote some mail from openmcl-devel that shows some (fairly heavyweight) steps to reproduce:
Clone the following into quicklisp/local-projects:
https://gitlab.common-lisp.net/dcooper/zacl.git
https://github.com/gendl/aserve.git
https://gitlab.common-lisp.net/gendl/gendl.git
Then:
(ql:quickload :gendl)
(gendl:start-gendl!)
That should print a banner and let you know which port the webserver is running on.
Now go to the following URL in your browser:
http://localhost:9000/tasty (or whatever the port is)
Accept the default robot:assembly.
Hover over the root node in the tree at upper-left and see the "Pencil" icon show up. Click the Pencil icon.
This should result in the reported crash.
The crash is happening some time during the call to the gdlAjax function, which is invoked through an Ajax call when clicking that "pencil" hover-over icon. The gdlAjax function is defined in the file gendl/gwl/ajax/source/ajax.lisp.
My reply:
I think https://github.com/Clozure/ccl/commit/212c25448fb1743c3c51707e69a8b7a604e714a6 is not as problem-free as I originally thought it might be.
If I run a CCL IDE that includes that change on a High Sierra system (like the test ccl.pkg), then I see a crash. But when I used a command-line lisp, I didn't see the crash, oddly enough.
If I take that change out (i.e., revert 212c2544), it works fine.
On the other hand, if I run the Lisp installed from the test ccl.pkg on a macOS Mojave system, your test case appears to work fine.
I wish I remembered why we needed the DarwinSigReturn workaround in the first place. It looks like it is going to be necessary to detect at runtime whether we are on a pre-Mojave macOS, and leave the DarwinSigReturn thing in place if so.
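A runtime check along those lines could be sketched roughly as follows. This is only an illustration, not the code that ended up in CCL: the function names `darwin_major_from_release` and `running_on_pre_mojave` are made up for this example. It relies on the fact that macOS 10.14 (Mojave) reports Darwin kernel release 18.x from uname(), so anything below 18 is pre-Mojave.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: parse the leading integer of a kernel release
   string such as "18.2.0". Returns -1 if the string does not start
   with a digit. */
int darwin_major_from_release(const char *release)
{
    char *end;
    long major;

    if (release == NULL || release[0] < '0' || release[0] > '9')
        return -1;
    major = strtol(release, &end, 10);
    if (end == release || major > 1000)
        return -1;
    return (int)major;
}

/* Hypothetical helper: nonzero only when uname() says we are on Darwin
   with a major release below 18 (i.e. older than Mojave). Any failure
   or non-Darwin system yields 0. */
int running_on_pre_mojave(void)
{
    struct utsname u;
    int major;

    if (uname(&u) != 0)
        return 0;
    if (strcmp(u.sysname, "Darwin") != 0)
        return 0;
    major = darwin_major_from_release(u.release);
    return major >= 0 && major < 18;
}
```

The kernel could call something like running_on_pre_mojave() once at startup and keep the DarwinSigReturn path only when it returns nonzero.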
I'm also seeing crashes into the lisp kernel debugger when running make certify-books-short from the acl2-8.1 sources on Mojave.
Example:
| Unhandled exception 4 at 0x7fff77485b53, context->regs at #x7000110c5590
| ? for help
| [21182] Clozure CL kernel debugger: Exit code from ACL2 is 137
| -rw-r--r-- 1 rme staff 1844 Nov 13 19:28 world-theorems.cert
So I'm pretty sure that just getting rid of DarwinSigReturn is not the complete solution for Mojave, which is unfortunate.
It seems that there's a third argument to the sigreturn system call on Mojave. From the _sigtramp disassembly, we see:
0x7fff77485b43 <+35>: movq %rbx, %rdi
0x7fff77485b46 <+38>: movl $0x1e, %esi
0x7fff77485b4b <+43>: movq %r12, %rdx
0x7fff77485b4e <+46>: callq 0x7fff77488594 ; symbol stub for: __sigreturn
The High Sierra sigtramp doesn't put anything in %rdx.
I have no idea what this extra argument is (and I don't see the Mojave sources on opensource.apple.com yet).
https://trac.clozure.com/ccl/changeset/11565 is a breadcrumb. Other archaeology leads me to believe that we're in this situation because, at one point at least, sigaltstack wasn't thread-local on Darwin (and perhaps still isn't).
Another test case:
On macOS Mojave (earlier macOS versions work as expected):
(thm (equal (append (append x y) x y x y x y x y)
(append x y x y x y x y x y)))
Unhandled exception 4 at 0x7fff77485b53, context->regs at #x70000b261590
? for help
[31147] Clozure CL kernel debugger:
This exception (4 is SIGILL) is because the call to sigreturn in the sigtramp routine returned unexpectedly, and there's helpfully an illegal instruction there to catch that unexpected case.
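The "trap after a call that must not return" pattern can be mimicked in a few lines of C. This is a toy demonstration, not the _sigtramp code: `returns_unexpectedly` stands in for the sigreturn call, the function names are invented for this example, and `__builtin_trap()` is a GCC/Clang builtin (an assumption about the compiler). The pattern runs in a forked child so that the parent can report which signal killed it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a call that is expected never to return. */
static void returns_unexpectedly(void)
{
}

/* Call fn, and trap if control ever comes back. */
static void call_then_trap(void (*fn)(void))
{
    fn();
    __builtin_trap();   /* GCC/Clang builtin; deliberately faults here */
}

/* Run the pattern in a child process. Returns the number of the signal
   that killed the child, 0 if it exited normally, -1 on error. */
int signal_after_unexpected_return(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        call_then_trap(returns_unexpectedly);
        _exit(0);       /* not reached */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On x86-64, GCC and Clang typically emit a ud2 instruction for the builtin, which the kernel reports as SIGILL. Other architectures may raise a different signal.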
https://opensource.apple.com/release/macos-1014.html is now available (but they say "coming soon" for the sources for xnu-4903.201.2, which is probably what I really need to figure out what the third arg to sigreturn is.)
Tue Dec 11 10:11:28 GMT 2018
@xrme
bsd/dev/i386/unix_signal.c:
688:sigreturn(struct proc *p, struct sigreturn_args *uap, __unused int *retval)
for cross references: http://newosxbook.com/xxr/index.jl?q=sigreturn&ver=xnu-4903.221.2&case=false&def=false
tarball: https://opensource.apple.com/tarballs/xnu/xnu-4903.221.2.tar.gz
Given the resources available and the complexity of the issue, I'd prioritize getting it to work on Mojave, and deprecate earlier versions of macOS. If there's time to get it running on earlier versions, that's great. But the most important thing is getting it working on Mojave.
I ran a 1.11.5 binary under a Mojave debug kernel. I got, as I feared I would, the following message:
process dx86cl64[405] sigreturn token mismatch: received 0x7ffeefbff120 expected 0xb993f3f80d520774
After this debug message is printed, the sigreturn system call returns with an error code.
The Mojave sources contain code to mitigate a class of attacks ("sigreturn oriented programming") described in, for example, https://dl.acm.org/citation.cfm?id=2650802. It seems that this mitigation breaks a technique that CCL has been using.
update: link to PDF of paper in question: https://www.cs.vu.nl/~herbertb/papers/srop_sp14.pdf
Thanks to some help from Apple DTS, I committed dd5622e9, and this really does seem to make CCL compatible with macOS Mojave.
dd5622e9da69edc48dbf97c6caa6f3a7e16f932b still seems to work fine on OS X 10.9.5, both in terminal mode and as a GUI app.
Are we going to see the Mac App Store version of Clozure CL updated soon? I'm waiting for that, since my attempts to build it from this repo plus the comments here haven't worked.
I just submitted an updated Mac App Store version of Clozure CL. It now has to get through app review. This seems to take about a week. It might take a little longer if the app review team identifies any issues that need to be corrected.
I'll post a note when it is approved (as I hope it will be).
On a beta machine, if I try:
I get a similar result trying to load quicklisp, although loading my own test file that just defines a simple function works fine.