Clozure / ccl

Clozure Common Lisp
http://ccl.clozure.com
Apache License 2.0
850 stars 103 forks source link

Suppress tcp-stream garbage referring to a closed fd. #362

Closed 5ch-cl-thread-882 closed 2 months ago

5ch-cl-thread-882 commented 3 years ago

On errornous situation, make-tcp-socket function seems to make a temporal tcp-stream object referring a closed FD. I saw some strange crashes probably because of it.

This PR tries to release the objects correctly.


Consider make-tcp-socket failed by 'Connection Refused'. In this situation, %socket-connect raises an error. If I selected to abort, stack unwinds, and the FD will be closed by the unwind-protected form. But the socket object made by this function still exists in memory and continues to refer the closed FD. After that, I sometimes see CCL randomly crashes at processing I/Os.

Here is a case going to crash I found on my machine (MacOS High Sierra 10.13.6).

$ ./dx86cl64
Clozure Common Lisp Version 1.12 (v1.12-39-g6c1a9458) DarwinX8664

For more information about CCL, please see http://ccl.clozure.com.

CCL is free software.  It is distributed under the terms of the Apache
Licence, Version 2.0.
? (machine-version)
"MacBookAir6,2"
? (ql:quickload "usocket")
To load "usocket":
  Load 1 ASDF system:
    usocket
; Loading "usocket"

("usocket")
? (usocket:socket-connect "ja.wikipedia.org" 65500) ;; Fails. Go to abort.
> Error: Error #<USOCKET:CONNECTION-REFUSED-ERROR #x302000AC95BD>
> While executing: USOCKET::RAISE-ERROR-FROM-ID, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
1 > :pop

? (usocket:socket-connect "nicovideo.jp" 443)
#<USOCKET:STREAM-USOCKET #x302000AC30AD>
? (ql:quickload "hunchentoot")  ;; It doesn't seem to matter which library you use.
To load "hunchentoot":
  Load 1 ASDF system:
    hunchentoot
Killed: 9

MacOS's crash report indicated close caused this crash:

...(omitted)...

Exception Type:        EXC_GUARD
Exception Codes:       0x4000000100000007, 0x2c548ffaf07016e3
Exception Subtype:     GUARD_TYPE_FD, id=0x2c548ffaf07016e3, fd=7, flavor=0x00000001 (CLOSE)

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib          0x00007fff7fd3d4fa close + 10
1   dx86cl64                        0x000000000001b223 SPffcall + 99 (x86-spentry64.s:4282)

Thread 1:
0   libsystem_kernel.dylib          0x00007fff7fd3320a mach_msg_trap + 10
1   libsystem_kernel.dylib          0x00007fff7fd32724 mach_msg + 60
2   libsystem_kernel.dylib          0x00007fff7fd32f64 mach_msg_server + 369
3   dx86cl64                        0x000000000002d031 exception_handler_proc + 49 (x86-exceptions.c:3552)
4   libsystem_pthread.dylib         0x00007fff7ff04661 _pthread_body + 340
5   libsystem_pthread.dylib         0x00007fff7ff0450d _pthread_start + 377
6   libsystem_pthread.dylib         0x00007fff7ff03bf9 thread_start + 13

Thread 2:
0   libsystem_kernel.dylib          0x00007fff7fd3e142 read + 10
1   dx86cl64                        0x000000000001b223 SPffcall + 99 (x86-spentry64.s:4282)

...(omitted)...

So I tried to take a trace around ccl::fd-close. I got below:

0> Calling (CCL::FD-CLOSE 7)
 (647B48) : 0 (PRINT-CALL-HISTORY :CONTEXT NIL :PROCESS NIL :ORIGIN NIL :DETAILED-P NIL :COUNT 1152921504606846975 :START-FRAME-NUMBER 0 :STREAM #<SYNONYM-STREAM to *TERMINAL-IO* #x30200047792D> :PRINT-LEVEL 2 :PRINT-LENGTH 5 :PRINT-STRING-LENGTH :DEFAULT :SHOW-INTERNAL-FRAMES NIL :FORMAT :TRADITIONAL) 957
 (647CE0) : 1 (FUNCALL #'#<(CCL::TRACED CCL::FD-CLOSE)> 7) 1317
 (647D60) : 2 (%%IOBLOCK-CLOSE #S(CCL::IOBLOCK :STREAM #<BASIC-TCP-STREAM :CLOSED #x302000A0A9CD> :UNTYI-CHAR NIL ...)) 253
 (647D88) : 3 (%%BEFORE-AND-AFTER-COMBINED-METHOD-DCODE (NIL #<CCL::STANDARD-KERNEL-METHOD CLOSE :AFTER #> . 823242)) 1261
 (647DF8) : 4 (%%STANDARD-COMBINED-METHOD-DCODE ((#<#>) (#<#>) #<CCL::STANDARD-KERNEL-METHOD CLOSE #>) 823242) 237
 (647E70) : 5 (%%CHECK-KEYWORDS #(1 #(:ABORT) #<Combined-Method CLOSE #x302000475ACF>) 823255) 261
 (647ED8) : 6 (DRAIN-TERMINATION-QUEUE) 493
 (647F48) : 7 (FUNCALL #'#<(:INTERNAL CCL::ADD-GC-HOOK)>) 85
 (647F68) : 8 (HOUSEKEEPING) 37
 (647F78) : 9 (HOUSEKEEPING-LOOP) 501
 (647FB8) : 10 (Killed: 9

I think, when the GC collects the temporal socket object, GC tries to close the already closed FD, and strangely go to crash...?

I found resembled codes in make-udp-socket and make-stream-file-socket, so I applied same fixes to them.


This problem is repored at a thread in 5ch.net (Japanese BBS) at first. I saw the discussions in this thread and made this PR.

xrme commented 2 months ago

It looks like this PR is about the same (or at least a similar) bug that was reported in #504 and fixed in https://github.com/Clozure/ccl/commit/35eb25749ebbfce0c73e9f48b0c041512d771d7f

I wish I would have paid attention to this PR three years ago.

5ch-cl-thread-882 commented 2 months ago

Thanks for referring this PR. (I have forgotten about it, too.) As you say, the commit 35eb25749ebbfce0c73e9f48b0c041512d771d7f would also solve the problem I found. Thank you.

By the way, looking at this PR, it seems that I found connect in make-udp-socket similar to make-tcp-socket and took the same measures as it. However, now I think about it, connect to UDP sockets only determines the default destination, so even if it fails, the FD should not become unusable, unlike TCP. Therefore, does this mean that nothing needs to be done about UDP sockets?

xrme commented 2 months ago

I think that you are right that we should re-examine the udp socket and file socket cases also. I'm working on that in #507.

5ch-cl-thread-882 commented 2 months ago

Ok, thank you. I'll subscribe #507 and comment if I found something.

xrme commented 2 months ago

I merged #507 and I believe the issue you found in this PR is now fixed.

Again, I'm sorry I didn't pay attention to your PR sooner. My poor excuse is that I wasn't working professionally on CCL at the time you made the PR.