edicl / drakma

HTTP client written in Common Lisp
http://edicl.github.io/drakma/

ssl3_read_n:unexpected eof while reading #137

Closed. reflektoin closed this issue 11 months ago

reflektoin commented 12 months ago

Edit: I filed this issue prematurely by accident, so the title is not very descriptive.

I got this error while trying to download a file:

A failure in the SSL library occurred on handle #.(SB-SYS:INT-SAP #X7F428C21C5C0) (SSL_get_error: 1). ERR_print_errors(): 40764390427F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:../ssl/record/rec_layer_s3.c:308:

Here's the code I used to download the file

(with-open-file (file "espoo_ostot_2012.xlsx"
                      :direction :output
                      :if-does-not-exist :create
                      :if-exists :supersede
                      :element-type '(unsigned-byte 8))
  (let ((cl+ssl:*default-unwrap-stream-p* t))
    (let ((input (drakma:http-request "https://stakohaavoindata.azurewebsites.net/avoindata/Espoo_kaupungin_ostolaskut__2012.xlsx"
                                      :want-stream t)))
      (arnesi:awhile (read-byte input nil nil)
        (write-byte arnesi:it file))
      (close input))))

The code depends on arnesi, so I first had to load it with Quicklisp:

(ql:quickload :arnesi)

Another way to reproduce the error is with trivial-download:

(let ((cl+ssl:*default-unwrap-stream-p* nil))
  (trivial-download:download
   "https://stakohaavoindata.azurewebsites.net/avoindata/Espoo_kaupungin_ostolaskut__2012.xlsx"
   "espoo_ostot_2012.xlsx" :quiet t))

I tried binding cl+ssl:*default-unwrap-stream-p* to nil, based on the discussion of a similar error at https://github.com/cl-plus-ssl/cl-plus-ssl/issues/166, but that didn't solve the issue. The same variable is also mentioned in https://github.com/edicl/drakma/pull/120.

I'm using Linux Mint, running SBCL 2.1.11.debian, Emacs, SLIME

What might be next steps toward fixing this issue?

Thank you

reflektoin commented 11 months ago

I'm not sure how to proceed. I tested downloading the file with curl and it works. Below are some notes about that.

With curl in verbose mode the download works:

curl -v -o testi.xlsx https://stakohaavoindata.azurewebsites.net/avoindata/Espoo_kaupungin_ostolaskut__2012.xlsx

This is the output of the last few lines of the curl command:

{ [5 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
100 25.8M  100 25.8M    0     0  6001k      0  0:00:04  0:00:04 --:--:-- 6324k
* Connection #0 to host stakohaavoindata.azurewebsites.net left intact

The file sizes are almost identical at the point where the Lisp program fails; the Lisp output is 3144 bytes short of the curl download:

27082752 Sep 30 09:14 espoo_ostot_2012.xlsx
27085896 Sep 30 08:57 testi.xlsx

reflektoin commented 11 months ago

Based on @avodonosov's comment in the linked issue, I managed to get a working version. It is shown in that issue, right after avodonosov's comment.

avodonosov commented 11 months ago

@reflektoin, I created https://github.com/edicl/drakma/issues/138 to discuss correct reading of response from a stream.

Your current solution only works for servers that return a Content-Length header in the response, but not for responses that use chunked encoding.

As you can see in the issue I created, the most complex situation of stream reading is when response compression is used.

If you don't allow response compression (i.e. do not specify :additional-headers '(("Accept-Encoding" . "gzip")) or similar), and want to support chunked encoding, the logic can be relatively simple:

If the Content-Length response header is present, read that number of bytes. Otherwise assume chunked encoding, which the stream returned by drakma handles automatically, so simply read from the stream until EOF. Note that you should stop at the first EOF and not issue any further read calls after it: once the streams drakma uses for chunked data have reported EOF at the end of the chunked body, they switch to a different mode and simply pass read calls through to the underlying stream. Repeated reads after EOF will therefore attempt to read more from the network.

Also, you could probably optimize your approach by copying through a buffer array with read-sequence / write-sequence instead of copying byte by byte.
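
A minimal sketch of the copying logic described above, using only standard Common Lisp. The function name copy-stream-buffered and its keyword arguments are hypothetical, chosen for illustration. It copies through a reusable octet buffer, stops after LIMIT octets when a limit is given, and otherwise stops at the first EOF without reading again.

```lisp
(defun copy-stream-buffered (input output &key limit (buffer-size 65536))
  "Copy octets from INPUT to OUTPUT through a reusable buffer.
If LIMIT is an integer, copy at most that many octets; otherwise copy
until the first EOF. Returns the number of octets copied."
  (let ((buffer (make-array buffer-size :element-type '(unsigned-byte 8)))
        (total 0))
    (loop
      (let ((want (if limit
                      (min buffer-size (- limit total))
                      buffer-size)))
        ;; LIMIT reached: stop without touching the stream again.
        (when (zerop want)
          (return total))
        (let ((end (read-sequence buffer input :end want)))
          (write-sequence buffer output :end end)
          (incf total end)
          ;; A short read means EOF: stop here and never read again.
          (when (< end want)
            (return total)))))))
```

Assuming drakma is loaded and no response compression is requested, it might be used roughly as follows. The name download-file is again hypothetical; drakma:header-value looks up a header in the alist that drakma:http-request returns as its third value.

```lisp
(ql:quickload :drakma)

(defun download-file (url path)
  ;; Hypothetical wrapper: read Content-Length octets if the header is
  ;; present, otherwise read until the first EOF (chunked encoding).
  (multiple-value-bind (stream status headers)
      (drakma:http-request url :want-stream t)
    (declare (ignore status))
    (unwind-protect
         (with-open-file (file path
                               :direction :output
                               :if-does-not-exist :create
                               :if-exists :supersede
                               :element-type '(unsigned-byte 8))
           (let ((length (drakma:header-value :content-length headers)))
             (copy-stream-buffered
              stream file
              :limit (and length (parse-integer length :junk-allowed t)))))
      (close stream))))
```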

reflektoin commented 11 months ago

Thanks for the implementation tips and elaborating on the issue.

avodonosov commented 11 months ago

@reflektoin, note also in the docs:

The stream returned is a flexi-stream with a chunked stream as its underlying stream. If you want to read binary data from this stream, read from the underlying stream which you can get with FLEXI-STREAM-STREAM.

https://edicl.github.io/drakma/#want-stream

So you may want to use the underlying stream. (I am not sure what difference it makes, if any, in your case, but since the docs suggest it... At the very least it should be a small performance win, because it avoids the calls through an intermediate stream.)
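
A rough sketch of what the docs describe, assuming drakma and flexi-streams are available through Quicklisp. The helper name call-with-binary-response is hypothetical. It hands the stream underneath the flexi-stream to a caller-supplied function and closes the outer stream afterwards.

```lisp
(ql:quickload '(:drakma :flexi-streams))

(defun call-with-binary-response (url function)
  "Hypothetical helper: request URL with :WANT-STREAM T and call FUNCTION
with the stream underlying the flexi-stream that drakma returns."
  (let ((flexi (drakma:http-request url :want-stream t)))
    (unwind-protect
         ;; FLEXI-STREAM-STREAM returns the stream the flexi-stream wraps,
         ;; which the docs recommend for reading binary data.
         (funcall function (flexi-streams:flexi-stream-stream flexi))
      (close flexi))))
```

The function passed in should follow the same rule as before: read Content-Length octets if that header is present, otherwise stop at the first EOF.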