babashka / babashka.curl

A This library is mostly replaced by https://github.com/babashka/http-client
Eclipse Public License 1.0
121 stars 9 forks source link

Allow returning response as a map #6

Closed rymndhng closed 4 years ago

rymndhng commented 4 years ago

Fixes #5

Implements support for returning the curl response as a map, similar to clj-http and ring.

The implementation uses a deprecated method DataInputReader#readLine. From testing, it does not appear to properly parse unicode characters:

;; using deprecated method
user> (.readLine (java.io.DataInputStream. (clojure.java.io/input-stream (.getBytes "🙇‍♀️👏"))))
"🙇‍♀️👏"

;; using recommendation of a reader
user> (.readLine (clojure.java.io/reader (clojure.java.io/input-stream (.getBytes "🙇‍♀️👏"))))
"🙇‍♀️👏"

Fortunately, HTTP Messages, specifically Status Line & Headers are encoded in the ASCII range which minimizes the risk of improper conversion. See link.

This PR makes the conscious trade-off of using the deprecated method to avoid rewriting this snippet of Java code. See source.


Test Cases

(require '[clojure.java.io :as io])
(io/copy
  (:body
    (curl/get "https://github.com/borkdude/babashka/raw/master/logo/icon.png"
      {:response true :as :stream}))
  (io/file "icon.png"))
borkdude commented 4 years ago

@rymndhng Great work.

The implementation uses a deprecated method DataInputReader#readLine.

Can you elaborate why it uses this class?

In the docstring of this method I find:

Deprecated. 
This method does not properly convert bytes to characters. As of JDK 1.1, the preferred way to read lines of text is via the BufferedReader.readLine() method. Programs that use the DataInputStream class to read lines can be converted to use the BufferedReader class by replacing code of the form:
     DataInputStream d = new DataInputStream(in);

with:
     BufferedReader d
          = new BufferedReader(new InputStreamReader(in));
rymndhng commented 4 years ago

I decided to use DataInputStream because the .readLine does exactly what I needed. For HTTP parsing, specifically to read bytes until the presence of \r, \n or potentially both \r\n. On an incorrect match of \r\n, this data needs to be pushed back into the InputStream.

I read the source code of DataInputStream#readLine and discovered it does exactly what I needed. My understanding why the method does not properly convert bytes to characters is because it does not use charsets when converting the result into a string, i.e. unicode characters. Fortunately, this is not an issue for parsing HTTP Headers because the characters are constrained to the ASCII range.

This sample code demonstrates the issue round-tripping Unicode text using DataInputStream. Notice that using readLine

;; using deprecated method
user> (.readLine (java.io.DataInputStream. (clojure.java.io/input-stream (.getBytes "🙇‍♀️👏"))))
"�����"

;; using recommendation of a reader
user> (.readLine (java.io.BufferedReader. (java.io.InputStreamReader. (clojure.java.io/input-stream (.getBytes "🙇‍♀️👏")))))
"🙇‍♀️👏"

I considered alternative approaches to read lines of text, specifically with java.io.BufferedReader or line-seq. This approach is not ideal because

  1. Readers work on character streams instead of byte streams. This makes it impossible to consume binary data (such as the README example of downloading an image to a file).

  2. BufferedReader will buffer characters internally from the source, which means the underlying InputStream may have advanced past the current line. The underlying input-stream cannot be recovered for constructing the body.


It's great that you pointed this out, because I'm also unsatisfied by the use of a deprecated method.I've exhausted my ability to find the best solution in isolation and I wanted to get your feedback on the approach:

Should I continue exploring the solution space in this PR? or is this PR mergeable so long as the the goal of solving the deprecation issue in a follow up PR?

borkdude commented 4 years ago

@rymndhng Decided to re-implement it, I think it works. Can you take a look and merge it to your PR branch if OK?

https://github.com/rymndhng/babashka.curl/pull/1

rymndhng commented 4 years ago

Thanks for the help on converting read-line. I've never collaborated this way on Github before: having a maintainer work contribute improvements to a fork. That's neat!

Reminder: I applied this suggestion on the branch incase you want to follow up on it.

borkdude commented 4 years ago

@rymndhng Suggestion looked good. Thanks a lot!