Closed rymndhng closed 4 years ago
@rymndhng Great work.
The implementation uses a deprecated method DataInputReader#readLine.
Can you elaborate why it uses this class?
In the docstring of this method I find:
Deprecated.
This method does not properly convert bytes to characters. As of JDK 1.1, the preferred way to read lines of text is via the BufferedReader.readLine() method. Programs that use the DataInputStream class to read lines can be converted to use the BufferedReader class by replacing code of the form:
DataInputStream d = new DataInputStream(in);
with:
BufferedReader d
= new BufferedReader(new InputStreamReader(in));
I decided to use DataInputStream
because the .readLine
does exactly what I needed. For HTTP parsing, specifically to read bytes until the presence of \r
, \n
or potentially both \r\n
. On an incorrect match of \r\n
, this data needs to be pushed back into the InputStream.
I read the source code of DataInputStream#readLine and discovered it does exactly what I needed. My understanding why the method does not properly convert bytes to characters is because it does not use charsets when converting the result into a string, i.e. unicode characters. Fortunately, this is not an issue for parsing HTTP Headers because the characters are constrained to the ASCII range.
This sample code demonstrates the issue round-tripping Unicode text using DataInputStream
. Notice that using readLine
;; using deprecated method
user> (.readLine (java.io.DataInputStream. (clojure.java.io/input-stream (.getBytes "🙇♀️👏"))))
"�����"
;; using recommendation of a reader
user> (.readLine (java.io.BufferedReader. (java.io.InputStreamReader. (clojure.java.io/input-stream (.getBytes "🙇♀️👏")))))
"🙇♀️👏"
I considered alternative approaches to read lines of text, specifically with java.io.BufferedReader
or line-seq
. This approach is not ideal because
Readers work on character streams instead of byte streams. This makes it impossible to consume binary data (such as the README example of downloading an image to a file).
BufferedReader will buffer characters internally from the source, which means the underlying InputStream
may have advanced past the current line. The underlying input-stream
cannot be recovered for constructing the body.
It's great that you pointed this out, because I'm also unsatisfied by the use of a deprecated method.I've exhausted my ability to find the best solution in isolation and I wanted to get your feedback on the approach:
Should I continue exploring the solution space in this PR? or is this PR mergeable so long as the the goal of solving the deprecation issue in a follow up PR?
@rymndhng Decided to re-implement it, I think it works. Can you take a look and merge it to your PR branch if OK?
Thanks for the help on converting read-line. I've never collaborated this way on Github before: having a maintainer work contribute improvements to a fork. That's neat!
Reminder: I applied this suggestion on the branch incase you want to follow up on it.
@rymndhng Suggestion looked good. Thanks a lot!
Fixes #5
Implements support for returning the curl response as a map, similar to clj-http and ring.
The implementation uses a deprecated method DataInputReader#readLine. From testing, it does not appear to properly parse unicode characters:
Fortunately, HTTP Messages, specifically Status Line & Headers are encoded in the ASCII range which minimizes the risk of improper conversion. See link.
This PR makes the conscious trade-off of using the deprecated method to avoid rewriting this snippet of Java code. See source.
Test Cases