Cuis-Smalltalk / Cuis-Smalltalk-Dev

Active development of Cuis Smalltalk
MIT License
429 stars 69 forks source link

WebClient doesn't handle Unicode URLs #254

Closed JHonaker closed 1 year ago

JHonaker commented 1 year ago

I've discovered a bug with WebClient and SocketStream/SecureSocketStream. They don't handle UnicodeStrings properly when formatting the binary data. I know how to fix it for my use case, but I'm not sure if there's something more general that needs to be done to fix other issues that haven't surfaced yet.

I'll start with my original problem: I was trying to use PackageDownloader to download something from the standard list it's bundled with. I kept getting SSL error number -5. I double checked my installation of SSL and libssl, but SSL being the complicated beast that it is, left me confused and frustrated.

I used the debugger to explore and learn about the ways in which WebClient, SS/SSS, and SqueakSSL interact and tried to figure out where things were going wrong. I couldn't find anything wrong with my configuration, the package URL (PackageDownloader didn't work for any packages on any of my machines). At some point I decided to run print-it in a workspace while I was looking at another value in the debugger, and to my surprise (WebClient httpGet: 'https://the-url-i-was-trying.com') content printed the correct content without issue.

I'm sorry... What? I literally copied the URL from the debugger inspector.

I tried the download again. Another SSL error. I dropped into a debugger for the workspace version to try to figure out where they were diverging. After about an hour of inspecting parallel objects value by value, I was pulling my hair out after arriving all the way back to the only difference in the two stacks.

(WebClient httpGet: 'https://the-url-string.com') content.

and here, with the parameter url containing the same string.

(WebClient httpGet: url) content.

And then it hit me. UnicodeStrings were just introduced. Sockets transmit binary data. So I called class on the two and sure enough, the working version has a string of type String and the failing one has a string of type UnicodeString.

This has been a bit of a bear to debug. Like I said, I have a simple fix. Just call asByteString on the URL. Everything derived from the URL string, like the serverName and request text, ends up as a UnicodeString. It might be isolate-able to just converting the request text to a ByteString before it's sent.

I'm not familiar enough with the inner workings of everything to know how deep the rabbit hole goes. It was quite a 'pull the small thread and unravel your sweater' series of events. It did sell me on Smalltalk though. I'm very new to Smalltalk, like less than three weeks. I can't imaging how long this would have taken to track down on a system that wasn't so interactively inspect-able.

It feels like UnicodeString "pollution" could be bigger problem though. I'm happy to submit a PR or help track things down if you point me somewhere.

jvuletich commented 1 year ago

Hi John, Welcome to the Cuis community. I'm not sure that calling #asByteString is the right thing to do in the general case. So, I need something from you. I need to reproduce what you are seeing. What url are you trying to access? If you do (WebClient httpGet: url) content. what error do you get? Be as precise as possible, describe what you get in both cases, so anyone can help.

JHonaker commented 1 year ago

Hi John, Welcome to the Cuis community.

Thanks!

I'm not sure that calling #asByteString is the right thing to do in the general case.

Yes, I'm not sure either. I have a feeling that SecureSocketStream is correctly encoding the data as a Unicode-encoded binary blob, and the server is expecting ASCII encoded text. I don't know enough about Unicode to verify that that's correct, but it's definitely different than the one generated from a String based URL. Perhaps there's a way to signal it as Unicode in the header.

Down to business.

I need to reproduce what you are seeing.

Of course.

Steps to reproduce the error (on both OpenSUSE Tumbleweed and Windows 10)

  1. Follow the installation instructions for installing via source. (Clone the Cuis-Smalltalk-Dev repository, download the VM, etc.) I'm using the Cuis6.0-5753.image that's provided straight out of the box.
  2. Open the World menu and click Open > Package Downloader.
  3. Answer, Yes, you'd like to install Package Downloader.
  4. Try to download any package. (For the sake of completeness, I was trying to install Tools-Finder version 1.5 from hernanwilkinson's repo.)
  5. The transcript will display Downloading Downloading https://raw.githubusercontent.com/hernanwilkinson/cuis-finder-asWidget/master/Tools-Finder.pck.st ... and the Debugger will open showing Error: SSL connect failed with code: -5 - Process: Morphic UI - Priority: 40.

I can get a working response by doing (WebClient httpGet: ('https://raw.githubusercontent.com/hernanwilkinson/cuis-finder-asWidget/master/Tools-Finder.pck.st' asByteString)) content in a Workspace. This is a little bit farther down the call stack in the previously mentioned debugger session.

Some interesting Workspace examples and findings

There are four simple examples that point to some interesting (perhaps separate) issues that are easily illustrated with the Workspace. Trying to Ctrl-p these lines does the following:

(WebClient httpGet: 'https://www.google.com') content

properly prints the contents of google.com.

(WebClient httpGet: 'http://www.google.com') content "NB: HTTP vs HTTPS"

also properly prints the contents of google.com.

(WebClient httpGet: ('https://www.google.com' asUnicodeString)) content

invokes the same SSL connect (code -5) error as the Package Downloader.

(WebClient httpGet: ('http://www.google.com' asUnicodeString)) content "NB: HTTP vs HTTPS"

raises a MessageNotUnderstood: UnicodeString>>unescapePercents error when WebRequest tries to escape percents from the rawURL instance variable (because it is UnicodeString vs String).

jvuletich commented 1 year ago

Hi John. Thanks for the feedback. Based on this, the WebClient tests that were failing, and some experiments, I did some changes. Please pull the repo. Be sure to install all the updates, and reload the updated WebClient and Tests-WebClient packages. Test a bit. If you see further issues, please report them here or via email to the mail list. Thanks!

jvuletich commented 1 year ago

Assuming it went OK. Please reopen otherwise.

JHonaker commented 1 year ago

@jvuletich Sorry for the delay. I was in a very big rush at work, and didn't have time to come back to verify things. Everything works now. Thank you for the quick response, and sorry again for the classic report and disappear.