VBA-tools / VBA-Web

VBA-Web: Connect VBA, Excel, Access, and Office for Windows and Mac to web services and the web
http://vba-tools.github.io/VBA-Web/
MIT License
2.01k stars 494 forks source link

Add Unicode/UTF-8 support to UrlEncode and UrlDecode #175

Open parkone opened 8 years ago

parkone commented 8 years ago

Function URLEncode in the WebHelpers module do not convert correctly Unicode characters! When sending json with non-Unicode characters it will broken. I suggest to use this function instead

'iconv+urlencode /based on ADODB.Stream (include a reference to a recent version of the "Microsoft ActiveX Data Objects" library in your project)
'http://stackoverflow.com/questions/218181/how-can-i-url-encode-a-string-in-excel-vba
'https://msdn.microsoft.com/en-us/library/ms681424%28v=vs.85%29.aspx
Public Function URLEncode(StringVal As Variant, Optional SpaceAsPlus As Boolean = False) As String

  Dim bytes() As Byte, b As Byte, i As Integer, space As String

  If SpaceAsPlus Then space = "+" Else space = "%20"

  If Len(StringVal) > 0 Then
    With New ADODB.stream
      .mode = adModeReadWrite
      .Type = adTypeText
      .Charset = "UTF-8"
      .Open
      .WriteText StringVal
      .Position = 0
      .Type = adTypeBinary
      .Position = 3 ' skip BOM
      bytes = .Read
    End With

    ReDim Result(UBound(bytes)) As String

    For i = UBound(bytes) To 0 Step -1
      b = bytes(i)
      Select Case b
        Case 97 To 122, 65 To 90, 48 To 57, 45, 46, 95, 126
          Result(i) = Chr(b)
        Case 32
          Result(i) = space
        Case 0 To 15
          Result(i) = "%0" & Hex(b)
        Case Else
          Result(i) = "%" & Hex(b)
      End Select
    Next i

    URLEncode = Join(Result, "")
  End If
End Function
timhall commented 8 years ago

@parkone Thanks for raising this issue. This is a known issue and has I'm researching approaches of resolving this that maintain Mac support (unfortunately, ADODB is not available on Mac). I'll keep you posted.

bdr99 commented 8 years ago

Is this issue fixed by #205?

timhall commented 8 years ago

@bdr99 it should be, but I'll add a test case.

Sophist-UK commented 8 years ago

I have submitted another PR #213 to enhance UrlEncode further and to provide similar enhancements and fixes to UrlDecode also. This does NOT AFAIK provide full Unicode support - but it does handle ASCII characters above 128.

timhall commented 8 years ago

Found this reference implementation for Java: https://www.w3.org/International/URLUTF8Encoder.java

All non-ASCII characters are encoded in two steps: first

  • to a sequence of 2 or 3 bytes, using the UTF-8 algorithm;
  • secondly each of these bytes is encoded as "%xx".
// relevant lines
} else if (ch <= 0x007f) {      // other ASCII
       sbuf.append(hex[ch]);
} else if (ch <= 0x07FF) {      // non-ASCII <= 0x7FF
       sbuf.append(hex[0xc0 | (ch >> 6)]);
       sbuf.append(hex[0x80 | (ch & 0x3F)]);
} else {                    // 0x7FF < ch <= 0xFFFF
       sbuf.append(hex[0xe0 | (ch >> 12)]);
       sbuf.append(hex[0x80 | ((ch >> 6) & 0x3F)]);
       sbuf.append(hex[0x80 | (ch & 0x3F)]);
}
ghost commented 8 years ago

I don't know if you'll find this useful for the UTF8 decode, but I've been trying to decode URLs myself and after hours of searching I stumbled upon this: http://stackoverflow.com/questions/23810324/vba-convert-string-to-unicode You'd need to convert the consecutive encoded hex values to a byte array, and then you can get their proper value with the function that hamish provided in his answer. To convert an hex value to byte, you use val("&Hnn&"), where nn is the hex value without the percentage symbol (for instance, if I want to get the byte value of %84 then val("&H84&") will return 132).

Also, perhaps this bit here helps you convert text to an UTF8 byte array, perhaps if you combine it with the second part of parkone's answer you can achieve the encoding without the need of ADODB: http://www.di-mgt.com.au/howto-convert-vba-unicode-to-utf8.html

justinmassiot commented 7 years ago

I still have issues with non-ASCII characters with 4.1.1 . In Excel 2016 (Windows 7) when I replace WebHelpers.UrlEncode from WebRequest.cls by WorksheetFunction.EncodeURL - source - everything works like a charm. I hope you could find a solution that would be portable (Win/Mac + compatible with older versions) and suitable for all character sets.

grv87 commented 5 years ago

@justinmassiot, WorksheetFunction.EncodeURL works for Office 2013+ only, so it's not an option.

https://github.com/VBA-tools/VBA-Web/blob/f400fbca66fc2cacf871bae2280853df948505ab/src/WebHelpers.bas#L997-L1007

The first approach is non-standard and not supported by some (most?) servers, see Wikipedia.

I've implemented the second approach (not pushed to GitHub yet), and it works like a charm, except that I haven't tested it with surrogates. It may require additional code. But I'm not sure this would be the best solution in terms of performance. Solution proposed by @Seeker320 at http://www.di-mgt.com.au/howto-convert-vba-unicode-to-utf8.html and at #252 uses WinAPI WideCharToMultiByte function and may work faster. But I don't know an alternative external function for Mac. From Google it looks like mbsnrtowcs_l could do the job. However, I'm not sure how to pass UTF-16 locale to it. Another option is to use iconv but this adds additional dependency. I don't have a Mac to test. Is a developer with Mac experience here to test and decide?

Also, there are other places where UTF-8 support is required. The solution should be universal.

@timhall, what do you think? Should we go the hard way (with probably efficient external functions) or the simple way (pure VBA solution)?

grv87 commented 5 years ago

Also, I wonder whether VBA.StrConv(..., vbFromUnicode) calls in the code should actually use UTF-8 too.

gnefnuy commented 2 years ago

So, is it solved?