Open parkone opened 8 years ago
@parkone Thanks for raising this issue. This is a known issue and has I'm researching approaches of resolving this that maintain Mac support (unfortunately, ADODB
is not available on Mac). I'll keep you posted.
Is this issue fixed by #205?
@bdr99 it should be, but I'll add a test case.
I have submitted another PR #213 to enhance UrlEncode further and to provide similar enhancements and fixes to UrlDecode also. This does NOT AFAIK provide full Unicode support - but it does handle ASCII characters above 128.
Found this reference implementation for Java: https://www.w3.org/International/URLUTF8Encoder.java
All non-ASCII characters are encoded in two steps: first
- to a sequence of 2 or 3 bytes, using the UTF-8 algorithm;
- secondly each of these bytes is encoded as "%xx".
// relevant lines
} else if (ch <= 0x007f) { // other ASCII
sbuf.append(hex[ch]);
} else if (ch <= 0x07FF) { // non-ASCII <= 0x7FF
sbuf.append(hex[0xc0 | (ch >> 6)]);
sbuf.append(hex[0x80 | (ch & 0x3F)]);
} else { // 0x7FF < ch <= 0xFFFF
sbuf.append(hex[0xe0 | (ch >> 12)]);
sbuf.append(hex[0x80 | ((ch >> 6) & 0x3F)]);
sbuf.append(hex[0x80 | (ch & 0x3F)]);
}
I don't know if you'll find this useful for the UTF8 decode, but I've been trying to decode URLs myself and after hours of searching I stumbled upon this: http://stackoverflow.com/questions/23810324/vba-convert-string-to-unicode You'd need to convert the consecutive encoded hex values to a byte array, and then you can get their proper value with the function that hamish provided in his answer. To convert an hex value to byte, you use val("&Hnn&"), where nn is the hex value without the percentage symbol (for instance, if I want to get the byte value of %84 then val("&H84&") will return 132).
Also, perhaps this bit here helps you convert text to an UTF8 byte array, perhaps if you combine it with the second part of parkone's answer you can achieve the encoding without the need of ADODB: http://www.di-mgt.com.au/howto-convert-vba-unicode-to-utf8.html
I still have issues with non-ASCII characters with 4.1.1 . In Excel 2016 (Windows 7) when I replace WebHelpers.UrlEncode from WebRequest.cls by WorksheetFunction.EncodeURL - source - everything works like a charm. I hope you could find a solution that would be portable (Win/Mac + compatible with older versions) and suitable for all character sets.
@justinmassiot, WorksheetFunction.EncodeURL
works for Office 2013+ only, so it's not an option.
The first approach is non-standard and not supported by some (most?) servers, see Wikipedia.
I've implemented the second approach (not pushed to GitHub yet), and it works like a charm, except that I haven't tested it with surrogates. It may require additional code.
But I'm not sure this would be the best solution in terms of performance.
Solution proposed by @Seeker320 at http://www.di-mgt.com.au/howto-convert-vba-unicode-to-utf8.html and at #252 uses WinAPI WideCharToMultiByte
function and may work faster.
But I don't know an alternative external function for Mac.
From Google it looks like mbsnrtowcs_l
could do the job. However, I'm not sure how to pass UTF-16 locale to it.
Another option is to use iconv
but this adds additional dependency.
I don't have a Mac to test. Is a developer with Mac experience here to test and decide?
Also, there are other places where UTF-8 support is required. The solution should be universal.
@timhall, what do you think? Should we go the hard way (with probably efficient external functions) or the simple way (pure VBA solution)?
Also, I wonder whether VBA.StrConv(..., vbFromUnicode)
calls in the code should actually use UTF-8 too.
So, is it solved?
Function URLEncode in the WebHelpers module do not convert correctly Unicode characters! When sending json with non-Unicode characters it will broken. I suggest to use this function instead