Closed magreenblatt closed 6 years ago
Original comment by Dmitry Azaraev (Bitbucket: dmitry-azaraev, GitHub: dmitry-azaraev).
@amaitland CEF currently lacks character set support entirely. Resource handlers are probably missing some of the standard filters that handle this. The same applies to compressed responses: they will not be handled.
@kdkdkd You can work around this by injecting a BOM into the response (for cases where the content is UTF-8 or UTF-16LE/BE).
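A minimal sketch of the BOM workaround for the UTF-8 case, assuming the handler holds the whole response body in a `std::string` before handing it to the browser. `PrependUtf8Bom` is a hypothetical helper name, not part of the CEF API.

```cpp
#include <string>

// Hypothetical helper (not a CEF function): prepend the UTF-8 byte order
// mark (EF BB BF) to a response body unless the body already starts with it.
std::string PrependUtf8Bom(const std::string& body) {
  static const std::string kUtf8Bom = "\xEF\xBB\xBF";
  if (body.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0) {
    return body;  // Already starts with a BOM; leave it untouched.
  }
  return kUtf8Bom + body;
}
```

The same idea would apply to UTF-16 with the byte sequences FF FE (little endian) or FE FF (big endian). Note that the reported response length has to include the extra BOM bytes.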
Original comment by amaitland (Bitbucket: amaitland, GitHub: amaitland).
> @amaitland CEF currently lacks character set support entirely. Resource handlers are probably missing some of the standard filters that handle this. The same applies to compressed responses: they will not be handled.
@dmitry-azaraev That's interesting to know, thanks. CefSharp uses the BOM approach.
Original comment by BrowserAutomationStudio (Bitbucket: kdkdkd, GitHub: kdkdkd).
> You can work around this by injecting a BOM into the response (for cases where the content is UTF-8 or UTF-16LE/BE).
It is not the best solution in my situation because, as you said, it only helps with UTF encodings. I want to handle every charset that Chrome does.
For example, the Russian social network vk.com uses the windows-1251 encoding. It works fine for HTML content because of the meta tag, and in that case CEF decodes the page successfully. But when an AJAX request is made and the server returns text that is not UTF-encoded, CEF fails to decode it. As a result, vk.com is not usable with CefResourceHandler right now :(
I'm thinking about handling the encoding myself with ICU, iconv, or something similar, but then I need to parse web pages and check for the meta tag to avoid double decoding :(
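To illustrate what decoding by hand involves, here is a deliberately partial sketch that converts windows-1251 bytes to UTF-8 without ICU or iconv. It is a stand-in for those libraries, not a replacement: it only maps ASCII, the contiguous Cyrillic block 0xC0 to 0xFF (U+0410 to U+044F), and Ё/ё (0xA8, 0xB8); every other byte becomes U+FFFD. The function names are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Append one code point below U+10000 to a UTF-8 encoded string.
static void AppendUtf8(std::string& out, uint32_t cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Hypothetical, partial windows-1251 decoder (assumption: only letters
// matter). Bytes without a mapping here are replaced with U+FFFD.
std::string Windows1251ToUtf8Partial(const std::string& in) {
  std::string out;
  out.reserve(in.size() * 2);
  for (unsigned char b : in) {
    uint32_t cp;
    if (b < 0x80) {
      cp = b;                    // ASCII is unchanged.
    } else if (b >= 0xC0) {
      cp = 0x0410 + (b - 0xC0);  // Cyrillic A..ya, contiguous block.
    } else if (b == 0xA8) {
      cp = 0x0401;               // Cyrillic capital letter IO.
    } else if (b == 0xB8) {
      cp = 0x0451;               // Cyrillic small letter IO.
    } else {
      cp = 0xFFFD;               // Not covered by this sketch.
    }
    AppendUtf8(out, cp);
  }
  return out;
}
```

A real implementation would use the full mapping table from ICU or iconv, and would still need the meta tag check mentioned above so that already-converted content is not decoded twice.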
Original comment by amaitland (Bitbucket: amaitland, GitHub: amaitland).
You can change the default_encoding on a per-browser basis.
I believe you can even set this dynamically using preferences; see
https://chromium.googlesource.com/chromium/src/+/ad2a3ded81c49ee89b44f6b544d21ff617c935bf/chrome/common/pref_names.cc#41
http://magpcss.org/ceforum/apidocs3/projects/%28default%29/CefRequestContext.html#SetPreference%28constCefString&,CefRefPtr%3CCefValue%3E,CefString&%29
Original comment by BrowserAutomationStudio (Bitbucket: kdkdkd, GitHub: kdkdkd).
> You can change the default_encoding on a per browser basis.
I tried that, but with no result. The following code
#!c++
//Get context
CefRefPtr<CefRequestContext> Context = CefRequestContext::GetGlobalContext();
//Create and populate dictionary
CefRefPtr<CefValue> Value = CefValue::Create();
CefRefPtr<CefDictionaryValue> Dictionary = CefDictionaryValue::Create();
Dictionary->SetString("charset_default","windows-1251");
Value->SetDictionary(Dictionary);
//Modify context
CefString Error;
Context->SetPreference("intl",Value,Error);
std::cout<<std::endl<<Error.ToString()<<std::endl<<std::endl;
prints "Trying to modify an unregistered preference", while
#!c++
Context->GetAllPreferences(true)->GetDictionary("intl")->GetString("charset_default").ToString()
has a value.
The same approach with proxy.mode does change the proxy settings.
Original comment by amaitland (Bitbucket: amaitland, GitHub: amaitland).
I've had similar problems when trying to set properties using dictionaries; the dot notation is more reliable in my experience.
It'll be something like context->SetPreference("intl.charset_default", "windows-1251", error);
In OnAfterCreated, I can change the preference; I haven't checked anything with windows-1251 encoding though.
Just a reminder that you can only call SetPreference on the CEF UI thread.
Original comment by BrowserAutomationStudio (Bitbucket: kdkdkd, GitHub: kdkdkd).
I checked it out and dot notation works great; it even changes the default encoding without needing to restart the browser.
Only one concern is left: whether SetPreference will work for different frames with different encodings, for example a non-UTF advertising iframe alongside UTF main site content.
I'm afraid that it may work only intermittently, or that there could be a race condition :(
Original comment by BrowserAutomationStudio (Bitbucket: kdkdkd, GitHub: kdkdkd).
@amaitland Yes, and some sites use several charsets within the same frame. For example, qq.com uses GB2312, GBK and UTF-8. Thanks for the help anyway.
For now I have ended up with a solution that detects the charset based on the HTTP headers or the meta tag, decodes the page content to UTF-8, and modifies the charset in the meta tag if needed. In other words, it forces every page to have UTF-8 encoding.
I've attached the source code in case somebody else needs it.
But I'm still waiting for a native fix from the CEF team.
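The attached source is not reproduced here. As a rough illustration of the header-detection step only, the following sketch pulls the `charset` parameter out of a Content-Type header value. `CharsetFromContentType` is a hypothetical helper, and the parsing is intentionally naive (a plain substring search rather than a full header parser).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: return the lowercased charset parameter of a
// Content-Type header value, or an empty string if none is present.
std::string CharsetFromContentType(const std::string& header) {
  std::string lower(header);
  std::transform(lower.begin(), lower.end(), lower.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::tolower(c));
                 });

  const std::string key = "charset=";
  std::string::size_type pos = lower.find(key);
  if (pos == std::string::npos) {
    return "";
  }
  pos += key.size();

  // The value may be wrapped in double quotes.
  if (pos < lower.size() && lower[pos] == '"') {
    ++pos;
  }

  // The value ends at a quote, a semicolon, whitespace, or end of string.
  std::string::size_type end = pos;
  while (end < lower.size() && lower[end] != '"' && lower[end] != ';' &&
         !std::isspace(static_cast<unsigned char>(lower[end]))) {
    ++end;
  }
  return lower.substr(pos, end - pos);
}
```

For `text/html; charset=windows-1251` this returns `windows-1251`. When the result is empty, the approach described above would fall back to the meta tag.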
Does Google Chrome properly handle different frames with different character encodings? Do you have a URL that demonstrates this?
Original comment by BrowserAutomationStudio (Bitbucket: kdkdkd, GitHub: kdkdkd).
Yes, Google Chrome handles that properly. I wrote a simple example in Node.js; it renders properly in Chrome and in CefClient, but it is impossible to render properly when using a custom CefResourceHandler. There are 2 frames: one with utf-8 encoding and one with windows-1251 encoding. Neither frame has a meta tag, but each has a Content-Type header. There is no way to display that properly with a custom CefResourceHandler:
If I set the encoding to utf-8, the second frame renders incorrectly.
If I set the encoding to windows-1251, the first frame renders incorrectly.
I don't have a direct URL to a real-world example, but if you log in to vk.com and try to obtain the group list, part of the data will be corrupted. Some of the AJAX requests return data in windows-1251 encoding and others in utf-8, and the only thing that gives information about the encoding is Content-Type, which is not handled properly with a custom CefResourceHandler.
Original comment by Mike Wiedenbauer (Bitbucket: shagkur, GitHub: shagkur).
GetCharset needs to be overridden in libcef/browser/net/resource_request_job.h/cc to let Chromium handle the encoding properly. Here's the corresponding PR against master: https://bitbucket.org/chromiumembedded/cef/pull-requests/156
@shagkur Thanks, fixed in master revision 39ccd85 (bb), 3325 branch revision 29552e0 (bb) and 3282 branch revision a42c0ea (bb).
Original report by BrowserAutomationStudio (Bitbucket: kdkdkd, GitHub: kdkdkd).
What steps will reproduce the problem?
Override CefResourceHandler and send a response with a non-default encoding. Even if you specify the charset through the Content-Type header, CEF won't recognize it and will use the default. Here is a code example that illustrates the issue:
What is the expected output? What do you see instead?
I see a browser window with the following output:
And the expected output is the following:
What version of the product are you using? On what operating system?
I use 3.2623.1397.gaf139d7_windows32 on Windows 7 x64
Does the problem reproduce with the cefclient or cefsimple sample application at the same version? How about with a newer or older version?
No, it doesn't. The problem reproduces only with a custom CefResourceHandler.
Does the problem reproduce with Google Chrome at the same version? How about with a newer or older version?
No, it doesn't. Chrome always correctly treats Content-Type headers.