chromiumembedded / cef

Chromium Embedded Framework (CEF). A simple framework for embedding Chromium-based browsers in other applications.
https://bitbucket.org/chromiumembedded/cef/

CEF ignores Content-Type charset option when using custom CefResourceHandler #1906

Closed magreenblatt closed 6 years ago

magreenblatt commented 8 years ago

Original report by BrowserAutomationStudio (Bitbucket: kdkdkd, GitHub: kdkdkd).


What steps will reproduce the problem?

Override CefResourceHandler and send a response in a non-default encoding. Even if the charset is specified through the Content-Type header, CEF does not recognize it and uses the default encoding instead. Here is a code example that illustrates the issue:

#!c++

#include "include/cef_app.h"
#include "include/cef_client.h"
#include <stdio.h>
#include <string.h> //memcpy
#include <string>
#include <utility>  //std::pair

//ResourceHandler whose main task is to serve static content
class  MyResourceHandler : public CefResourceHandler
{
    //This works fine
    //std::string Responce = "Hello";

    //This works fine too, because the string is encoded in utf-8, and the default encoding is utf-8
    //std::string Responce = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

    //This doesn't work, because the string is encoded in windows-1251, even though the correct encoding is specified in the headers.
    std::string Responce = "\xCF\xF0\xE8\xE2\xE5\xF2";

    //This works, because the encoding is set by the meta tag
    //std::string Responce = "<meta charset='windows-1251'/>\xCF\xF0\xE8\xE2\xE5\xF2";

public:

    bool ProcessRequest(CefRefPtr<CefRequest> request, CefRefPtr<CefCallback> callback)
    {
        callback->Continue();
        return true;
    }

    void GetResponseHeaders(CefRefPtr<CefResponse> response, int64& response_length, CefString& redirectUrl)
    {
        CefResponse::HeaderMap HeaderMapData;
        // !!! Charset is ignored !!!
        HeaderMapData.insert(std::pair<CefString, CefString>("Content-Type","text/html;charset=windows-1251"));
        response->SetHeaderMap(HeaderMapData);

        response->SetMimeType("text/html");

        response->SetStatus(200);
        response_length = -1;
    }

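    bool ReadResponse(void* data_out,int bytes_to_read,int& bytes_read,CefRefPtr<CefCallback> callback)
    {
        if(Responce.empty())
            return false;

        //Copy at most bytes_to_read bytes and keep the rest for the next call
        size_t size = Responce.size();
        if(size > static_cast<size_t>(bytes_to_read))
            size = static_cast<size_t>(bytes_to_read);
        memcpy(data_out,Responce.data(),size);
        bytes_read = static_cast<int>(size);
        Responce.erase(0,size);
        return true;
    }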

    bool CanGetCookie(const CefCookie& cookie) { return true; }

    bool CanSetCookie(const CefCookie& cookie) { return true; }

    void Cancel(){}

    private:
        IMPLEMENT_REFCOUNTING(MyResourceHandler);
};

//Standard application
class MyCefApp: public CefApp
{
private:
    IMPLEMENT_REFCOUNTING(MyCefApp);
};

//Cef client, which routes all requests to MyResourceHandler
class MyHandler : public CefClient, public CefRequestHandler
{
    CefRefPtr<CefRequestHandler> GetRequestHandler()
    {
        return this;
    }
    CefRefPtr<CefResourceHandler> GetResourceHandler(CefRefPtr<CefBrowser> browser, CefRefPtr<CefFrame> frame, CefRefPtr<CefRequest> request)
    {
        return new MyResourceHandler();
    }

private:
    IMPLEMENT_REFCOUNTING(MyHandler);
};

int main()
{

    //Initialize main classes
    CefMainArgs main_args;
    CefRefPtr<CefApp> App = new MyCefApp();
    CefRefPtr<MyHandler> Handler = new MyHandler();
    CefExecuteProcess(main_args, App, NULL);
    CefSettings GlobalSettings;
    CefInitialize(main_args, GlobalSettings, App, NULL);

    //Create browser
    CefWindowInfo window_info;
    window_info.SetAsPopup(0,"");
    CefBrowserSettings browser_settings;
    //Set utf-8 as default encoding
    std::wstring wencoding = L"utf-8";
    cef_string_utf16_set(wencoding.data(),wencoding.size(),&browser_settings.default_encoding,true);
    CefRefPtr<CefBrowser> Browser = CefBrowserHost::CreateBrowserSync(window_info, Handler, "google.com", browser_settings, 0);

    //Infinite message loop
    while(true)
    {
        CefDoMessageLoopWork();
    }

  return 0;
}

What is the expected output? What do you see instead?

I see a browser window with the following output:

WrongEncoding.png

The expected output is:

GoodEncoding.png

What version of the product are you using? On what operating system?

I use 3.2623.1397.gaf139d7_windows32 on Windows 7 x64

Does the problem reproduce with the cefclient or cefsimple sample application at the same version? How about with a newer or older version?

No, it doesn't. The problem reproduces only with a custom CefResourceHandler.

Does the problem reproduce with Google Chrome at the same version? How about with a newer or older version?

No, it doesn't. Chrome always correctly treats Content-Type headers.

magreenblatt commented 8 years ago

Original comment by amaitland (Bitbucket: amaitland, GitHub: amaitland).


Where's your call to response->SetHeaderMap(HeaderMapData);???

magreenblatt commented 8 years ago

Original comment by BrowserAutomationStudio (Bitbucket: kdkdkd, GitHub: kdkdkd).


Yes, it was missing. I've updated the example, and the issue still reproduces(

magreenblatt commented 8 years ago

Original comment by Dmitry Azaraev (Bitbucket: dmitry-azaraev, GitHub: dmitry-azaraev).


@amaitland CEF currently lacks character set support completely. Resource handlers are missing some of the standard filters that handle this (probably). The same goes for compressed responses (they will not be handled).

@kdkdkd You can work around this by injecting a BOM into the response (for cases where you have UTF-8 or UTF-16LE/BE).
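
For illustration, here is a minimal sketch of the BOM approach for UTF-8 bodies. `PrependUtf8Bom` is a hypothetical helper name, not part of CEF; the idea is to call it on the response body before it is handed out from ReadResponse. It does nothing for non-Unicode charsets such as windows-1251.

```cpp
#include <string>

// Hypothetical helper (not part of CEF): returns `body` with a UTF-8 byte
// order mark in front, unless the body already starts with one.
std::string PrependUtf8Bom(const std::string& body) {
    static const std::string kUtf8Bom = "\xEF\xBB\xBF";
    if (body.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0)
        return body;  // already marked, leave it untouched
    return kUtf8Bom + body;
}
```

The check for an existing BOM keeps the helper safe to apply more than once to the same body.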

magreenblatt commented 8 years ago

Original comment by amaitland (Bitbucket: amaitland, GitHub: amaitland).


> Alex Maitland CEF currently lacks character set support completely. Resource handlers are missing some of the standard filters that handle this (probably). The same goes for compressed responses (they will not be handled).

@dmitry-azaraev That's interesting to know, thanks. CefSharp uses the BOM approach.

magreenblatt commented 8 years ago

Original comment by BrowserAutomationStudio (Bitbucket: kdkdkd, GitHub: kdkdkd).


> You can work around this by injecting a BOM into the response (for cases where you have UTF-8 or UTF-16LE/BE).

It is not the best solution in my situation because, as you said, it helps only with UTF encodings. I want to handle every charset that Chrome does.

For example, the Russian social network vk.com uses windows-1251 encoding. It works fine for HTML content because of the meta tag, and in that case CEF decodes the page successfully. But when an ajax request is made and the server returns non-UTF encoded text, CEF fails to decode it. Thus vk.com is not usable with CefResourceHandler right now(

I'm thinking about handling the encoding myself with icu or iconv or something similar, but then I need to parse web pages and check for the meta tag to avoid double decoding(
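
As a rough sketch of what decoding by hand could look like without icu or iconv, the function below converts windows-1251 bytes to UTF-8. `Windows1251ToUtf8` and `AppendUtf8` are hypothetical names, and the mapping is deliberately partial: it covers only ASCII and the contiguous range 0xC0..0xFF (U+0410..U+044F), which is enough for the test string from the report. A real implementation would need the full code page table or a library.

```cpp
#include <cstdint>
#include <string>

// Appends the UTF-8 encoding of a code point below U+0800, which covers
// ASCII and the basic Cyrillic letters handled in this sketch.
static void AppendUtf8(uint32_t cp, std::string& out) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: converts windows-1251 bytes to UTF-8.
// Assumption: only ASCII and bytes 0xC0..0xFF are mapped; any other byte
// is replaced with '?'.
std::string Windows1251ToUtf8(const std::string& in) {
    std::string out;
    for (unsigned char byte : in) {
        if (byte < 0x80)
            AppendUtf8(byte, out);
        else if (byte >= 0xC0)
            AppendUtf8(0x0410 + (byte - 0xC0), out);
        else
            out.push_back('?');
    }
    return out;
}
```

Feeding it the windows-1251 string from the report yields the same bytes as the utf-8 string that is commented out in the example handler.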

magreenblatt commented 8 years ago

Original comment by amaitland (Bitbucket: amaitland, GitHub: amaitland).


You can change the default_encoding on a per browser basis.

http://magpcss.org/ceforum/apidocs3/projects/%28default%29/_cef_browser_settings_t.html#default_encoding

I believe you can even dynamically set this using preferences see

https://chromium.googlesource.com/chromium/src/+/ad2a3ded81c49ee89b44f6b544d21ff617c935bf/chrome/common/pref_names.cc#41

http://magpcss.org/ceforum/apidocs3/projects/%28default%29/CefRequestContext.html#SetPreference%28constCefString&,CefRefPtr%3CCefValue%3E,CefString&%29

magreenblatt commented 8 years ago

Original comment by BrowserAutomationStudio (Bitbucket: kdkdkd, GitHub: kdkdkd).


> You can change the default_encoding on a per browser basis.

I tried to do that, but with no results. The following code

#!c++

//Get context
CefRefPtr<CefRequestContext> Context = CefRequestContext::GetGlobalContext();

//Create and populate dictionary
CefRefPtr<CefValue> Value = CefValue::Create();
CefRefPtr<CefDictionaryValue> Dictionary = CefDictionaryValue::Create();
Dictionary->SetString("charset_default","windows-1251");
Value->SetDictionary(Dictionary);

//Modify context
CefString Error;
Context->SetPreference("intl",Value,Error);
std::cout<<std::endl<<Error.ToString()<<std::endl<<std::endl;

prints "Trying to modify an unregistered preference", while

#!c++

Context->GetAllPreferences(true)->GetDictionary("intl")->GetString("charset_default").ToString()

does have a value.

The same approach with proxy.mode does change the proxy settings.

magreenblatt commented 8 years ago

Original comment by amaitland (Bitbucket: amaitland, GitHub: amaitland).


I've had similar problems when trying to set properties using dictionaries, the dot notation is more reliable in my experience.

It'll be something like context->SetPreference("intl.charset_default", value, error); where value is a CefValue holding the string "windows-1251".

In OnAfterCreated, I can change the preference, haven't checked anything with windows-1251 encoding though.

Just a reminder that you can only call SetPreference on the CEF UI thread.

magreenblatt commented 8 years ago

Original comment by BrowserAutomationStudio (Bitbucket: kdkdkd, GitHub: kdkdkd).


I checked it out and dot notation works great. It even changes the default encoding without needing to restart the browser.

Only one concern is left: whether SetPreference will work for different frames with different encodings, for example a non-UTF advertising iframe and UTF main site content.

I'm afraid that it may work only some of the time, or that there could be a race condition(

magreenblatt commented 8 years ago

Original comment by amaitland (Bitbucket: amaitland, GitHub: amaitland).


Unfortunately I don't believe you can specify a preference at a frame level, it's at a CefRequestContext level, which you can use to isolate CefBrowser instances.

magreenblatt commented 8 years ago

Original comment by BrowserAutomationStudio (Bitbucket: kdkdkd, GitHub: kdkdkd).


@amaitland Yes, and some sites use several charsets within the same frame. For example, qq.com uses GB2312, GBK and UTF-8. Thanks for the help anyway.

For now I have ended up with a solution that detects the charset from the HTTP headers or the meta tag, decodes the page content to utf-8 and modifies the charset in the meta tag if needed. In other words, it forces every page to have utf-8 encoding.
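
The header half of that detection step can be sketched roughly as follows. This is not the attached source code, just an illustration; `CharsetFromContentType` is a hypothetical helper that pulls the charset parameter out of a Content-Type value. Meta tag detection and the actual decoding are left out.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns the lower-cased charset parameter of a
// Content-Type value, or an empty string if there is none.
std::string CharsetFromContentType(const std::string& content_type) {
    // Header parameter names are case-insensitive, so work on a lower-cased copy.
    std::string lower(content_type.size(), '\0');
    std::transform(content_type.begin(), content_type.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    std::size_t pos = lower.find(key);
    if (pos == std::string::npos)
        return "";
    pos += key.size();

    // The value ends at the next parameter separator or whitespace.
    std::size_t end = lower.find_first_of("; \t", pos);
    std::string value =
        lower.substr(pos, end == std::string::npos ? end : end - pos);

    // Strip optional surrounding double quotes.
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
        value = value.substr(1, value.size() - 2);
    return value;
}
```

For the header from the original report, "text/html;charset=windows-1251", this returns "windows-1251". An empty result would mean falling back to the meta tag or the default encoding.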

I've attached the source code in case somebody else needs it.

But I'm still waiting for a native fix from the CEF team.

magreenblatt commented 7 years ago

Does Google Chrome properly handle different frames with different character encodings? Do you have a URL that demonstrates this?

magreenblatt commented 7 years ago

Original comment by BrowserAutomationStudio (Bitbucket: kdkdkd, GitHub: kdkdkd).


Yes, Google Chrome handles that properly. I wrote a simple example in Node.js. It renders properly in Chrome and in CefClient, but it is impossible to render it properly when using a custom CefResourceHandler. There are 2 frames: one with utf-8 encoding and one with windows-1251 encoding. Neither frame has a meta tag, but both have a Content-Type header. There is no way to display that properly with a custom CefResourceHandler:

If I set the encoding to utf-8, the second frame renders incorrectly.

If I set the encoding to windows-1251, the first frame renders incorrectly.

I don't have a direct URL to a real-world example, but if you log in to vk.com and try to obtain the group list, part of the data will be corrupted. Some of the ajax requests return data in windows-1251 encoding and others in utf-8, and the only thing that gives information about the encoding is the Content-Type header, which is not handled properly with a custom CefResourceHandler.

magreenblatt commented 7 years ago

Original comment by BrowserAutomationStudio (Bitbucket: kdkdkd, GitHub: kdkdkd).


node js 2 frames example

magreenblatt commented 6 years ago

Original comment by Mike Wiedenbauer (Bitbucket: shagkur, GitHub: shagkur).


GetCharset needs to be overridden in libcef/browser/net/resource_request_job.h/cc to let Chromium handle the encoding properly. Here's the corresponding PR against master: https://bitbucket.org/chromiumembedded/cef/pull-requests/156

magreenblatt commented 6 years ago

@shagkur Thanks, fixed in master revision 39ccd85 (bb), 3325 branch revision 29552e0 (bb) and 3282 branch revision a42c0ea (bb).
