IndySockets / Indy

Indy - Internet Direct
https://www.indyproject.org
434 stars 147 forks source link

TIdHTTPResponseInfo.WriteContent improperly uses IOHandler.Write method #499

Open Molochnik opened 10 months ago

Molochnik commented 10 months ago

TIdHTTPResponseInfo.WriteContent uses IOHandler.Write in two places and in both of them incorrectly. It causes wrong message encoding in Lazarus under Linux (unit IdCustomHTTPServer) Now it is (line 2188) :

FConnection.IOHandler.Write(ContentText, CharsetToEncoding(CharSet));

Should be:

FConnection.IOHandler.Write(ContentText, CharsetToEncoding(CharSet){$IFDEF STRING_IS_ANSI}, CharsetToEncoding(CharSet){$ENDIF});

Now it is (line 2213) :

FConnection.IOHandler.Write('<HTML><BODY><B>' + IntToStr(ResponseNo) + ' ' + ResponseText    {Do not Localize}
       + '</B></BODY></HTML>', CharsetToEncoding(CharSet));    {Do not Localize};

Should be:

FConnection.IOHandler.Write('<HTML><BODY><B>' + IntToStr(ResponseNo) + ' ' + ResponseText    {Do not Localize}
       + '</B></BODY></HTML>', CharsetToEncoding(CharSet){$IFDEF STRING_IS_ANSI}, CharsetToEncoding(CharSet){$ENDIF});    {Do not Localize};
rlebeau commented 10 months ago

This is not really a bug, IMHO.

In non-Unicode builds, IOHandler.Write() takes in an AnsiString, where the optional ASrcEncoding parameter specifies the encoding of that AnsiString. By default, Indy expects that AnsiString to use the OS's system encoding, which is how Delphi and FreePascal normally handle AnsiString.

The AnsiString data will be converted to Unicode using the ASrcEncoding, and then that Unicode will be converted to the AByteEncoding for transmission.

The TIdHTTPResponseInfo.Charset property is meant to specify the output encoding on the wire, not the input encoding of the AnsiString. This allows users to send, for example, a Latin-1 input string as UTF-8 output without having to encode it themselves, keeping the AnsiString is the compiler's native format.

If you want to send a ContentText string using an AnsiString with a different encoding than the OS, you should set the IOHandler.DefAnsiEncoding property, which the IOHandler will use when no ASrcEncoding is specified. For instance, if you have FreePascal/Lazarus setup to use UTF-8 encoded AnsiStrings, you can set the IOHandler.DefAnsiEncoding to IndyTextEncoding_UTF8, such as in the server's OnConnect event.

Molochnik commented 10 months ago

Thank you very much! That helped. Some notes: 1) FConnection is protected in TIdHTTPResponseInfo and I had to write a helper:

procedure TIdHTTPResponseInfoHelper.WriteContentUni;
begin
{$IFDEF STRING_IS_ANSI}
  FConnection.IOHandler.DefAnsiEncoding := IndyTextEncoding_UTF8;
{$ENDIF}
  WriteContent;
end;

2) In Lazarus under Linux I have both STRING_IS_UNICODE and STRING_IS_ANSI undefined. How is that possible?

rlebeau commented 10 months ago
  1. FConnection is protected in TIdHTTPResponseInfo and I had to write a helper:

Why not simply set the DefAnsiEncoding in the OnConnect event, like I suggested?

procedure TMyForm.IdHTTPServer1Connect(AContext: TIdContext);
begin
{$IFDEF STRING_IS_ANSI}
  AContext.Connection.IOHandler.DefAnsiEncoding := IndyTextEncoding_UTF8;
{$ENDIF}
end;

If you are going to write a custom wrapper, I would suggest allowing the caller configure the encoding as needed, eg:

procedure TIdHTTPResponseInfoHelper.WriteContentUni(
  {$IFDEF STRING_IS_ANSI}ASrcEncoding: IIdTextEncoding = nil{$ENDIF}
);
begin
{$IFDEF STRING_IS_ANSI}
  EnsureEncoding(ASrcEncoding, encUTF8);
  FConnection.IOHandler.DefAnsiEncoding := ASrcEncoding;
{$ENDIF}
  WriteContent;
end;
  1. In Lazarus under Linux I have both STRING_IS_UNICODE and STRING_IS_ANSI undefined. How is that possible?

Indy defines the STRING_IS_... conditionals in its IdCompilerDefines.inc file, did you {$I} that file into your code?

Otherwise, Indy uses FreePascal's own FPC_UNICODESTRINGS conditional to determine when to define STRING_IS_UNICODE, so you could use that same conditional if you don't need compatibility with Delphi, eg:

procedure TMyForm.IdHTTPServer1Connect(AContext: TIdContext);
begin
{$IFNDEF FPC_UNICODESTRINGS}
  AContext.Connection.IOHandler.DefAnsiEncoding := IndyTextEncoding_UTF8;
{$ENDIF}
end;

Or:

procedure TIdHTTPResponseInfoHelper.WriteContentUni(
  {$IFNDEF FPC_UNICODESTRING}ASrcEncoding: IIdTextEncoding = nil{$ENDIF}
);
begin
{$IFNDEF FPC_UNICODESTRING}
  EnsureEncoding(ASrcEncoding, encUTF8);
  FConnection.IOHandler.DefAnsiEncoding := ASrcEncoding;
{$ENDIF}
  WriteContent;
end;
Molochnik commented 9 months ago

Everything goes as usual - all bugs are yours, if you think you found a bug in an outer library go back to the start of the phrase. Thanks!