Encoding - Githubissues

GoogleCodeExporter commented 9 years ago

I noticed that HtmlViewer supports the UTF-xxx and Windows-xxx encodings, but 
not encodings such as KOI8-xxx or ISO-xxx. Do you plan to support these 
encodings? If not, could you tell me where I should insert the re-encoding 
code I need?

Original issue reported on code.google.com by SchwarzK...@yandex.ru on 8 Feb 2012 at 5:09

GoogleCodeExporter commented 9 years ago

In HtmlBuffer.pas you can find a long list of supported character sets. Lots of 
ISO char sets are supported simply by their numbers. 

There are names for some Russian sets as well, but it looks like they're all 
synonyms for the one Russian char set that Windows supports. 

If you'd like to add some KOI translations, please join us at 
https://github.com/BerndGabriel/HtmlViewer

We should think about a charset-"plugin" for THtmlBuffer.

OrphanCat

Original comment by OrphanCat on 8 Feb 2012 at 8:46

Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

OK, I will give it a try and see if something good comes of it...

Original comment by SchwarzK...@yandex.ru on 8 Feb 2012 at 9:00

GoogleCodeExporter commented 9 years ago

Original comment by OrphanCat on 24 Feb 2012 at 4:57

Changed state: WaitingForYouToVolunteer
Added labels: Priority-Low
Removed labels: Priority-Medium

GoogleCodeExporter commented 9 years ago

So, after some hesitation, and as promised, I have added support for some 
encodings. Here are some notes and comments on the original code.
1. In the original code DOES NOT WORK conversion from ISO-2022-JP.
2. I tried to get rid of the Windows function MultiByteToWideChar.
3. I was not able to embed the conversion code for: 932, 943, iso-2022-jp, 
iso-2022-jp-1, iso-2022-cn, iso-2022-kr.
4. For some Asian languages that use two bytes per character: if a character 
is encountered that is missing from the character set, the characters that 
follow it are not decoded.
5. I used a common encoding.
6. I may have added too much, may have overlooked something, or may have 
chosen the wrong way to embed the conversion.

If you are satisfied with this method, you can include it in HtmlViewer. You 
can make changes and corrections.

P.S. I hope the translator translated correctly. :)

Original comment by SchwarzK...@yandex.ru on 17 Apr 2012 at 5:56

GoogleCodeExporter commented 9 years ago

Forgot to mention: I was unable to embed the conversion code for UTF-7.

Original comment by SchwarzK...@yandex.ru on 17 Apr 2012 at 5:59

GoogleCodeExporter commented 9 years ago

Thanks for all the hours of work on this large conversion package.
If I understand the file headers right, you converted the libiconv code to 
Delphi?

For testing where can I get example files for the various codepages/charsets? 
Could you please post (links to) example htmls? 

Did you notice that I fixed HtmlBuffer.pas (issue 139) in the meantime? Code 
pages 932, 936, 949, 950 and ISO-2022-JP are translated correctly since 
revisions r257/r258 (March 15/21).

If "1. In the original code DOES NOT WORK conversion from ISO-2022-JP." means 
that revision r258 of HtmlBuffer.pas still fails, please send an example html 
for further testing.

Thank you again
OrphanCat

Original comment by OrphanCat on 17 Apr 2012 at 8:36

Changed state: Started

GoogleCodeExporter commented 9 years ago

OK, I will prepare the test pages.
I ported only some of the Asian encodings from libiconv to Delphi (to the best 
of my knowledge of C++).
I used the latest revision of HtmlBuffer.pas, but it is not able to decode the 
ISO-2022-JP page.
If you have requests for any other encoding, I will try to help.

Original comment by SchwarzK...@yandex.ru on 17 Apr 2012 at 8:45

GoogleCodeExporter commented 9 years ago

I will take a week to think it over...

Original comment by SchwarzK...@yandex.ru on 19 Apr 2012 at 11:05

GoogleCodeExporter commented 9 years ago

Please advise.
I cannot figure this out.
The function "function TBuffer.GetNext: Word;" advances the read position.
How can I find out the next character without moving the read position?

Original comment by SchwarzK...@yandex.ru on 24 Apr 2012 at 10:47

GoogleCodeExporter commented 9 years ago

Why do you think you need it? All multibyte character sets I've seen in the 
past months were built in such a way that the current byte tells you whether 
you need another byte to complete the character or not.
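
To illustrate the point about lead bytes, here is a minimal sketch for two double-byte layouts. The function names are hypothetical and are not part of HtmlBuffer.pas. The byte ranges are the usual ones for EUC-CN (ASCII below $80, two-byte characters with lead bytes $A1..$FE) and for code page 932 (lead bytes $81..$9F and $E0..$FC).

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: number of bytes in the EUC-CN character that starts
// with Lead, or 0 if Lead cannot start a character.
function EucCnSequenceLength(Lead: Byte): Integer;
begin
  if Lead < $80 then
    Result := 1                          // plain ASCII
  else if (Lead >= $A1) and (Lead <= $FE) then
    Result := 2                          // GB2312 lead byte, one more byte follows
  else
    Result := 0;                         // not a valid first byte
end;

// Hypothetical helper: number of bytes in the code page 932 character that
// starts with Lead. Everything that is not a lead byte is treated as a
// single byte (ASCII, half-width katakana $A1..$DF, unassigned values).
function Cp932SequenceLength(Lead: Byte): Integer;
begin
  if ((Lead >= $81) and (Lead <= $9F)) or ((Lead >= $E0) and (Lead <= $FC)) then
    Result := 2
  else
    Result := 1;
end;
```

A decoder that reads the lead byte first can therefore decide how many more bytes to fetch without peeking ahead. Stateful encodings such as ISO-2022-JP and UTF-7 are the exception: they need a remembered state rather than lookahead.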

Original comment by OrphanCat on 24 Apr 2012 at 10:54

GoogleCodeExporter commented 9 years ago

It is probably not necessary; I will think about it.
Now on to testing.

Original comment by SchwarzK...@yandex.ru on 24 Apr 2012 at 11:01

GoogleCodeExporter commented 9 years ago

When transcoding UTF-7, I ran into the following:
In UTF-7 the string "<>" looks like this: "+ADwAPg-".
After the first character is converted, the code goes to "procedure 
THtmlParser.GetCh;" and "function TBuffer.PeekChar: TBuffChar;", and the 
parser decides that a comment follows. As a result, the converted text is not 
displayed correctly (see image). But if you copy it, everything looks correct:

<!--StartFragment--><>,.[{]} <br />
>,.[{]} <br />
ABCDEFGHIJKLMOPQRSTUVXYZ<!--EndFragment-->

How can I indicate that this is a "<" character and not a comment?

Do I need to pass something extra?

Original comment by SchwarzK...@yandex.ru on 26 Apr 2012 at 11:51

Attachments:

11.png

GoogleCodeExporter commented 9 years ago

If I understand the differences right then "+ADwAPg-," is translated by your 
UTF-7 single character translator to the first 3 chars in the above image 
11.png, while MultiByteToWideChar() used in CopyToClipboard translates 
correctly.

I cannot see a chance for THtmlParser.GetCh to misunderstand a character unless 
your UTF-7 extension in TBuffer.NextChar does not swallow the trailing '-'. 
Notice that you might have to remember the UTF-7 state 
'in-base64-encoded-block'. (Oops, the same is valid for the FJis state. I 
committed the fix in r284). And actually the forgotten state could be a reason, 
why TBuffer returns the '-' with the next NextChar.
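
The role of the 'in-base64-encoded-block' state can be sketched as follows. This is a simplified whole-string decoder with hypothetical names (Base64Value, DecodeUtf7), not the code from HtmlBuffer.pas. In a character-at-a-time reader such as TBuffer.NextChar, the variables InBase64, Bits and BitCount would have to be kept as fields between calls. Surrogate pairs and error handling are left out.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Value of a modified-base64 digit, or -1 if C is not one.
function Base64Value(C: AnsiChar): Integer;
begin
  case C of
    'A'..'Z': Result := Ord(C) - Ord('A');
    'a'..'z': Result := Ord(C) - Ord('a') + 26;
    '0'..'9': Result := Ord(C) - Ord('0') + 52;
    '+': Result := 62;
    '/': Result := 63;
  else
    Result := -1;
  end;
end;

// Decodes UTF-7 text into UTF-16 code units.
function DecodeUtf7(const S: AnsiString): WideString;
var
  I, V, BitCount: Integer;
  Bits: Cardinal;
  InBase64: Boolean;   // the state that must survive from one char to the next
begin
  Result := '';
  InBase64 := False;
  Bits := 0;
  BitCount := 0;
  I := 1;
  while I <= Length(S) do
  begin
    if not InBase64 then
    begin
      if S[I] = '+' then
      begin
        if (I < Length(S)) and (S[I + 1] = '-') then
        begin
          Result := Result + '+';        // "+-" encodes a literal '+'
          Inc(I);
        end
        else
        begin
          InBase64 := True;              // enter the base64 block
          Bits := 0;
          BitCount := 0;
        end;
      end
      else
        Result := Result + WideChar(Ord(S[I]));   // directly encoded character
      Inc(I);
    end
    else
    begin
      V := Base64Value(S[I]);
      if V >= 0 then
      begin
        Bits := (Bits shl 6) or Cardinal(V);
        Inc(BitCount, 6);
        if BitCount >= 16 then           // a full 16-bit code unit is available
        begin
          Dec(BitCount, 16);
          Result := Result + WideChar((Bits shr BitCount) and $FFFF);
        end;
        Inc(I);
      end
      else
      begin
        InBase64 := False;               // leave the base64 block
        if S[I] = '-' then
          Inc(I);                        // swallow the terminating '-'
        // any other character is re-read as directly encoded text
      end;
    end;
  end;
end;
```

With this sketch, DecodeUtf7('+ADwAPg-') yields '<>': the six base64 digits carry the bytes 00 3C 00 3E, and the trailing '-' is consumed rather than returned as a character.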

However IMO image 11.png does not show the result of a detected comment, but 
the result of a defective UTF-7 conversion.

Original comment by OrphanCat on 28 Apr 2012 at 12:33

GoogleCodeExporter commented 9 years ago

Today I will post my version of the conversion, so you can see what's wrong.

Original comment by SchwarzK...@yandex.ru on 28 Apr 2012 at 12:43

GoogleCodeExporter commented 9 years ago

So.
In my opinion this is the final version. You can use it as you see fit. I am 
already using it myself. :)

Fixed:
1. Because the modules were originally written to convert strings, I did not 
realize that the end-of-text marker in HtmlViewer is the character "$0". Fixed 
a problem with reading the end of the text when a character missing from the 
encoding is encountered.
2. The "KOI8-T" encoding does not have a numeric code page equivalent. It is 
set to "-5".
3. Minor fixes.

Added:
1. Aliases for known encodings.
2. Support for some encodings.
3. Forced recoding for encodings 1250...1258.

Since the changes made to the original file "HtmlBuffer.pas", recoding of 
"ISO-2022-JP" and "EUC-JP" has stopped working. This can be seen in the 
attached examples.

As for examples to validate the conversion: it is difficult to find a real 
page in, for example, "CP866". "UTF-8" is mostly used these days.
To create pages in the national character sets, I used the well-known library 
"iconv". The characters are taken from the sample "Sample.txt" and re-encoded, 
for example, from Win-1251 to EUC-KR. The result is a source page. When it is 
opened in HTMLViewer, you can judge the correctness of the conversion.
I think that's enough.
The folder "Add" contains real pages that I found.

That is all for now... :)

If there is a need for any other encoding, I will try to help.

Original comment by SchwarzK...@yandex.ru on 28 Apr 2012 at 8:00

GoogleCodeExporter commented 9 years ago

A slight modification.
1. Added the numeric equivalents of the code pages.
2. Restored auto-detection of the "iso-2022-jp" encoding, and added 
auto-detection of "iso-2022-jp-1", "iso-2022-cn" and "iso-2022-kr".
3. If you decide to use my code in HtmlViewer, you can later add string 
recoding to the function "function TBuffer.AsString: TBuffString;".
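
How the attached code detects these encodings is not shown in the thread. One common approach, sketched here purely as an assumption, is to look for the characteristic ISO-2022 escape sequences in the raw data. The function name DetectIso2022 is hypothetical. The designators are the standard ones: ESC $ ) C for ISO-2022-KR, ESC $ ) A or ESC $ ) G for ISO-2022-CN, ESC $ ( D for the JIS X 0212 set added by ISO-2022-JP-1, and ESC $ B, ESC $ @ or ESC ( J for ISO-2022-JP.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: returns the ISO-2022 variant suggested by the escape
// sequences found in Data, or '' if none of the known designators occurs.
function DetectIso2022(const Data: AnsiString): string;
const
  ESC = #27;
begin
  if Pos(ESC + '$)C', Data) > 0 then
    Result := 'iso-2022-kr'
  else if (Pos(ESC + '$)A', Data) > 0) or (Pos(ESC + '$)G', Data) > 0) then
    Result := 'iso-2022-cn'
  else if Pos(ESC + '$(D', Data) > 0 then
    Result := 'iso-2022-jp-1'            // checked before plain ISO-2022-JP
  else if (Pos(ESC + '$B', Data) > 0) or (Pos(ESC + '$@', Data) > 0)
    or (Pos(ESC + '(J', Data) > 0) then
    Result := 'iso-2022-jp'
  else
    Result := '';
end;
```

The JP-1 test comes before the plain JP test because an ISO-2022-JP-1 page normally contains the ISO-2022-JP designators as well.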

Original comment by SchwarzK...@yandex.ru on 30 Apr 2012 at 12:48

Attachments:

HtmlBuffer.rar

GoogleCodeExporter commented 9 years ago

While working I noticed two interesting things:
1. The method "TBuffer.Convert" in its current form DOES NOT WORK.
The function "function TBuffer.Convert" calls "Buffer.AsString", which 
contains the comparison "if FCodePage <> FInitalCodePage then". But with this 
method FInitalCodePage cannot be specified explicitly. Therefore the recoding 
is never correct. See the button "Buffer.Convert".
2. Button "Buffer.AsString"
If you use this code, then for some strange reason, WITH DIFFERENT ENCODINGS 
and WITH DIFFERENT TEXT, a different extra character appears at the end of the 
text. It does not always happen; it occurs at irregular intervals. To 
reproduce: if the character does not appear the first time you press the 
button, close the application and repeat. In my opinion this error does not 
depend on the code I added, because it uses the original code.

Original comment by SchwarzK...@yandex.ru on 5 May 2012 at 2:56

Attachments:

GoogleCodeExporter commented 9 years ago

The problem described in Comment 17 appears starting with r277. The problem is 
NOT in the file "HtmlBuffer". I think the file "StyleUn" is to blame, though 
other files in that release may be involved as well.

Original comment by SchwarzK...@yandex.ru on 7 May 2012 at 7:08

GoogleCodeExporter commented 9 years ago

Have you made any decision about my solutions to the encoding problems?
Should I post the fix?
Can the topic then be closed?

Original comment by SchwarzK...@yandex.ru on 1 Jul 2012 at 5:24

GoogleCodeExporter commented 9 years ago

Hi,

although I appreciate your contributions, I didn't have the time to look into 
it deeply enough to adapt it to the HtmlViewer. 

I must admit, I could not understand every sentence (or better: series of words 
terminated by a colon) your translator emitted :(

Image 11.png attracted my attention, as the '?' instead of Korean or Chinese 
symbols is the oldest open issue (issue 10) and I couldn't reproduce it. But 
these are independently shown "extra chars", aren't they?

The "extra chars" got their own issue 162 recently.

OrphanCat

Original comment by OrphanCat on 1 Jul 2012 at 6:11

GoogleCodeExporter commented 9 years ago

I think I found a solution to the extra character at the end. Tomorrow I will 
post what has changed in the meantime.
P.S. It is difficult to communicate through a translator. I try to express 
my thoughts simply.

Original comment by SchwarzK...@yandex.ru on 1 Jul 2012 at 6:23

GoogleCodeExporter commented 9 years ago

Over the past couple of months I have not noticed any problems with the new 
version of the module across different encodings.
1. Introduced several constants for use with the method "TBuffer.AsString".
2. The method "TBuffer.AsString" now works with all available encodings.
3. Fixed minor bugs.
4. Fixed the problem of an extra character at the bottom of the page, which 
appears only when "TBuffer.AsString" is used.

This problem does not always occur, and only with such use. I solved it in the 
way I thought was correct. Maybe I'm wrong.

  procedure CharByChar;
  var
    I: Integer;
  begin
    I := 1;
    repeat
      Result[I] := NextChar;
      if Result[I] = #0 then
        break;
      Inc(I);
    until false;
    SetLength(Result, I - 1); // <=== this line
  end;
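
Why trimming to I - 1 removes a trailing character can be seen in a standalone sketch. It assumes, as the loop above suggests, that the result string is allocated in advance and that the terminating #0 is stored before the loop exits. The function CopyUntilNul and its parameters are hypothetical and only mirror the shape of CharByChar.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: copies characters from a #0-terminated source into a
// string preallocated one slot larger than MaxLen, then trims the terminator.
function CopyUntilNul(Source: PWideChar; MaxLen: Integer): WideString;
var
  I: Integer;
begin
  SetLength(Result, MaxLen + 1);   // room for MaxLen characters plus the #0
  I := 1;
  repeat
    Result[I] := Source^;
    if Result[I] = #0 then
      Break;                       // the #0 now sits at index I
    Inc(Source);
    Inc(I);
  until I > MaxLen + 1;
  SetLength(Result, I - 1);        // drop the stored #0 and any unused slots
end;
```

Without the final SetLength the string would keep its preallocated length, so the stored #0 and whatever follows it would remain part of the text.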

The fix described in Issue 162 did not help in removing the extra characters.

5. The method "TBuffer.Convert" does not work. Corrections to the underlying 
code are needed.

For now I have run out of ideas for improving the conversion. :)

Original comment by SchwarzK...@yandex.ru on 2 Jul 2012 at 9:06

Attachments:

HtmlBuffer.rar

GoogleCodeExporter commented 9 years ago

Thanks for this immense work!

I will consider adding it to HtmlViewer 11.4.

I'd like to transform the huge "case FCodePage" in TBuffer.NextChar into a 
bundle of classes derived from a TBuffAbstractDecoder implementing a virtual 
method GetNext(Buffer: TBuffer): TBuffChar; 

A member TBuffer.FDecoder: TBuffAbstractDecoder; can be initialized once in 
SetCodePage and NextChar() becomes clear and short.
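
A minimal sketch of this design follows. Only TBuffAbstractDecoder, GetNext and TBuffChar are names from the comment above. TByteCursor is a hypothetical stand-in for TBuffer, and the two concrete decoders and DecodeAll are illustrative assumptions, not code from HtmlViewer.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  TBuffChar = WideChar;   // assumption: a 16-bit character type

  // Hypothetical stand-in for TBuffer: a simple byte cursor.
  TByteCursor = class
  private
    FData: AnsiString;
    FPos: Integer;
  public
    constructor Create(const Data: AnsiString);
    function NextByte: Byte;   // returns 0 once the data is exhausted
  end;

  TBuffAbstractDecoder = class
  public
    function GetNext(Buffer: TByteCursor): TBuffChar; virtual; abstract;
  end;
  TBuffDecoderClass = class of TBuffAbstractDecoder;

  // ISO-8859-1: every byte maps to the code point of the same value.
  TLatin1Decoder = class(TBuffAbstractDecoder)
  public
    function GetNext(Buffer: TByteCursor): TBuffChar; override;
  end;

  // UTF-16LE: two bytes per code unit, low byte first.
  TUtf16LEDecoder = class(TBuffAbstractDecoder)
  public
    function GetNext(Buffer: TByteCursor): TBuffChar; override;
  end;

constructor TByteCursor.Create(const Data: AnsiString);
begin
  inherited Create;
  FData := Data;
  FPos := 1;
end;

function TByteCursor.NextByte: Byte;
begin
  if FPos <= Length(FData) then
  begin
    Result := Ord(FData[FPos]);
    Inc(FPos);
  end
  else
    Result := 0;
end;

function TLatin1Decoder.GetNext(Buffer: TByteCursor): TBuffChar;
begin
  Result := TBuffChar(Buffer.NextByte);
end;

function TUtf16LEDecoder.GetNext(Buffer: TByteCursor): TBuffChar;
var
  LowByte, HighByte: Word;
begin
  LowByte := Buffer.NextByte;
  HighByte := Buffer.NextByte;
  Result := TBuffChar(LowByte or (HighByte shl 8));
end;

// The decoder is chosen once (as SetCodePage would do); the read loop itself
// no longer needs a "case FCodePage".
function DecodeAll(DecoderClass: TBuffDecoderClass; const Data: AnsiString): WideString;
var
  Cursor: TByteCursor;
  Decoder: TBuffAbstractDecoder;
  Ch: TBuffChar;
begin
  Result := '';
  Cursor := TByteCursor.Create(Data);
  Decoder := DecoderClass.Create;
  try
    repeat
      Ch := Decoder.GetNext(Cursor);
      if Ch <> #0 then
        Result := Result + Ch;
    until Ch = #0;
  finally
    Decoder.Free;
    Cursor.Free;
  end;
end;
```

Adding a new encoding then means adding one more descendant class; the loop that plays the role of NextChar stays unchanged.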

Please let me know, if you want to do this change.

BTW: the official HtmlBuffer.pas has changed in the meanwhile.

OrphanCat

Original comment by OrphanCat on 2 Jul 2012 at 10:01

Added labels: Milestone-Release11.4

GoogleCodeExporter commented 9 years ago

You can certainly try; it might work. To tell the truth, I am not a good 
coder, and this needs a good understanding of the original code. I will help 
where I can.

In my version of "HtmlBuffer.pas" I included all the changes from the original 
code, with adjustments for my data.

Original comment by SchwarzK...@yandex.ru on 3 Jul 2012 at 6:20

GoogleCodeExporter commented 9 years ago

Hi,

when I try to compile your latest HtmlBuffer.pas (file date: July, 1st 2012) 
the compiler (and I) cannot find methods Win1250DecodeChar .. 
Win1258DecodeChar. More methods are missing or not exported by 
CodeChangerDecode.pas.

Could you please post a complete set of source files?

Thank you
OrphanCat

Original comment by OrphanCat on 26 Sep 2012 at 10:50

GoogleCodeExporter commented 9 years ago

OK, I will update my changes to match the latest developments in thtmlviewer 
and post a complete set.

Original comment by SchwarzK...@yandex.ru on 26 Sep 2012 at 1:34

GoogleCodeExporter commented 9 years ago

Thanks a lot.

It would be most helpful now, if you just add the missing methods to 
CodeChangerDecode.pas. 

Currently I'm changing HtmlBuffer.pas once again. I'm adding the above 
mentioned TBuffer.FDecoder. 

Later you can add the decoder class implementations. "Later" means "about a 
week from now". Then you will find some examples in the new unit 
BufferSubs.pas. 

Thanks again
OrphanCat

Original comment by OrphanCat on 26 Sep 2012 at 1:46

GoogleCodeExporter commented 9 years ago

As promised.
Changes:
1. Updated to r317.
2. I do not use Delphi versions newer than 2007. After a trial compile on XE3 
I changed: StrAlloc ==> AnsiStrAlloc.

Original comment by SchwarzK...@yandex.ru on 27 Sep 2012 at 2:27

Attachments:

Coder.rar

GoogleCodeExporter commented 9 years ago

Please help me solve a problem related to HTMLViewer, since you know Unicode 
better than I do.
I cannot recode a string containing Cyrillic. I wanted to get Asian characters 
displayed correctly.
If you use the
function EUC_CNDecodeString(const S: String): WideString;
(button "String"), the Cyrillic text (on the right) is recoded correctly, but 
the Chinese string is not recoded correctly.
If you use the
function EUC_CNDecodeString2(const S: WideString): WideString;
(button "WideString"), the opposite is true.
I cannot work out where to use "AnsiString" and where "WideString". :(
Can you tell me how to ...

I hope I have described the problem clearly.

Original comment by SchwarzK...@yandex.ru on 27 Sep 2012 at 2:45

Attachments:

Test.rar

GoogleCodeExporter commented 9 years ago

Hi,

unfortunately I cannot see different results. Both buttons convert the left 
text to a Chinese text and the right text to 'a...z' because '§?' is no legal 
EUC_CN character.

BTW: the methods in CodeChangerDecode.pas are too cumbersome. 
- they do not return the number of consumed characters. The caller must use 
additional code to determine that number.
- they allocate from heap although a local array variable with a fixed length 
would be simpler.
- they use if-else-if chains instead of case constructs.
- they use PAnsiChar and a lot of Ord()s. Using PByte and removing Ord() makes 
the code easier to read.
- the methods they call often repeat the same checks the caller already has 
performed to find out, which method to call.

Instead of:     function xxxDecodeChar(const P: PAnsiChar): WideChar;
they should be: function xxxDecodeChar(var P: PAnsiChar): WideChar;
or even better: function xxxDecodeChar(var P: PByte): WideChar;

The second version can advance P to the next character of the source if it 
successfully consumed a character (but it should not advance beyond the 
trailing #0).

The third version is better, because it does not imply any character code like 
PAnsiChar does.
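
As an illustration of the third signature, here is a sketch that uses UTF-8 instead of EUC-CN, so that no conversion table is needed. The names Utf8DecodeChar and DecodeUtf8 are hypothetical. The sketch handles only one- to three-byte sequences (the Basic Multilingual Plane) and returns #$FFFD for anything else.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  PByte = ^Byte;   // declared locally in case the compiler's System unit lacks it

// Decodes one character and advances P past the bytes it consumed.
// P is never moved beyond the trailing #0.
function Utf8DecodeChar(var P: PByte): WideChar;
var
  Lead, Extra: Byte;
  Code: Cardinal;
begin
  Lead := P^;
  if Lead = 0 then
  begin
    Result := #0;            // terminator: report it, but stay on it
    Exit;
  end;
  Inc(P);
  if Lead < $80 then
  begin
    Result := WideChar(Lead);
    Exit;
  end;
  if (Lead and $E0) = $C0 then
  begin
    Extra := 1;
    Code := Lead and $1F;
  end
  else if (Lead and $F0) = $E0 then
  begin
    Extra := 2;
    Code := Lead and $0F;
  end
  else
  begin
    Result := #$FFFD;        // stray continuation byte or 4-byte lead
    Exit;
  end;
  while Extra > 0 do
  begin
    if (P^ and $C0) <> $80 then
    begin
      Result := #$FFFD;      // truncated sequence; P stays on the offending byte
      Exit;
    end;
    Code := (Code shl 6) or (P^ and $3F);
    Inc(P);
    Dec(Extra);
  end;
  Result := WideChar(Code);
end;

// The caller needs no extra bookkeeping: it just keeps calling until #0.
function DecodeUtf8(const S: AnsiString): WideString;
var
  P: PByte;
  Ch: WideChar;
begin
  Result := '';
  P := PByte(PAnsiChar(S));
  repeat
    Ch := Utf8DecodeChar(P);
    if Ch <> #0 then
      Result := Result + Ch;
  until Ch = #0;
end;
```

Because the decoder itself reports how far it got by moving P, the caller does not have to repeat the lead-byte checks to find the start of the next character.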

FYI: Currently I'm writing some above mentioned decoder classes. I'm picking 
the algorithms from CodeChangerDecode.pas, copy them to my new unit 
BufferSubs.pas and optimize them.

OrphanCat

Original comment by OrphanCat on 1 Oct 2012 at 5:52

GoogleCodeExporter commented 9 years ago

I'm trying to build a string converter based on your HtmlBuffer. Ironically, 
when individual characters are converted, everything is correct.
Where can I get BufferSubs? I do not see it either here or on the second 
site.
As I said before, feel free to use my modules and optimize them however you 
like. My version may not be the best. :)

Original comment by SchwarzK...@yandex.ru on 1 Oct 2012 at 6:10

GoogleCodeExporter commented 9 years ago

In fact, the difference appears when the text is typed in from the keyboard.
I will investigate further.

Original comment by SchwarzK...@yandex.ru on 1 Oct 2012 at 6:26

Attachments:

1.png

GoogleCodeExporter commented 9 years ago

Hi,

this is not what I see when I am running the program.

In file unit1.dfm I see that there are Unicode characters (> #255) in control 
Edit11. 
If you want to convert to Unicode/WideString, you must use an AnsiString to 
supply the multibyte character string.

So, you should convert Edit1 and Edit11 to simple VCL TEdit controls.

As to the Cyrillic text in Edit11: What do you expect your decoder to do? 
Convert from which code to WideString/Unicode? EUC_CN does not contain Cyrillic 
letters, thus any conversion of your current Edit22.Text to Cyrillic Unicode 
letters is not done by EUC_CNDecodeChar(). Obviously it happens when 
Edit11.Text is assigned to parameter S of method EUC_CNDecodeString(). This 
conversion uses the character set of the operating system. This way your 
results and mine can differ, as your default is Russian and mine is ANSI.

EUC_CNDecodeString() "converts" the widechars in S via AnsiChar(s[i+1]), which 
converts the first cyrillic character §С (= #167#1057) to §! (= #167#33). It 
simply removes the high byte (#1057 = #1024 + #33 = #$0400 + #$0021). As #33 is 
invalid as second byte EUC_CNDecodeChar() returns #$FFFD and the illegal letter 
is skipped.
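
The arithmetic above can be checked with a tiny sketch. The function name TruncateToByte is hypothetical; it spells out explicitly what the AnsiChar() cast is described as doing, namely keeping only the low byte of the 16-bit value.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: keeps only the low byte of a 16-bit character,
// which is the effect described for AnsiChar(S[I + 1]) above.
function TruncateToByte(W: WideChar): Byte;
begin
  Result := Ord(W) and $FF;
end;
```

For the Cyrillic capital letter С (#1057 = #$0421) the result is #$21 = #33, which lies outside the range $A1..$FE that EUC-CN allows for a second byte.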

OrphanCat

Original comment by OrphanCat on 1 Oct 2012 at 11:25

GoogleCodeExporter commented 9 years ago

EUC-CN includes GB2312, and GB2312 definitely contains Cyrillic. The issue was 
something else. I did not realize at the time that the string to be converted 
was originally created as Unicode. For the Chinese text to be re-encoded 
correctly, the string had to be Unicode, whereas for the conversion of the 
Cyrillic text I used an ANSI string.
So the conversion algorithm itself is working properly; I was afraid I would 
have to make major changes in my modules. :) But to convert Unicode strings, 
the input parameter still needs to be WideString, i.e.
function EUC_CNDecodeString(const S: WideString): WideString;
The only remaining problem is how to determine whether a string holds Wide or 
ANSI data when there is no clear indication that it is Unicode...

Thank you.

Waiting for a new HTMLViewer. :)

Original comment by SchwarzK...@yandex.ru on 2 Oct 2012 at 8:33

GoogleCodeExporter commented 9 years ago

Hi,

I've committed the latest changes including new units BuffConv and 
BuffConvArrays to GitHub. 

OrphanCat

Original comment by OrphanCat on 2 Oct 2012 at 4:38

GoogleCodeExporter commented 9 years ago

I do not see anything on github.com. The files I download are dated 09/25/2012.

Original comment by SchwarzK...@yandex.ru on 2 Oct 2012 at 4:54

GoogleCodeExporter commented 9 years ago

Did you look into branch HtmlViewer11?

Original comment by OrphanCat on 3 Oct 2012 at 12:00

GoogleCodeExporter commented 9 years ago

Found it now, I will take a look. A cleverly organized site. :)

Original comment by SchwarzK...@yandex.ru on 3 Oct 2012 at 8:25

GoogleCodeExporter commented 9 years ago

As they say in Russia: there are two pieces of news, one good, one bad. Which 
should I start with? :)
I'll start with the good.
You managed to significantly reduce the amount of conversion code. I basically 
used conversion tables, converting character by character, to avoid using the 
MultiByteToWideChar function. But since you used it, so be it.
Now the bad.
The pages (which I posted earlier for testing and attach again now) are 
decoded with errors on Win7:
922, 936, 949, big5, euc-jp, gb18030, gbk, iso-2022-cn, iso-2022-jp-1, 
iso-2022-kr, iso-8859-10, iso-8859-14, iso-8859-16, koi8-t, utf-7, folder ADD 
gb2312, gb2312, ks_c_5601-1987
In addition, the following are not decoded on WinXP without Asian fonts in the 
system: 858, iso-8859-3, iso-8859-6, iso-8859-8
and I think a few more; I will test on a different machine.

No other encodings need to be added for conversion. Some encodings are not 
used in HTML, and for others I cannot verify that the transcoding is correct.

I write here and not on GitHub, because I am somehow not used to writing there.

I also ask you to fix the errors described in Issues150 and Issues186, as 
keeping 3-4 copies of the code and testing with them is somewhat difficult.

Original comment by SchwarzK...@yandex.ru on 3 Oct 2012 at 3:16

Attachments:

Page.rar

GoogleCodeExporter commented 9 years ago

Thanks for testing.

The erroneous pages are not yet implemented. TBuffConvSingleByte is just a 
default for not (yet) explicitly implemented pages.

I hoped you could add the missing converters. I hoped you would register at 
github and fork your own repository from mine and push your additions to your 
github repository.

I'm sorry, but I cannot understand the last sentence about Issues 150 and 186. 
Which copies of code do you keep?

OrphanCat

Original comment by OrphanCat on 3 Oct 2012 at 7:17

GoogleCodeExporter commented 9 years ago

I was referring to the fact that I keep a copy of the code that does not have 
the errors described in Issues 150 and 186. I think it is version 11.2 or 
11.3. Please just try to correct the errors described in Issues 150 and 186.
The failing pages may simply not be implemented yet, but something else is 
strange. For some reason some pages open on Win7 and do not open on WinXP. In 
my opinion the MultiByteToWideChar function, which I got rid of, is not 
working correctly...

To be honest, at the moment I have very little time for programming, and it is 
hard to say when I will have more... I just try to keep up with the things 
that interest me and do what I can.

Original comment by SchwarzK...@yandex.ru on 3 Oct 2012 at 7:37

GoogleCodeExporter commented 9 years ago

Ok, I will do it.

Original comment by OrphanCat on 3 Oct 2012 at 7:42

GoogleCodeExporter commented 9 years ago

Hi,

units BuffConv and BuffConvArrays are complete now, I think.
All code pages seem to be okay now, incl. UTF-7.
I would be glad if you could test it once again.

Thanks in advance.
OrphanCat

Original comment by OrphanCat on 6 Oct 2012 at 4:45

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

OK, I will test. Last time I thought that all the encodings had already been 
added, which is why I wrote about errors. :)

Original comment by SchwarzK...@yandex.ru on 6 Oct 2012 at 5:29

GoogleCodeExporter commented 9 years ago

Tested.
No problems were noticed when opening the test pages.
But somewhere in the file HtmlBuffer there is a mistake or something has been 
left out.
I used direct recoding via TBuffer. With the previous version of HtmlBuffer 
the Buffer.Convert method did not work, while the Buffer.AsString method 
worked very well (file Project1-Old.exe). With the new version of the file 
HtmlBuffer no recoding occurs. When "Buffer.Convert" and "Buffer.AsString" are 
pressed, the document is loaded only by the second button, and even then no 
recoding happens (file Project1-New-HTML.exe). In the file "Project1-New.exe" 
no recoding occurs even if you specify the encoding.
I have not yet been able to reproduce the problem described in Comment 17, 
because recoding does not work.

Original comment by SchwarzK...@yandex.ru on 9 Oct 2012 at 4:37

Attachments:

00.rar

GoogleCodeExporter commented 9 years ago

An addition.
The "KOI8-T" encoding is close to code page 20866, i.e. "KOI8-R", but some of 
its characters differ from "KOI8-R". I added it under number -5. I do not know 
whether it needs to be added to the new module...

Original comment by SchwarzK...@yandex.ru on 11 Oct 2012 at 1:56

GoogleCodeExporter commented 9 years ago

If in KOI8-T (CodePage "-5") only a few characters differ from CodePage 20866 
(KOI8-R), we can add a decoder in TBuffBaseConverter for CodePage -5, that uses 
the same decoder as CodePage 20866, except for the differing characters. 

Currently CodePage 20866 (KOI8-R) is converted by MultiByteToWideChar().
Can you post a "case" statement for the differing characters?

Thanks
OrphanCat

BTW: I committed a fix for the Convert()/AsString error. As you might have 
noticed, I removed the TBuffer.Create(Text: AnsiString, ...) constructor. I did 
it to avoid misunderstandings, because passing an "Ansi"-String implies an ANSI 
code page (which one depends on charset) like a UnicodeString implies CodePage 
1200.

And I added a constructor TBuffer.Create(Text: PByte, ByteCount: Integer, ...) 
which is a more flexible one without implications. Please change the code in 
your test application accordingly: 

TBuffer.Convert(@RichEdit1.Text[1], Length(RichEdit1.Text), ...
TBuffer.Create(@RichEdit1.Text[1], Length(RichEdit1.Text), ...

Original comment by OrphanCat on 11 Oct 2012 at 2:30

GoogleCodeExporter commented 9 years ago

The differences can be seen visually at 
http://ru.wikipedia.org/wiki/%CA%CE%C8-8. An obvious difference is the 
character 2116 (B9) in KOI8-T, and there are some others...

Original comment by SchwarzK...@yandex.ru on 11 Oct 2012 at 2:50

GoogleCodeExporter commented 9 years ago

A little later I will adapt the code to the new changes in the modules and 
test "AsString" re-encoding from one encoding to another.

Original comment by SchwarzK...@yandex.ru on 11 Oct 2012 at 4:13

GoogleCodeExporter commented 9 years ago

1. Please bring back the function
function CharSetToCodePage(ACharSet: String): Integer;
It is not used in the main code, but it is extremely useful for conversion.
2. The function "Convert(Text: TBuffString; CodePage: TBuffCodePage)" still 
does not work correctly if the input string is a WideString, and neither does 
the method "Create(Text: TBuffString; Name: TBuffString ='')". It turns out 
that for conversion the input string must be an AnsiString and a pointer must 
be used. 
I have attached a usage example; if something is not clear, ask.
Or perhaps I am misunderstanding something. :)
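
For readers who do not know the function: a charset-name lookup of this kind could look roughly like the sketch below. Only the signature is taken from the request above. The body, the helper AsciiLower, the short list of names and the choice of 0 for unknown names are assumptions; the original implementation in HtmlViewer is not shown in this thread. The code page numbers are the standard Windows identifiers.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: lower-cases ASCII letters only, so no unit is needed.
function AsciiLower(const S: string): string;
var
  I: Integer;
begin
  Result := S;
  for I := 1 to Length(Result) do
    if (Result[I] >= 'A') and (Result[I] <= 'Z') then
      Result[I] := Chr(Ord(Result[I]) + 32);
end;

// Sketch of a charset-name lookup; returns 0 for names it does not know
// (an assumption, the real function may behave differently).
function CharSetToCodePage(ACharSet: String): Integer;
var
  Name: string;
begin
  Name := AsciiLower(ACharSet);
  if Name = 'utf-8' then
    Result := 65001
  else if Name = 'utf-7' then
    Result := 65000
  else if Name = 'windows-1251' then
    Result := 1251
  else if Name = 'koi8-r' then
    Result := 20866
  else if Name = 'koi8-u' then
    Result := 21866
  else if Name = 'iso-8859-1' then
    Result := 28591
  else if Name = 'shift_jis' then
    Result := 932
  else if Name = 'big5' then
    Result := 950
  else
    Result := 0;
end;
```

A caller could then pass the result to a conversion routine, for example CharSetToCodePage('KOI8-R') gives 20866.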

Original comment by SchwarzK...@yandex.ru on 12 Oct 2012 at 8:43

Attachments:

0000.rar

Patiencer / thtmlviewer

Encoding #128