S7NetPlus / s7netplus

S7.NET+ -- A .NET library to connect to Siemens Step7 devices
MIT License

The S7String type does not properly return Extended ASCII characters #343

Status: Open. arturonvz opened this issue 3 years ago.

arturonvz commented 3 years ago

The S7String type does not correctly return Extended ASCII characters (byte values 128 to 255), such as the letter Ñ and accented vowels (áéíóú àèìòù). I recommend changing line 33 of the S7String.cs file to convert the byte[] to a char[] and then create a new string from those characters. I solved it by changing that line like this:

Original: return Encoding.ASCII.GetString(bytes, 2, length);

Change:

char[] caracteres = new char[length - 2];
for (int i = 2; i < length; i++)
{
    caracteres[i - 2] = Convert.ToChar(bytes[i]);
}
return new string(caracteres);
arturonvz commented 3 years ago

Attached are screenshots of the change and of how it works in the application I am developing.

[Screenshots: 2020-10-21_14h47_58, 2020-10-21_14h59_18]

scamille commented 3 years ago

Do you know what exact encoding S7 PLCs use?

It is a bit hard to get good information on .NET Convert.ToChar, but as far as I can tell it uses the system text encoding.
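For reference, Convert.ToChar(byte) is documented as converting the byte to its "equivalent Unicode character", i.e. a numeric widening from byte value N to code point U+00NN, which is exactly ISO-8859-1 (Latin-1) decoding and does not consult the OS code page. A minimal sketch that checks this for all 256 byte values, assuming .NET Core 3.0 or later (where "iso-8859-1" is built in); the class name is hypothetical:

```csharp
using System;
using System.Text;

public static class ConvertToCharCheck
{
    // Returns true if Convert.ToChar(byte), a plain (char) cast and
    // ISO-8859-1 decoding agree for every byte value 0..255.
    public static bool MatchesLatin1()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        for (int i = 0; i <= 255; i++)
        {
            byte b = (byte)i;
            char viaConvert = Convert.ToChar(b);
            char viaCast = (char)b;
            char viaLatin1 = latin1.GetString(new[] { b })[0];
            if (viaConvert != viaCast || viaCast != viaLatin1)
            {
                return false;
            }
        }
        return true;
    }
}
```

If this holds, the Convert.ToChar workaround behaves the same regardless of the OS language setting; the open question is only whether Latin-1 is what the PLC actually stores.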

To get a truly universal solution, it might be best if we can specify the correct encoding in all cases, independent of the system on which S7NetPlus is running.

My assumption would be that the PLC uses Latin1. Or can this be configured somewhere?

arturonvz commented 3 years ago

I believe S7 PLCs handle the string only as an ASCII character array; they don't seem to use any kind of text encoding. I made a small correction to my code because it was not reading the complete string. With this change it works correctly:

char[] caracteres = new char[length];
for (int i = 2; i < length + 2; i++)
{
    caracteres[i - 2] = Convert.ToChar(bytes[i]);
}
return new string(caracteres);


scamille commented 3 years ago

Hmm, you can't really convert binary data to a string without some sort of encoding. ASCII is definitely a text encoding.

But you are talking about umlauts here, and ASCII does not contain those. I strongly suspect that the PLC uses something like https://en.wikipedia.org/wiki/Windows-1252, where the values from 128 to 255 map to special characters such as the umlauts you are seeing.
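To make the Latin-1 vs Windows-1252 distinction concrete: the two code pages agree on 0x00-0x7F and 0xA0-0xFF (so Ä, Ö, Ü, Ñ and the accented vowels have the same byte values in both) and differ only in 0x80-0x9F, where Windows-1252 has printable characters such as € and Latin-1 has C1 control codes. A sketch, assuming .NET 5 or later (older .NET Core versions need the System.Text.Encoding.CodePages NuGet package); the class and method names are hypothetical:

```csharp
using System.Text;

public static class Latin1VsWindows1252
{
    // On .NET Core / .NET 5+, code page 1252 is only available after the
    // CodePagesEncodingProvider has been registered; registering twice is harmless.
    public static Encoding Windows1252()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Decodes a single byte with the given encoding.
    public static char DecodeByte(Encoding encoding, byte value) =>
        encoding.GetString(new[] { value })[0];
}
```

Decoding 0x80 gives '€' with Windows-1252 but U+0080 with Latin-1, while 0xDC decodes to 'Ü' with both.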

SmackyPappelroy commented 3 years ago

I would really appreciate it if we updated the encoding in the library! I had to rewrite the string code in order to send åäö.
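Sending åäö is the write direction of the same problem. A minimal sketch of building an S7 STRING byte image (byte 0 = reserved length, byte 1 = actual length, then the characters, as read elsewhere in this thread) with a caller-chosen single-byte encoding. `S7StringWriter.ToS7String` is a hypothetical helper, not part of the S7.Net API, and the 254-character limit is the S7 STRING maximum:

```csharp
using System;
using System.Text;

public static class S7StringWriter
{
    // Hypothetical helper: encodes `value` into an S7 STRING image of
    // reservedLength + 2 bytes. Assumes a single-byte encoding, so one
    // character becomes one byte; unused tail bytes stay zero.
    public static byte[] ToS7String(string value, int reservedLength, Encoding encoding)
    {
        if (reservedLength < 0 || reservedLength > 254)
            throw new ArgumentOutOfRangeException(nameof(reservedLength), "S7 STRING holds at most 254 characters.");

        byte[] text = encoding.GetBytes(value);
        if (text.Length > reservedLength)
            throw new ArgumentException(
                $"Encoded string ({text.Length} bytes) exceeds reserved length {reservedLength}.", nameof(value));

        byte[] result = new byte[reservedLength + 2];
        result[0] = (byte)reservedLength; // maximum length
        result[1] = (byte)text.Length;    // actual length
        Array.Copy(text, 0, result, 2, text.Length);
        return result;
    }
}
```

With Latin-1, "åäö" becomes 0xE5 0xE4 0xF6; with Encoding.ASCII the same call would write three 0x3F ('?') bytes instead.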

arturonvz commented 3 years ago

The problem is that Encoding.ASCII is an encoding for the 7-bit ASCII character set. It doesn't cover Extended ASCII (128-255) and decodes each of those special characters as a question mark.
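The question-mark behaviour is easy to reproduce. A small sketch (class name hypothetical) that decodes the same bytes with Encoding.ASCII and with the per-byte cast used by the workaround, assuming "Año" was stored as 0x41 0xF1 0x6F (its Latin-1 / Windows-1252 bytes):

```csharp
using System;
using System.Text;

public static class AsciiVsByteCast
{
    // Decodes the same bytes twice: with Encoding.ASCII (7-bit only, bytes
    // above 127 become '?') and by widening each byte to a char.
    public static (string Ascii, string ByteCast) Decode(byte[] data)
    {
        string ascii = Encoding.ASCII.GetString(data);
        char[] chars = new char[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            chars[i] = (char)data[i];
        }
        return (ascii, new string(chars));
    }
}
```

`Decode(new byte[] { 0x41, 0xF1, 0x6F })` yields "A?o" for ASCII and "Año" for the per-byte cast.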

In the S7-1200 (and, I believe, in the other models as well), strings are arrays of 8-bit chars, so they allow these special characters from 128 to 255. In Spanish and other languages these special characters are widely used, so they need to be supported.

After some research, I think we have to use the Code page 437 character set.

Convert.ToChar or (char) works for me, but I'm going to try Encoding.GetEncoding(437), based on the following Stack Overflow answer: https://stackoverflow.com/questions/17619279/extended-ascii-in-c-sharp
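One quick way to test the code page 437 idea is to decode a byte whose meaning is already known. The TIA Portal table posted later in this thread puts Ñ at 209 (0xD1); in code page 437, Ñ is 0xA5 and 0xD1 is a box-drawing character. A sketch, assuming .NET 5 or later where code page 437 requires CodePagesEncodingProvider (the class name is hypothetical):

```csharp
using System.Text;

public static class CodePageProbe
{
    // Decodes one byte with the given code page number
    // (e.g. 437 for the DOS/OEM code page, 28591 for ISO-8859-1).
    public static char Decode(int codePage, byte value)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetString(new[] { value })[0];
    }
}
```

`Decode(437, 0xD1)` returns '╤' while `Decode(28591, 0xD1)` returns 'Ñ', which suggests code page 437 does not match what the PLC stores.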

scamille commented 3 years ago

I am sure I can implement the .NET fix without too much trouble; that is not the problem here. But before doing that, the following questions need to be fully and definitively answered by someone with PLC knowledge or connections. That really has nothing to do with C#/.NET :)

arturonvz commented 3 years ago

I did not find any information about the encoding used by Siemens S7 PLCs, but it is the default for the STRING data type and cannot be changed. I did a test forcing the ASCII and Extended ASCII values into two variables: ASCII 0 to 31 and 127 are not printable, ASCII 32 to 126 match the standard, and Extended ASCII 128 to 255 does not fully match Code page 437. Siemens is a German brand; what is the most common encoding on German computers?

[Screenshot: 2020-10-22_11h16_43]

scamille commented 3 years ago

Have you tried Latin1 (https://en.wikipedia.org/wiki/ISO/IEC_8859-1), i.e. Encoding.GetEncoding("iso-8859-1")?

This Siemens support article about S7-200 text displays lists quite a few encodings, although I'm not sure it really applies here:

https://support.industry.siemens.com/cs/document/17980308/how-do-you-have-specific-language-character-sets-(e-g-cyrillic)-displayed-in-s7-200-text-displays-?dti=0&lc=en-WW

SmackyPappelroy commented 3 years ago

I have tried all the encodings returned by Encoding.GetEncodings(). The one @scamille suggested (CodePage = 1250) is the closest match: it is correct up to character 158, after which my unit test fails.
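A survey like this can be automated by checking candidate code pages against byte/character pairs observed in TIA Portal. A sketch under these assumptions: .NET 5 or later with CodePagesEncodingProvider, an illustrative candidate list, and a caller-supplied sample (the test below uses three entries from the TIA Portal table posted later in this thread). The class name is hypothetical:

```csharp
using System.Collections.Generic;
using System.Text;

public static class CodePageSurvey
{
    // Returns the candidate code pages that decode every sample byte
    // to the character observed on the PLC side.
    public static List<int> Matching(IEnumerable<int> candidates, IReadOnlyDictionary<byte, char> observed)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        var matches = new List<int>();
        foreach (int codePage in candidates)
        {
            Encoding encoding = Encoding.GetEncoding(codePage);
            bool allMatch = true;
            foreach (KeyValuePair<byte, char> pair in observed)
            {
                if (encoding.GetString(new[] { pair.Key })[0] != pair.Value)
                {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch)
            {
                matches.Add(codePage);
            }
        }
        return matches;
    }
}
```

With the sample { 0x80: '€', 0x8A: 'Š', 0xD1: 'Ñ' } and candidates 437, 850, 1250, 1252 and 28591, only 1252 survives: 1250 decodes 0xD1 as 'Ń', and Latin-1 (28591) has no '€' at 0x80.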

scamille commented 3 years ago

Have you also tried doing your unit test with the Convert.ToChar(byte) method?

I really don't want to hinder a solution here. I just fear that unless the encoding is made independent of the system .NET runs on, the solution will only work with the OS set to German and might break with other settings.

SmackyPappelroy commented 3 years ago

I tried Convert.ToChar, and I also tried setting my OS to German! I am all out of ideas, @scamille.

scamille commented 3 years ago

To avoid all these questions, the best solution might be to add an Encoding parameter to the string conversion function. Users can then pass in whatever matches their preference, whether that is ASCII, Latin1, Windows-1252 or whatever other single-byte encoding they think matches what the PLC does.

The only problem is that you can't easily pass in this extra parameter when writing a list of data items. (WriteStruct or similar should be fine - the encoding can be added to the string attribute).
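A minimal sketch of the read side with such a parameter, reusing the header checks from the existing S7String parsing (byte 0 = capacity, byte 1 = actual length). `S7StringReader.FromS7String` is a hypothetical helper, not the S7.Net API, and it throws ArgumentException rather than the library's PlcException so that it stays self-contained:

```csharp
using System;
using System.Text;

public static class S7StringReader
{
    // Hypothetical helper: decodes an S7 STRING image with a
    // caller-supplied single-byte encoding.
    public static string FromS7String(byte[] bytes, Encoding encoding)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));
        if (bytes.Length < 2)
            throw new ArgumentException("Malformed S7 String / too short", nameof(bytes));

        int size = bytes[0];   // reserved (maximum) length
        int length = bytes[1]; // actual length
        if (length > size)
            throw new ArgumentException("Malformed S7 String / length larger than capacity", nameof(bytes));
        if (bytes.Length < length + 2)
            throw new ArgumentException("Malformed S7 String / fewer bytes than the declared length", nameof(bytes));

        return encoding.GetString(bytes, 2, length);
    }
}
```

Passing Encoding.ASCII reproduces the original behaviour ('?' for every byte above 127), while Latin-1 or Windows-1252 returns the accented characters.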

DG4ever commented 2 years ago

Sorry to bring this up again. Is there any solution for this, such as an optional encoding parameter? As I understand it, no encoding fits 100%, so I entered all Extended ASCII characters via TIA Portal and checked every code manually.

Would this not be a good solution?

private static char ByteToChar(byte value)
{
    var data = new Dictionary<byte, char>()
    {
        { 128, '€' },
        { 130, '‚' },
        { 131, 'ƒ' },
        { 132, '„' },
        { 133, '…' },
        { 134, '†' },
        { 135, '‡' },
        { 136, 'ˆ' },
        { 137, '‰' },
        { 138, 'Š' },
        { 139, '‹' },
        { 140, 'Œ' },
        { 142, 'Ž' },
        { 145, '‘' },
        { 146, '’' },
        { 147, '“' },
        { 148, '”' },
        { 149, '•' },
        { 150, '–' },
        { 151, '—' },
        { 152, '˜' },
        { 153, '™' },
        { 154, 'š' },
        { 155, '›' },
        { 156, 'œ' },
        { 158, 'ž' },
        { 159, 'Ÿ' },
        { 161, '¡' },
        { 162, '¢' },
        { 163, '£' },
        { 164, '¤' },
        { 165, '¥' },
        { 166, '¦' },
        { 167, '§' },
        { 168, '¨' },
        { 169, '©' },
        { 170, 'ª' },
        { 171, '«' },
        { 172, '¬' },
        { 173, '­' },
        { 174, '®' },
        { 175, '¯' },
        { 176, '°' },
        { 177, '±' },
        { 178, '²' },
        { 179, '³' },
        { 180, '´' },
        { 181, 'µ' },
        { 182, '¶' },
        { 183, '·' },
        { 184, '¸' },
        { 185, '¹' },
        { 186, 'º' },
        { 187, '»' },
        { 188, '¼' },
        { 189, '½' },
        { 190, '¾' },
        { 191, '¿' },
        { 192, 'À' },
        { 193, 'Á' },
        { 194, 'Â' },
        { 195, 'Ã' },
        { 196, 'Ä' },
        { 197, 'Å' },
        { 198, 'Æ' },
        { 199, 'Ç' },
        { 200, 'È' },
        { 201, 'É' },
        { 202, 'Ê' },
        { 203, 'Ë' },
        { 204, 'Ì' },
        { 205, 'Í' },
        { 206, 'Î' },
        { 207, 'Ï' },
        { 208, 'Ð' },
        { 209, 'Ñ' },
        { 210, 'Ò' },
        { 211, 'Ó' },
        { 212, 'Ô' },
        { 213, 'Õ' },
        { 214, 'Ö' },
        { 215, '×' },
        { 216, 'Ø' },
        { 217, 'Ù' },
        { 218, 'Ú' },
        { 219, 'Û' },
        { 220, 'Ü' },
        { 221, 'Ý' },
        { 222, 'Þ' },
        { 223, 'ß' },

    };
    if (data.TryGetValue(value, out char result))
        return result;
    else
        return (char)value;
}
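The table above appears to be Windows-1252: its 0x80-0x9F entries, including the five unassigned positions it skips (129, 141, 143, 144, 157), match that code page, and its entries from 161 upward equal (char)value, i.e. Latin-1, which Windows-1252 shares for 0xA0-0xFF. A sketch of the same mapping with a static lookup array, so the dictionary is not rebuilt on every call; the class name is hypothetical:

```csharp
public static class Windows1252Table
{
    // Characters for bytes 0x80..0x9F as listed in the TIA Portal table.
    // Positions the table leaves out fall back to the raw byte value,
    // like the original (char)value fallback.
    private static readonly char[] High =
    {
        '€', '\u0081', '‚', 'ƒ', '„', '…', '†', '‡', 'ˆ', '‰', 'Š', '‹', 'Œ', '\u008D', 'Ž', '\u008F',
        '\u0090', '‘', '’', '“', '”', '•', '–', '—', '˜', '™', 'š', '›', 'œ', '\u009D', 'ž', 'Ÿ',
    };

    // Bytes outside 0x80..0x9F map to the same code point (Latin-1).
    public static char ByteToChar(byte value) =>
        value >= 0x80 && value <= 0x9F ? High[value - 0x80] : (char)value;
}
```

On .NET 5 or later, `Encoding.GetEncoding(1252)` (after registering CodePagesEncodingProvider) gives the same characters for every byte except those five unassigned positions.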
Jason-Jelks commented 2 years ago

Siemens PLCs (depending on the model) encode the STRING data type as ASCII (WSTRING as UTF-16). However, according to the Siemens documentation on strings, special characters depend on the Windows language settings of the PC that programmed the PLC.

Quoted from the Siemens page linked above: "Please note that the special characters are coded using the code page currently set in Windows. This means that a string that contains special characters can be displayed differently on a different operating system with a different code page.

The dependency of the codepage on the created system makes an international use of the user program more difficult. Only the characters from the 7-bit ASCII coding are internationally valid."
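If the bytes follow the ANSI code page of the Windows PC that programmed the PLC, one option is to derive the Encoding from that PC's language. A sketch, assuming the user knows the culture name of the engineering PC and is on .NET 5 or later; the class and method names are hypothetical:

```csharp
using System.Globalization;
using System.Text;

public static class EngineeringPcCodePage
{
    // Windows ANSI code page for a culture name; "" selects the invariant culture.
    public static int AnsiCodePage(string cultureName) =>
        CultureInfo.GetCultureInfo(cultureName).TextInfo.ANSICodePage;

    // Matching Encoding; code pages like 1250/1252 need CodePagesEncodingProvider on .NET 5+.
    public static Encoding EncodingFor(string cultureName)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(AnsiCodePage(cultureName));
    }
}
```

On Windows, for example, "de-DE", "es-ES" and "sv-SE" report 1252 while "pl-PL" reports 1250, which would be consistent with 1252 working for German, Spanish and Swedish projects. On other platforms, or in invariant-globalization mode, named cultures may not resolve.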

SmackyPappelroy commented 2 years ago

Here is an extension method I wrote a while back. It works perfectly; I never had any problems with it:

// Must be declared in a static class; PlcException, ErrorCode and VarType come from S7.Net.
public static string S7StringSwedish(this byte[] bytes)
{
    if (bytes.Length < 2)
    {
        throw new PlcException(ErrorCode.ReadData, "Malformed S7 String / too short");
    }

    int size = bytes[0];   // reserved (maximum) length
    int length = bytes[1]; // actual length
    if (length > size)
    {
        throw new PlcException(ErrorCode.ReadData, "Malformed S7 String / length larger than capacity");
    }

    try
    {
        // Widen each payload byte (after the 2-byte header) to a char.
        char[] chars = new char[length];
        for (int i = 2; i < length + 2; i++)
        {
            chars[i - 2] = Convert.ToChar(bytes[i]);
        }
        return new string(chars);
    }
    catch (Exception e)
    {
        throw new PlcException(ErrorCode.ReadData,
            $"Failed to parse {VarType.S7String} from data. Following fields were read: size: '{size}', actual length: '{length}', total number of bytes (including header): '{bytes.Length}'.",
            e);
    }
}
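The per-byte Convert.ToChar loop is effectively Latin-1 decoding. The Swedish letters åäöÅÄÖ (0xE5, 0xE4, 0xF6, 0xC5, 0xC4, 0xD6) sit at the same positions in Latin-1 and Windows-1252, which would explain why it never caused problems for Swedish text, whereas characters in 0x80-0x9F such as € would not come back correctly. A small sketch (class name hypothetical) showing both cases:

```csharp
public static class ByteWidening
{
    // Widens each byte to a char, as the extension method above does.
    public static string Widen(params byte[] data)
    {
        char[] chars = new char[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            chars[i] = (char)data[i];
        }
        return new string(chars);
    }
}
```

`Widen(0xE5, 0xE4, 0xF6, 0xC5, 0xC4, 0xD6)` returns "åäöÅÄÖ", but `Widen(0x80)` returns the control character U+0080 rather than '€'.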