beznogno / pyicqt

Automatically exported from code.google.com/p/pyicqt
GNU General Public License v2.0

Broken encoding in offline messages #148

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I use
http://groups.google.com/group/py-transports/web/pyicqt-0.8.1-git-patched.tar.gz

vCards are fine.
Nicknames are fine.
But when someone sends me a message while I'm offline and I then go online, I
receive a lot of Japanese hieroglyphs instead of ANY chars (Russian, English
and digits alike).

Screenshot attached.

Original issue reported on code.google.com by zed.0xff on 21 Dec 2008 at 1:37

Attachments:

GoogleCodeExporter commented 9 years ago
Does your contact use QIP?

Original comment by r000ns...@gmail.com on 21 Dec 2008 at 1:47

GoogleCodeExporter commented 9 years ago
No, he uses licq.
I just tried jimm & kopete and got the same results.

Update:
when a message contains English and/or Russian letters, I get hieroglyphs.
But when the message contains only digits, I get these digits OK :)

Original comment by zed.0xff on 21 Dec 2008 at 2:08

GoogleCodeExporter commented 9 years ago
Confirmed with the PSI client on Linux. Russian messages sent to me while I am
offline arrive as a lot of Japanese hieroglyphs. The senders are using Kopete
and QIP.

Original comment by Mur...@gmail.com on 22 Dec 2008 at 6:34

GoogleCodeExporter commented 9 years ago
I use PSI too :)

Original comment by zed.0xff on 22 Dec 2008 at 9:10

GoogleCodeExporter commented 9 years ago
Confirmed on Pidgin (Linux) and Pandion (Windows).

Original comment by mail.spy...@gmail.com on 22 Dec 2008 at 2:01

GoogleCodeExporter commented 9 years ago

Original comment by r000ns...@gmail.com on 22 Dec 2008 at 3:08

GoogleCodeExporter commented 9 years ago
Please add the <detectunicode/> option to your pyicqt.conf.xml and test this version:
http://groups.google.com/group/py-transports/web/pyicqt-0.8.1-git.tar.gz

Original comment by r000ns...@gmail.com on 22 Dec 2008 at 3:31

GoogleCodeExporter commented 9 years ago
I have the version mentioned in comment #7, I have <detectunicode/> set, and I
see hieroglyphs instead of Cyrillic in offline messages from licq.

This hieroglyph string turns into valid UTF-8 Cyrillic if you run it through
"iconv -t utf16". Note that it's *not* "-f" but "-t"!

The case is described in some more detail in the group:
http://groups.google.com/group/py-transports/browse_thread/thread/9f933ac692ce9959/14d7d2b3ee1d42ce#14d7d2b3ee1d42ce

Original comment by egcros...@gmail.com on 23 Dec 2008 at 3:08
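
The "iconv -t utf16" observation above is consistent with UTF-8 bytes being
read as 16-bit code units somewhere along the way. A minimal Python sketch of
that misreading and its reversal, assuming little-endian byte order; the
function names are illustrative and are not part of PyICQt:

```python
def misread_utf8_as_utf16(text: str) -> str:
    """Reproduce the garbling: encode as UTF-8, then decode as UTF-16LE."""
    raw = text.encode("utf-8")
    if len(raw) % 2:
        raw += b"\x00"  # pad to a whole number of 16-bit units
    return raw.decode("utf-16-le", errors="replace")


def undo_misread(garbled: str) -> str:
    """Reverse it: re-encode as UTF-16LE, then decode the bytes as UTF-8."""
    raw = garbled.encode("utf-16-le").rstrip(b"\x00")  # drop any padding
    return raw.decode("utf-8", errors="replace")


garbled = misread_utf8_as_utf16("абв")
print(garbled)                # 냐뇐닐 (U+B0D0, U+B1D0, U+B2D0)
print(undo_misread(garbled))  # абв
```

Each two-byte UTF-8 Cyrillic letter collapses into one unrelated CJK or Hangul
character, which would explain why the text looks like "hieroglyphs" while
still being fully recoverable.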

GoogleCodeExporter commented 9 years ago
egcrosser: I tried to reproduce it (cp1251 as the default in Licq, utf-8 for
the user), but offline messages were recognized correctly for me. Can you post
the text of the message here?

Original comment by r000ns...@gmail.com on 23 Dec 2008 at 4:30

GoogleCodeExporter commented 9 years ago
In the text attached, the two messages at 08:31:22 were sent while I was
offline.
Original:
=====
(08:26:30 PM) crosser@average.org/Work: щас уйду в offline, 
напиши что-нибудь
(08:31:22 PM) ttanushka: 笠ﯺ﷼਍
(08:31:22 PM) ttanushka: 
냐뇐닐돐듐뗐뛐럐룐말뫐믐볐뷐뻐뿐胑臑苑菑蓑藑蛑蟑裑觑��
�译賑跑
軑近਍
(08:32:29 PM) ttanushka: ну что?...что-нибудь читаемо?
=====
After conversion "iconv -t ucs-2le":
=====
(08:26:30 PM) crosser@average.org/Work:
I0A C94C 2 offline, =0?8H8 GB>-=81C4L
(08:31:22 PM) ttanushka: 
��������������������������

(08:31:22 PM) ttanushka: 
абвгдежзийклмнопрстуфхцчшщъыьэюя

(08:32:29 PM) ttanushka: =C GB>?...GB>-=81C4L G8B05<>?
=====
I will be happy to do more testing, just tell me what exactly to do.

Eugene

Original comment by egcros...@gmail.com on 24 Dec 2008 at 8:34

Attachments:

GoogleCodeExporter commented 9 years ago
Even with the message 'абвгдежзийклмнопрстуфхцчшщъыьэюя' it works for me. More
information is necessary.

Please run the transport in debug mode (add the -D parameter to the command
line). Then reproduce this error (receive this message again) and show the
lines containing 'Received Offline' and 'Converted message' from the log.

Original comment by r000ns...@gmail.com on 24 Dec 2008 at 10:45

GoogleCodeExporter commented 9 years ago
Indeed, I could not reproduce it now. I could read both offline messages that
were sent from licq, one with the "windows-1251" and one with the "utf-8"
setting. The difference was that in this case I stopped the PyICQ process
rather than disconnecting the client from the transport like I did last time.

Could it have anything to do with licq changing the encoding on the fly, and
PyICQ using information about the peer's capabilities from a previous request
that was made before the change?

At the same time, I realized that the encoding in online communication is not
right! In particular, when the licq user sets the encoding to "windows-1251", I
see her messages, but she does not see mine. When she sets the encoding to
"utf8", she sees my messages but I don't see hers. What I get in the latter
case is a string that becomes readable when passed through
"iconv -t windows-1251".

I am attaching two logs: one produced by PyICQ with -D, the other a regular
Pidgin log. I tried to remove all data that is not related to this particular
conversation for privacy reasons. The last two messages, dated 15:58:28, were
sent with different encodings while the transport was not running; all the rest
are "online". The second message from the top, dated 15:43:22, is an example of
wrong encoding in an online conversation.

I gather that you may understand Russian; if so, the log should be
self-explanatory.

I will post if/when I find more information about the problem, and you are
welcome to tell me what other experiments I can do.

Thanks,

Eugene

Original comment by egcros...@gmail.com on 24 Dec 2008 at 1:59
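
The online symptom in this comment (text that becomes readable after
"iconv -t windows-1251") would fit UTF-8 bytes being decoded with the
configured single-byte encoding. A small Python sketch of that reading,
assuming the reporter's terminal locale is UTF-8; the helper names are made up
for illustration:

```python
def misread_utf8_as_cp1251(text: str) -> str:
    """Decode UTF-8 bytes as if they were windows-1251."""
    return text.encode("utf-8").decode("cp1251", errors="replace")


def undo_cp1251_misread(garbled: str) -> str:
    """Equivalent of piping the garbled text through 'iconv -t windows-1251'."""
    return garbled.encode("cp1251").decode("utf-8")


garbled = misread_utf8_as_cp1251("абв")
print(garbled)                       # Р°Р±РІ
print(undo_cp1251_misread(garbled))  # абв
```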

Attachments:

GoogleCodeExporter commented 9 years ago
Another related note: if I replace "<encoding>windows-1251</encoding>" with
"<encoding>utf-8</encoding>", I can communicate normally with licq users (who
have set the "utf-8" encoding for me) and with most other peers, but not with
some users (probably on older versions of the official ICQ client).

Original comment by egcros...@gmail.com on 24 Dec 2008 at 4:00

GoogleCodeExporter commented 9 years ago
Online messages are a bit simpler. Please test the updated version; it should
work with the utf-8 encoding in Licq:
http://groups.google.com/group/py-transports/web/pyicqt-0.8.1-git.tar.gz

The option for Unicode detection is more fine-grained now (not just
enable/disable), so update your config:
    <!-- Try detect Unicode:
        0 - never
        1 - in offline messages
        2 - and in nicknames
        Attention: this solution can be slowly on high-load servers
    -->
    <detectunicode>1</detectunicode>

Original comment by r000ns...@gmail.com on 24 Dec 2008 at 4:09
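
One way such a levelled detection option could behave is sketched below in
Python. This is an assumption about the general technique (try strict UTF-8
first, fall back to the configured encoding), not PyICQt's actual code; the
function name, the "kind" labels and the cp1251 default are hypothetical:

```python
OFFLINE, NICKNAME, ONLINE = "offline", "nickname", "online"


def decode_incoming(raw: bytes, kind: str, level: int,
                    fallback: str = "cp1251") -> str:
    """Decode raw bytes, trying UTF-8 first where the level allows it.

    level 0: never detect; 1: detect in offline messages;
    2: detect in offline messages and nicknames.
    """
    detect = (level >= 1 and kind == OFFLINE) or \
             (level >= 2 and kind == NICKNAME)
    if detect:
        try:
            return raw.decode("utf-8")  # strict: fails on most cp1251 text
        except UnicodeDecodeError:
            pass
    return raw.decode(fallback, errors="replace")


print(decode_incoming("абв".encode("utf-8"), OFFLINE, level=1))   # абв
print(decode_incoming("абв".encode("cp1251"), OFFLINE, level=1))  # абв
```

A heuristic like this can misfire on short byte strings that happen to be
valid in both encodings, and the extra decode attempt costs time on every
message it applies to.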

GoogleCodeExporter commented 9 years ago
With the version from comment 14, "<encoding>windows-1251</encoding>" in the
PyICQ config, and licq configured for utf8, online messages pass correctly in
both directions. Offline messages from licq are all right. But offline messages
from PyICQ are displayed wrongly in licq: 'абв' is displayed as '012'.

(I installed licq 1.3.5 on the local machine for this experiment)

Original comment by egcros...@gmail.com on 24 Dec 2008 at 5:27

GoogleCodeExporter commented 9 years ago
That's because the transport sends offline messages in utf-16 (as far as I
know, it is impossible to send utf-8 to an offline contact). This is more
correct than sending in any national encoding. But Licq, as you might guess,
always sends and receives messages in only one encoding, specified per user.

When the PyICQt user goes offline, the Licq user should change the encoding
from utf-8 to utf-16, and change it back again when he/she returns online.
This is a bad solution, but currently I have no better one.

Original comment by r000ns...@gmail.com on 24 Dec 2008 at 5:50
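
The 'абв' shown as '012' report fits this explanation. A short Python sketch,
assuming the receiving client treats the UTF-16 payload as single-byte text
and silently drops non-printable bytes (that rendering behaviour is a guess,
and "visible_ascii" is an illustrative helper):

```python
def visible_ascii(raw: bytes) -> str:
    """Keep only printable ASCII bytes and discard everything else."""
    return "".join(chr(b) for b in raw if 0x20 <= b < 0x7F)


raw = "абв".encode("utf-16-be")
print(raw.hex(" "))        # 04 30 04 31 04 32
print(visible_ascii(raw))  # 012
```

The low bytes of U+0430..U+0432 are the ASCII codes for '0', '1' and '2', so
the result is the same for either byte order.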

GoogleCodeExporter commented 9 years ago
Wouldn't it make sense to send outgoing (online and offline) messages in the
encoding that is specified as "<encoding>" in the config file *if* there is no
reasonable indication of the peer's Unicode capabilities? I am not sure that it
would be the right thing, but just a thought...

In particular, when both pyicq and licq are explicitly configured to use
"windows-1251", sending messages from pyicqt to licq in utf-8 does not seem
logical. And this is what happens:

    0x00a0:  0000 0000 0001 0000 0001 0022 0061 6263  ...........".abc
    0x00b0:  6465 6667 6869 6a20 d0b0 d0b1 d0b2 d0b3  defghij.........
    0x00c0:  d0b4 d0b5 d0b6 d0b7 d0b8 d0ba d0bb 0000  ................

(the message was "abcdefghij абвгдежзикл")

Original comment by egcros...@gmail.com on 24 Dec 2008 at 9:05
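
The text bytes in the dump above can be checked directly. This Python sketch
copies only the bytes from 0x61 onward (the preceding header fields are left
uninterpreted) and compares them with what a windows-1251 payload would look
like:

```python
# Message text bytes copied from the packet dump, without the header fields.
payload = bytes.fromhex(
    "6162636465666768696a20"
    "d0b0d0b1d0b2d0b3d0b4d0b5d0b6d0b7d0b8d0bad0bb"
)
print(payload.decode("utf-8"))  # abcdefghij абвгдежзикл

# What the same text would look like on the wire in windows-1251.
as_cp1251 = "abcdefghij абвгдежзикл".encode("cp1251")
print(as_cp1251.hex())
print(payload == as_cp1251)     # False
```

Two bytes per Cyrillic letter starting with 0xd0 is the UTF-8 signature; a
windows-1251 payload would carry one byte per letter in the 0xe0..0xff range.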

GoogleCodeExporter commented 9 years ago
Hm. Maybe the current version will help (in the windows-1251 configuration)?
http://groups.google.com/group/py-transports/web/pyicqt-0.8.1-git.tar.gz

Original comment by r000ns...@gmail.com on 25 Dec 2008 at 10:12

GoogleCodeExporter commented 9 years ago
(pulled the source from git instead, commit 95b3c48f00df737cff5b8476657d9b6f4582d4fa)

This time, with licq configured to use cp1251 and <encoding> windows-1251 in
pyicqt.conf, both online and offline messages are readable in both directions.
Yay!

I think licq complained once, when it received some sort of status message (not
a real conversation), about being unable to convert it to or from ucs2-2le or
something like that, but I did not record any details.

I will see how it works with other clients and report any problems I find.

Thanks!

Eugene

Original comment by egcros...@gmail.com on 25 Dec 2008 at 11:07

GoogleCodeExporter commented 9 years ago
OK, so this is how things look with the latest version:

- online conversations are fine with all my peers running different clients

- offline messages that I receive were readable every time I checked

- offline messages that I send are almost never readable by the peers. They are
readable by licq users, and on *some* occasions by ICQ6 users. To other ICQ6
users, and to the Miranda, QIP and Pidgin users that I checked with, they
either look like garbage or are empty.

If I understand it right, it seems that *offline* messages are better sent in
utf16(?) as you did before the last change; that would leave licq out in the
cold but ensure compatibility with the majority.

Eugene

Original comment by egcros...@gmail.com on 25 Dec 2008 at 2:39

GoogleCodeExporter commented 9 years ago
The encoding for messages is now chosen differently. Instead of the single rule
'Always send in Unicode', two rules apply:
1. Send in the custom encoding by default.
2. Send in Unicode if the contact supports it.
But to check for this support, the transport must see the contact online at
least once after it starts:
1. The transport starts.
2. The contact comes online.
3. The transport saves the info about Unicode support.
4. The contact goes offline.
5. You send a message.
6. The transport sends the message in Unicode.

Original comment by r000ns...@gmail.com on 25 Dec 2008 at 3:55
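
The two rules and the sequence above can be sketched in Python as follows.
This is only an illustration of the described behaviour under stated
assumptions, not the transport's real code: the class and method names are
hypothetical, and "utf-16-be" stands in for whatever Unicode wire encoding the
transport actually uses.

```python
class EncodingChooser:
    """Pick an outgoing encoding from capabilities seen since startup."""

    def __init__(self, custom_encoding: str = "cp1251"):
        self.custom_encoding = custom_encoding
        # contact id -> supports Unicode; empty after every transport start
        self._unicode_capable = {}

    def contact_seen_online(self, contact: str, supports_unicode: bool) -> None:
        # Overwrite on every sighting so a change of client is picked up.
        self._unicode_capable[contact] = supports_unicode

    def encoding_for(self, contact: str) -> str:
        # Rule 2: Unicode if the contact is known to support it.
        if self._unicode_capable.get(contact, False):
            return "utf-16-be"
        # Rule 1: the configured custom encoding by default.
        return self.custom_encoding


chooser = EncodingChooser("cp1251")
print(chooser.encoding_for("12345"))  # cp1251 (contact not seen yet)
chooser.contact_seen_online("12345", supports_unicode=True)
print(chooser.encoding_for("12345"))  # utf-16-be
```

With this design, a message sent to a contact who has not been online since
the transport started always goes out in the custom encoding.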

GoogleCodeExporter commented 9 years ago
Re. comment #21: I'd be happy to test it against my peers, just please drop a
note when you commit the changes into the repository.

Upfront comments:
1. Isn't it overkill? Maybe it's reasonable to sacrifice offline messages to
licq for the sake of simplicity? As long as all other combinations work...
2. Don't forget that the peer may change their software at any time: it's
probably a good idea to refresh our notion of their capabilities every time we
see them online.

Thanks,

Eugene

Original comment by egcros...@gmail.com on 26 Dec 2008 at 9:32

GoogleCodeExporter commented 9 years ago
:) No, in reality it's a good way. And ICQ clients do something like this.

Original comment by r000ns...@gmail.com on 26 Dec 2008 at 2:46

GoogleCodeExporter commented 9 years ago

Original comment by r000ns...@gmail.com on 31 Dec 2008 at 8:30

GoogleCodeExporter commented 9 years ago
Fixed?
At the very least, it does not work for offline messages to Adium and to ICQ6
(they cannot read my messages). It works for QIP. More clients to check...

(Yes, I did exchange online messages with them before testing offline.)

Eugene

Original comment by egcros...@gmail.com on 31 Dec 2008 at 9:05

GoogleCodeExporter commented 9 years ago
:) OK, a separate option for choosing the encoding has been added.

Original comment by r000ns...@gmail.com on 6 Jan 2009 at 5:35

GoogleCodeExporter commented 9 years ago
In which version is this fixed? I tried this version today:
http://groups.google.com/group/py-transports/web/pyicqt-0.8.1-git.tar.gz
but the problem is still there.
I receive garbled Russian letters in offline messages:
[09:22:17] <lawrentiy> 
†ⴠ⃤⃢¥N⸮⃨⃱⃰ﬠ��
�
[09:22:18] <lawrentiy> 
Ⱐ⃴‭⃯Ⱐ@⃱⋒��
�⃰易⃷⃲⃱⃭㼠ﰠ�
��쇄
⃨⃱ﬠ⃳⃲@

r000nster, can you send me, at murznn[at]gmail.com, the version of pyicqt with
this issue fixed, for testing?

Original comment by Mur...@gmail.com on 12 Jan 2009 at 8:58

GoogleCodeExporter commented 9 years ago
That is already the latest version. Just add this line to your config:
<detectunicode>1</detectunicode>

Original comment by r000ns...@gmail.com on 12 Jan 2009 at 2:50

GoogleCodeExporter commented 9 years ago
As of today (git commit 1c2b8a0a3846ed296d0f5ef193294d15f0de8e38), offline
messages from pyicqt to many other clients are unreadable (checked with ICQ6
and QIP). I have "encoding for outgoing offline messages" set to "auto detect".
I'd say things are now worse than they were before the Dec 25 change.

Should this ticket be reopened, or a new one opened?

Eugene

Original comment by egcros...@gmail.com on 16 Jan 2009 at 10:49

GoogleCodeExporter commented 9 years ago
I am using version 0.8.1.1 of pyicqt with this patch:
http://pyicqt.googlecode.com/issues/attachment?aid=8022030648936962831&name=pyicq-t-0.8-seqnum.patch

And today I got a garbled offline message:
[11:40:51] <Nickname> 
‡਍ﳫ⃲業慬楶獴湡⹮畲振湯牴汯ഠ爊潯⽴敲��
�楳湯
But he sent:
[11:44:51] <Nickname> пароль от xxxx root/xxxxx

I have <detectunicode>1</detectunicode> in my config.
The whole config is:
<pyicqt>
        <jid>icq.xxx.ru</jid>
        <mainServer>127.0.0.1</mainServer>
        <mainServerJID>xxx.ru</mainServerJID>
        <website>http://xxx.ru/</website>
        <port>5347</port>
        <secret>xxx</secret>
        <lang>ru</lang>
        <encoding>cp1251</encoding>
        <icqServer>login.oscar.aol.com</icqServer>
        <icqPort>5190</icqPort>
        <admins>
        <jid>murz@xxx.ru</jid>
        </admins>
        <xdbDriver>xmlfiles</xdbDriver>
        <detectunicode>1</detectunicode>
        <usemd5auth/>
</pyicqt>
Maybe I need to set something else in the config?

Original comment by Mur...@gmail.com on 28 Jan 2009 at 8:51