fritzy / SleekXMPP

Python 2.6+/3.1+ XMPP Library
http://groups.google.com/group/sleekxmpp-discussion

get_roster chokes on some XIDs #245

Open dolda2000 opened 11 years ago

dolda2000 commented 11 years ago

I'm in the process of trying to rewrite an XMPP client, previously written with another library, with SleekXMPP. I have noticed, however, that the previous client has registered some XIDs in its roster on the server, which SleekXMPP considers invalid when the client fetches the roster from the server. Therefore, get_roster cannot complete.

Some instances of such "invalid" XIDs include mistyped XIDs (with the domain name ending in ".co,") or XIDs whose server name includes a port number (like "foo@bar.org:5222"). In these cases SleekXMPP seems to consider "," and ":" to be invalid as part of the domain name of an XID.

It's somewhat troubling to me that SleekXMPP cannot even process such a roster, when the server has no problems with it. It means that the client cannot even remove such invalid entries.

Further, I'd also like to interject that none of the characters specified in jid.py as invalid in domain names are, in fact, invalid. Domain names can actually include any octet. :)

legastero commented 11 years ago

Thanks for the report @dolda2000! I've changed get_roster() so that it doesn't do the internal roster update directly, instead running it through the roster_update event so that get_roster() returns immediately. This way, you would still be able to access the raw XML data to correct your roster if needed.
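As a rough illustration of working from the raw XML, the sketch below pulls the `jid` attributes out of a roster result using only the standard library, so nothing is passed through JID validation. The `SAMPLE` payload and the helper name `roster_jids` are made up for this example and are not part of SleekXMPP.

```python
import xml.etree.ElementTree as ET

ROSTER_NS = "jabber:iq:roster"

# Hypothetical roster result, including one entry of the kind described above.
SAMPLE = (
    "<iq type='result' id='r1'>"
    "<query xmlns='jabber:iq:roster'>"
    "<item jid='alice@example.com'/>"
    "<item jid='foo@bar.org:5222'/>"
    "</query>"
    "</iq>"
)


def roster_jids(iq_xml):
    """Return the raw jid attribute of every roster item, unvalidated."""
    root = ET.fromstring(iq_xml)
    return [item.get("jid") for item in root.iter("{%s}item" % ROSTER_NS)]


if __name__ == "__main__":
    for jid in roster_jids(SAMPLE):
        print(jid)
```

Because the JIDs are kept as plain strings, entries that a strict parser would refuse can still be listed and then dealt with by hand.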

> It's somewhat troubling to me that SleekXMPP cannot even process such a roster, when the server has no problems with it. It means that the client cannot even remove such invalid entries.

I find it more troubling that the server did not reject these JIDs from the start. Which server are you using?

> Further, I'd also like to interject that none of the characters specified in jid.py as invalid in domain names are, in fact, invalid. Domain names can actually include any octet. :)

That is true for generic domain names, yes. But for a JID (RFC 6122), each label of the domain part is limited to ASCII letters, digits, and hyphens. Specifically, every character in the domain part (if it isn't an IP literal, and after converting from punycode) must be able to go through the ToASCII() operation specified in IDNA2003 with the UseSTD3ASCIIRules flag set, which enforces these rules:

 (a) Verify the absence of non-LDH ASCII code points; that is, the
     absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.

 (b) Verify the absence of leading and trailing hyphen-minus; that
     is, the absence of U+002D at the beginning and end of the
     sequence.
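
Rules (a) and (b) can be sketched in a few lines of Python. This is only an illustration and not the check in jid.py: it assumes the domain is already plain ASCII (any ToASCII/punycode conversion done, no IP literals), it ignores length limits and empty labels, and the name `std3_violations` is invented for the example.

```python
import string

# Letters, digits, hyphen: the only ASCII code points rule (a) allows in a label.
LDH = set(string.ascii_letters + string.digits + "-")


def std3_violations(domain):
    """Return a list of rule (a)/(b) violations for an ASCII domain string."""
    problems = []
    for label in domain.split("."):
        # Rule (a): no non-LDH code points.
        bad = sorted(set(label) - LDH)
        if bad:
            problems.append("label %r: non-LDH characters %r" % (label, bad))
        # Rule (b): no leading or trailing hyphen-minus.
        if label.startswith("-") or label.endswith("-"):
            problems.append("label %r: leading or trailing hyphen" % label)
    return problems


if __name__ == "__main__":
    for domain in ("example.com", "example.co,", "bar.org:5222"):
        print(domain, std3_violations(domain))
```

Under these assumptions, both a domain ending in ".co," and one carrying a ":5222" port suffix fail rule (a), since "," (U+002C) and ":" (U+003A) fall inside the excluded ranges.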

dolda2000 commented 11 years ago

> Which server are you using?

Plain old ejabberd. I also used the old python-xmpp library for communicating with it.

> But, for a JID (RFC 6122), each section of the domain part is limited to be ASCII alpha/numeric/hyphens.

Sure. I might at least quote RFC 1122, though:

> At every layer of the protocols, there is a general rule whose application can lead to enormous benefits in robustness and interoperability: "Be liberal in what you accept, and conservative in what you send"

;)