apache / libcloud

Apache Libcloud is a Python library which hides differences between different cloud provider APIs and allows you to manage different cloud resources through a unified and easy to use API.
https://libcloud.apache.org
Apache License 2.0
2.04k stars 925 forks source link

Tilde ('~') character is incorrectly URL-encoded with Python 2 #1452

Open mmilitzer opened 4 years ago

mmilitzer commented 4 years ago

I'm referring to the older, unresolved issue here (it seems I cannot create a new issue nor comment on the older one over at issues.apache.org though I have a long-time user there...):

https://issues.apache.org/jira/browse/LIBCLOUD-979

I can confirm that the reported issue happens also for Amazon S3, not only for CEPH as originally reported. And the problem is indeed that libcloud URL-encodes the tilde '~' sign to '%7E', which it shouldn't according to RFC 3986.

The problem seems specific to Python 2 because in Python 3 urllib.quote() has been already updated to not URL-encode the tilde '~' sign anymore:

https://bugs.python.org/issue16285

Because of this, I think the correct fix (other than the patch proposed in LIBCLOUD-979) would be adding the tilde '~' character to the safe characters in the urlquote() utility function in utils/py3.py to ensure consistent behavior between Python 2 and Python 3.

--- libcloud/utils/py3.py       2020-04-30 12:58:55.804591828 +0000
+++ libcloud_fixed/utils/py3.py 2020-04-30 12:59:25.975368813 +0000
@@ -211,7 +211,7 @@

     tostring = ET.tostring

-    def urlquote(s, safe='/'):
+    def urlquote(s, safe='/~'):
         if isinstance(s, _real_unicode):
             # Pretend to be py3 by encoding the URI automatically.
             s = s.encode('utf8')
Kami commented 4 years ago

Thanks for reporting this and thanks for contribution.

The change looks reasonable for me, but I would feel safer if there were some tests.

In addition to that, I'm a bit worried that some providers are relying on clients incorrectly encoding ~...

Kami commented 4 years ago

I just tested urlqoute from Python 3.6 and it seems to handle it in the same manner as Python 2 - aka it encodes it as %7E.

Python 3.6.9 (default, Sep  9 2019, 13:49:23) 
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import quote as urlquote
>>> urlquote('~')
'%7E'

Having said that, to be on the safe side - I think we should only add that change to S3 driver to begin with.

Who knows how many (if any) provider APIs rely on ~ being url encoded as %7E so changing it could introduce regressions in many other drivers.

Kami commented 4 years ago

@mmilitzer #1457 should fix it.

Can you please confirm (I tested it myself, but more eyes and testing never hurt)?


For now I went with the S3 driver only approach.

In the future, when we feel more comfortable about that change, we can try modifying that behavior (aka when more people use Libcloud under Python 3.7 without reporting any urlquote related issues) inside our urlquote wrapper.

mmilitzer commented 4 years ago

@Kami: I can confirm that your patch fixes the issue for S3.

My concern would be just that there'll still be inconsistent handling of '~' character now depending on Python version used though I understand your concerns. I'd just not rely on potential issues with other backends and the changed tilde handling in Python 3.7+ being quickly reported. After all, the very issue here has apparently been present for years and gone unnoticed. So ideally there should be a test for "tilde in the name" for all backends to determine what the proper handling is that the other provider APIs expect...

Kami commented 4 years ago

@mmilitzer Thanks for confirming.

For now, I'll merge #1457 and then in the future look into making it consistent across Python versions, but that will likely be included in major version bump (just in case).

stale[bot] commented 4 years ago

Thanks for contributing to this issue. As it has been 90 days since the last activity, we are automatically marking is as stale. If this issue is not relevant or applicable anymore (problem has been fixed in a new version or similar), please close the issue or let us know so we can close it. On the contrary, if the issue is still relevant, there is nothing you need to do, but if you have any additional details or context which would help us when working on this issue, please include it as a comment to this issue.