EliAndrewC / sideboard

BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Sideboard should run under either Python 2 or 3 #62

Closed EliAndrewC closed 10 years ago

EliAndrewC commented 10 years ago

I'd really like to use Sideboard for the MAGFest codebase this year, but that means that Sideboard needs to be compatible with Python 3. However, it definitely needs to support Python 2 for at least a few more years, so that probably means that we just need a single-source (using six) codebase that works in either. Since we definitely don't support any Python earlier than 2.7, this shouldn't be a problem.

If I can't get this done by the end of next weekend, I might need to punt on running the MAGFest codebase as a Sideboard plugin for another year, since I should really get started on that around the beginning of May. Fingers crossed!

robdennis commented 10 years ago

I've been absent on sideboard so far, but I will get half of much sleep this week if it means you can useSideboard this year.

On Sun, Apr 27, 2014 at 9:06 PM, Eli Courtwright notifications@github.com wrote:

I'd really like to use Sideboard for the MAGFest codebase this year, but that means that Sideboard needs to be compatible with Python 3. However, it definitely needs to support Python 2 for at least a few more years, so that probably means that we just need a single-source (using six) codebase that works in either. Since we definitely don't support any Python earlier than 2.7, this shouldn't be a problem.

If I can't get this done by the end of next weekend, I might need to punt on running the MAGFest codebase as a Sideboard plugin for another year, since I should really get started on that around the beginning of May. Fingers crossed!

Reply to this email directly or view it on GitHub: https://github.com/appliedsec/sideboard/issues/62

ftobia commented 10 years ago

Let me know if you want help with this, and how I can pitch in. I'm also very interested in you being able to use Sideboard for MAGFest this year.

EliAndrewC commented 10 years ago

Sounds great. I'm planning on heading home early-for-a-Monday this evening so that I can do some work on this. I'd like to target Python 3.4 since it's the latest.

Am I correct in recalling that distribute works on Python 3 but even the latest setuptools does not? Or is it the case that now that they've merged back into the same package, setuptools should work on Python 3?

If someone wants to tackle #29, that'd be cool, but since I don't need LDAP for the MAGFest code, I can literally just conditionally import it if I have to and have that be okay until #29 is done.

Tonight I'll see if I can at least get to the point where Sideboard imports (possibly with certain things commented out and completely broken), and then update this ticket with the problems I've run into.

EliAndrewC commented 10 years ago

Ok, so I've been working on this for awhile. Here are the things I've run into:

For now I'm making simple changes in place to rpctools, so if you want to know what's involved with doing it for real and making that pull request, here's a list of what I'm doing to rpctools in my local site-packages to get it to work (this was all I needed to do it get it to import, I haven't tried actually using it yet):

Anyway, I'm fairly close to getting something that halfway works. The two main things people could do to help would be to swap out python-ldap for something Python3 compatible (#29) or fix rpctools to be Python 3 compatible. I'll keep chugging along on the Sideboard parts. I'm making good progress though.

EliAndrewC commented 10 years ago

Other things I'm having to change:

>>> from sideboard.lib import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/eli/testing/sideboard/sideboard/internal/imports.py", line 403, in _thread_safe_import_that_handles_plugin_virtualenvs
    return original_import(*args, **kwargs)
  File "<frozen importlib._bootstrap>", line 2258, in _handle_fromlist
TypeError: hasattr(): attribute name must be string
>>> 

which leads me to the actual comments in the actual Python source code:

# The hell that is fromlist ...

which I think even beats out "Man alive, I prefer unittest" as far as awesome comments.

Anyway, I need to get food for tomorrow and clean the kitchen before bed, so I think that's it for me for now. I'll pick this up on Wednesday and try to figure out what's going on. In general I'm optimistic, but any help would be appreciated.

ftobia commented 10 years ago

I can take a stab at Python3-ifying rpctools. Let me take tonight to size up how much work it looks like and try to make some progress.

It's not great that it doesn't have tests, but I think what that really means is that my confidence that I won't break things is relatively low.

EliAndrewC commented 10 years ago

Presumably you mean take TOMORROW night to size it up, since you've already RSVP'ed yes for gaming tonight!

ftobia commented 10 years ago

I was thinking of sitting out one of the longer games in favor of shorter games like No Thanks and Coup, and do a little coding during downtime. I do, however, respond to peer pressure. It's up to you what kind of social norms you want to enforce at your game night (e.g. is side-coding acceptable?).

EliAndrewC commented 10 years ago

Side coding sounds like an awesome thing to encourage, if for no other reason than that people who "have work to do" can still come and hang out.

EliAndrewC commented 10 years ago

I'm now running a Sideboard server with Python 3, which is a great sign. A few more things that I had to tweak to get this working, for anyone interested:

Anyway, I'm making good progress and expect to have this working in time to start trying to use it with the MAGFest code by the end of this coming weekend. It'd still be nice to have working ldap and rpctools, but since I probably won't use either of them for the MAGFest stuff, those parts could be commented out if necessary until they're working. That would stop me from developing MAGFest code against the master branch of Sideboard, so it'll be good to have them sooner rather than later, but it's not so urgent that anyone should lose any sleep over it.

robdennis commented 10 years ago

To get Unicode on 3 and bytestrings on 2 you just use an explicit str invocation

For example:   from future import unicode_literals all = map(str, ['my_unicode_literal'])

If you look at the configobj codebase we do some similar things

EliAndrewC commented 10 years ago

Haha, that's SO obvious in retrospect! Thanks, I'll change it to do that on Friday.

ftobia commented 10 years ago

I don't think that's the right way to do it. That file is using unicode literals. In Python 2, you don't want str(u'foo'), what you want is u'foo'.encode('ascii'). Doing it the first way relies on an implicit encoding using the default encoding.

EliAndrewC commented 10 years ago

Is that bad in this case? When dealing with user-provided input I totally agree that relying on the system default encoding is error-prone. But in this case we're converting a list of known values which are all legal language keywords. That should work regardless of whether their default encoding is UTF-8 or LATIN-1 or even something a little weirder, right?

robdennis commented 10 years ago

in my opinion, Frank's suggestion is "more correct" but not sufficiently more correct that I want the more complicated structure

ftobia commented 10 years ago

It's not that it won't work. It will work. It's less correct and it's a bad example. And more than anything, I dislike leaving a bad example for others to follow, especially with things like unicode handling in single-source Python2-and-3 code bases.

Also, I don't think that:

from __future__ import unicode_literals
__all__ = map(str, ['my_unicode_literal'])

is meaningfully less complicated than:

from __future__ import unicode_literals
__all__ = ['my_unicode_literal']
if six.is_python_2:  # whatever this constant is.
    __all__ = [s.encode('ascii') for s in __all__]

You can wrap it in a helper function if you don't want to do the same thing in every file.

EliAndrewC commented 10 years ago

Fair enough. I'll do it the "correct" way, if for no other reason than to encourage @ftobia to chime in with strong opinions when he has them :)

Side note: we cannot use map here because on Python 3 map doesn't actually return a list anymore, it returns some kind of custom iterator.

ftobia commented 10 years ago

Alternately, if you make a separate issue for "__all__ should be bytes on Python 2 but unicode on Python 3" then I'll have a pull request for it :)

robdennis commented 10 years ago

I feel like the if six.PY2 if statement in all the modules that need __all__ specified is meaningfully more complicated, but hiding it inside of a "get my all list of strings" helper method is fine.

my current take is that anything that needs to check for py2 / py3 needs to be in a helper method.

EliAndrewC commented 10 years ago

I've made the improvement suggested by @ftobia. Here are a few more random fixes:

Here's an interesting difference between Python 2 and Python 3 that I didn't know about: the **kwargs syntax requires that the keys of the kwargs dictionary be (unicode) strings. Normally you'd expect this to be the case anyway, but we used to be able to merge two dictionaries like this:

d1 = {1: 2}
d2 = dict({5: 6}, **d1)

But this no longer works in Python 3, since it expects all keys used in the **kwargs to be strings. I'll watch out for this in the future. So we'd either need to do an update or if we needed to do it inline we'd need to say

merged = dict(list(d1.items()) + list(d2.items()))

But I did it the long way since I find that one-liner to be really ugly.

Anyway, I've gone through all of the unit tests and gotten them to pass on Python 3. I've also gone through the tutorial and made sure that the sample ragnarok app works as we expect.

My next step is to re-run the unit tests on Python 2 to make sure that all of my supposedly cross-platform changes actually work on both Python 2 and Python 3. After that I'll make a pull request.

EliAndrewC commented 10 years ago

Oh yeah, I left out the most annoying one; Ubuntu uses a dist-packages directory in addition to site-packages, so I had to modify sideboard/internal/imports.py to check in both locations for Python 3.4 to work (although we only require that site-packages exist, since dist-packages won't be there on all platforms.

ftobia commented 10 years ago

rpctools uses three undocumented functions from Python 2's urllib: "splittype", "splithost", and "splituser". If only the implementation used something like urllib.urlparse then porting would be easy, but as-is I don't trust myself refactoring it in the absence of unit tests.

The rest of the porting is going fine though. Shouldn't be too long now.

ftobia commented 10 years ago

https://github.com/appliedsec/rpctools/pull/4

I just re-read your comment on splittype, splithost, etc. Please look over the code that I rewrote and make sure it lines up with what you did.

EliAndrewC commented 10 years ago

Alas, in trying to get the MAGFest code running as a Sideboard plugin, I've discovered that jus changing __import__ doesn't actually work for all cases in Python 3. In particular, dynamic imports which use importlib.import_module seem to not actually call the built-in __import__ function.

The solution is probably to keep the original code in an if six.PY2 block and use an actual import hook in the else block. I'll try to get this done tonight and update the pull request, since this actually stops Django from working for me.

EliAndrewC commented 10 years ago

So this is turning out to be extremely tricky. Theoretically it's really easy to write an import hook; I can just do this:

class ImportHook:
    def find_module(self, fullname, path=None):
        if _get_sideboard_plugin_where_import_originated():
            return self

    def load_module(self, name):
        plugin_name = _get_sideboard_plugin_where_import_originated()
        with use_plugin_virtualenv(plugin_name):
            return __import__(name, fromlist='*')

sys.meta_path.insert(0, ImportHook())

Unfortunately, there seems to be an underlying assumption built into the entire concept of import hooks that after you've imported something, it should be in sys.modules, which is reasonable but not actually true for what we want. Our goal is that if there's a function where someone says import django inside the function, then the import machinery returns the correct django module from the correct plugin's virtualenv without actually polluting sys.modules or sys.path afterwards.

I'm going to try to trace through what specific piece of code is relying on this postcondition to see if there's some way around it. There are some really kludgy solutions, like leaving the modifications around and undoing them the next time, something roughly equivalent to

class ImportHook:
    def find_module(self, fullname, path=None):
        self.undo_previous_sys_modification()
        if _get_sideboard_plugin_where_import_originated():
            return self

    def load_module(self, name):
        plugin_name = _get_sideboard_plugin_where_import_originated()
        modify_sys_path_and_modules(plugin_name)
        return __import__(name, fromlist='*')

sys.meta_path.insert(0, ImportHook())

Something like that would theoretically solve the problem, but I'm shuddering at how awful the idea of that is, so I'm hoping there's some other way.

Alas, this importing issue has caused me to miss my theoretical "deadline" on this, but I'll give it a few more days at least.

robdennis commented 10 years ago

is the sys.modules and sys.path still not-overwritable (e.g. on python 2, you couldn't ever do sys.modules = {})?

If not, perhaps you can overwrite it with a ChainMap and at least the modifications are saner.

robdennis commented 10 years ago

for what's it's worth the original python 2 implementation definitely did a lot of overwriting of sys.path/sys.modules and then cleaning it up afterwards: https://github.com/appliedsec/sideboard/blob/master/sideboard/internal/imports.py#L76

so it's hard to say that your second solution is really that much kludgier

EliAndrewC commented 10 years ago

Yeah, the reason why I feel icky about making changes and cleaning up afterwards is that when we've done that in the past, we've cleaned up immediately afterwards, like

Whereas now it would look more like this

I feel less terribly about that idea than I felt last night. But we'll need to make sure that all types of importing go through that undo step. Since Python 3 seems to have different code paths for different types of imports (e.g. some imports go through __import__ and others don't, unlike on Python 2), we'll need to be really careful about whatever approach we take, since otherwise we could leave sys.path and sys.modules in the wrong state when a different import happens, e.g. one plugin ends up using a different plugin's virtualenv.

Anyway, I'll try to look into this after work tonight. I've spent the past month or so scrambling to make enough Sideboard changes to get the system ready for MAGFest to use, and I really don't want to wait another year before using it on my own stuff.

Good idea about trying to actually replace sys.path and sys.modules BTW, if I get stuck on "doing it right" then I'll give that a shot and see if it works any better/differently than on Python 2.

EliAndrewC commented 10 years ago

So it looks like Python 3.4 actually changed a lot about how the import system works under the hood. The good news is that they added more flexibility. I'm still not sure whether I can write an import hook that will just work out of the box, since reading through _bootstrap.py which handles a lot of the import stuff has comments like

# The module must be in sys.modules at this point!

Anyway, I'm reading through PEP-451 (http://legacy.python.org/dev/peps/pep-0451/) and I'm going to try to do it right and use the new ModuleSpec system. I'll let you guys know how that goes.

(I've been thinking about what ou guys said about ditching the whole virtualenv-within-virtualenv thing, and I only want to do that if there's literally no way to do this in Python 3.x since we do have a real user story for it at work, and I don't feel comfortable ripping out a legitimate work-related use case to support a side-project use case.)