Miserlou / Zappa

Serverless Python
https://blog.zappa.io/
MIT License
11.89k stars 1.2k forks source link

Detect Python Version of .pyc Files and Ignore if Incompatible #1356

Open mcrowson opened 6 years ago

mcrowson commented 6 years ago

Context

Zappa will package up .pyc files instead of the .py files if .pyc files are present. This can cause a ImportError: bad magic number if the .pyc files were compiled using a different Python version than is running on Lambda.

Expected Behavior

Zappa only packages up .pyc files if they are compatible with the python version specified in the "runtime" setting. If the .pyc file is incompatible with the desired runtime, package the .py file instead.

Actual Behavior

Zappa packages any .pyc file that exists instead of the .py file.

Possible Fix

Check the .pyc magic number against known values for the runtimes.

Getting the magic numberscd of a pyc file.

with open('somefile.pyc') as f:
    magic = f.read(2)
magic_int = struct.unpack("<H", magic)[0]

Known Magic Numbers: https://github.com/google/pytype/blob/master/pytype/pyc/magic.py

Steps to Reproduce

  1. zappa update with incompatible .pyc files
  2. get frustrated
Miserlou commented 6 years ago

TIL Python magic number!

mcrowson commented 6 years ago

Yea i had to learn the hard way looking at logs hah

jakul commented 6 years ago

This has caused some confusing bugs for me in the past when I have editably linked various local codebases together, but one of them was last executed by a Python 2 interpreter and the rest by a Python 3. It also happens ~twice a month for other developers in my organisation

rgov commented 6 years ago

Would it be preferable to not package up any existing .pyc files, but recreate them at packaging time with the correct version of the interpreter? You also don't know if any of the .pyc files are outdated, right?

I don't think it's desirable to couple Zappa to the internal implementation details of CPython's disk formats.

scoates commented 6 years ago

@rgov Aren't some packages distributed only as .pyc? Maybe not in practice, but I think that's one its selling features (can distribute opcode-compiled Python instead of the source).

rgov commented 6 years ago

I don't know of any packages distributed as .pyc, and if they were, it would be a poor choice: For one, as mentioned, they would only work with one specific version of the CPython interpreter (forget PyPy, etc.). Secondly, .pyc can easily be decompiled back into code, so it's not an effective obfuscation method.

But, if this is an actual case that people encounter, Zappa should probably do something, like warn that the .pyc could not be rebuilt because the file was not found.

scoates commented 6 years ago

I agree that a warning is in order. We outright disallow .pyc in our app, altogether:

# in the zappa settings callback

if not environ.get('PYTHONDONTWRITEBYTECODE'):
    puts(colored.red('You need to set PYTHONDONTWRITEBYTECODE for this project.'))
    raise MustSuppressWriteBytecode()

for root, dirnames, filenames in walk('.'):
    for filename in filenames:
        if filename.endswith('.pyc'):
            puts(colored.red('Found .pyc in this tree'))
            # no path here
            raise BytecodeUnclean('{} found in this tree; .pyc not allowed. Use PYTHONDONTWRITEBYTECODE.'.format(filename))
        if filename.endswith('.pyo'):
            puts(colored.red('Found .pyo in this tree'))
            # no path here
            raise BytecodeUnclean('{} found in this tree; .pyo not allowed. Use PYTHONDONTWRITEBYTECODE.'.format(filename))

However, having Zappa rebuild files gets too dangerous, in my opinion.

We disable .pyc and .pyo and enforce PYTHONDONTWRITEBYTECODE because this scenario has ruined our day, before:

Now, we don't actually have users doing zappa update … anymore. So this problem is at least partially solved, since the .pyc/.pyo files never get to the CI server, and they're in .gitignore, but in a scenario we did allow this (as many Zappa apps are manually deployed), the app is now in an unknown state (but it's still running, at least)—the person who did the deploy doesn't know that project/foo.pyc is being used, explicitly.

If Zappa removed project/foo.pyc (and was unable to rebuild it because project/foo.py is now gone), then the import would fail. This is good because this is probably what the developer expected, but it's really bad because if that import was in the app factory, or other critical bit of code, we now have a broken app, and we're down.

To be clear: I don't disagree with you that something needs to be done about .pyc/.pyo, but I don't necessarily think that Zappa doing magic with them is the right approach.

rgov commented 6 years ago

I think this is sensible:

Notice: Regenerating Python bytecode files for AWS Lambda environment
Error: Found foo.pyc but foo.py missing. Please backup and remove foo.pyc
    and re-test application before running Zappa again.
    See https://github.com/etc/wiki/blah for more details.

The wiki article can discuss the problem and offer alternative solutions, such as whitelisting the .pyc file, but only after the user has double checked that the code can be run on AWS Lambda.

That keeps the logic in Zappa pretty minimal, guarantees that all the .pyc files that get uploaded are built correctly for the AWS Lambda environment, and informs the user when an orphaned .pyc file might be affecting their project. No need to dissect file formats or try to be too clever about conditional regeneration.

mcrowson commented 6 years ago

We don't really care if the .py is missing, we just care if the .pyc is incompatible with Lambda right? I would think just testing each .pyc for magic numbers and then exiting with an error and a list of incompatible .pyc files would be sufficient no?

suriya commented 6 years ago

I see both .pyc and .py files being included in the package (with the current master branch). Earlier, only .pyc files would be included. Did something change?

scoates commented 6 years ago

@mcrowson is there a reason to not proceed with your proposal on Feb 5? I can likely write up a PR for this if we're in agreement that we should do this.

mcrowson commented 6 years ago

Sounds good!

scoates commented 6 years ago

So… this is kind of a pain because there are lots of old .pyc files kicking around.

@mcrowson: despite my concerns above, I now think the best approach is not to fail hard on an invalid .pyc and instead just exclude it from the archive.

This is still a problem if the original .py file doesn't exist, but hopefully this is rare.

It's also very difficult to test this with the current Zappa test layout (since all of the test apps use the whole repository tree).

I'll send a PR that logs a warning when an invalid .pyc is found, but just excludes it from the archive instead of hard failing.