Accenture / serverless-ephemeral

This is a Serverless Framework plugin that helps bundle stateless zipped libraries into AWS Lambda deployment packages.

serverless invoke local is failing for package tensorflow #13

Closed piercus closed 6 years ago

piercus commented 6 years ago

Hello,

When using TensorFlow, serverless invoke local fails.

I think serverless invoke local is broken when used with serverless-ephemeral:

$ serverless invoke local --function <my-function> --path test/events/frame.json
Traceback (most recent call last):
  File "/home/pierre/.nvm/versions/node/v7.7.3/lib/node_modules/serverless/lib/plugins/aws/invokeLocal/invoke.py", line 57, in <module>
    module = import_module(args.handler_path.replace('/', '.'))
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module

    __import__(name)
  File "/home/pierre/dev/lbd-py-tensorflow/custom-function.py", line 5, in <module>
    import lib.classify as classify
  File "/home/pierre/dev/lbd-py-tensorflow/lib/__init__.py", line 1, in <module>
    from .network import Network
  File "/home/pierre/dev/lbd-py-tensorflow/lib/network.py", line 4, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 52, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcudnn.so.6: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

The dynamic loader cannot find libcudnn.so.6.
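A quick way to check whether the loader can resolve the library independently of TensorFlow is to try loading it directly with ctypes. This is just a diagnostic sketch, not part of the plugin:

```python
import ctypes

def can_load(libname):
    """Return True if the dynamic loader can resolve the shared library."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

# On the failing machine this prints False until the directory
# containing libcudnn.so.6 is on the loader's search path
print(can_load("libcudnn.so.6"))
```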

I will suggest a PR for this too

alexleonescalera commented 6 years ago

Can you provide this info:

I'm trying to reproduce the exact issue locally, but I haven't been able to.

piercus commented 6 years ago

I think this issue might be related to my local TensorFlow installation.

I'm using

I had to set LD_LIBRARY_PATH to make TensorFlow run on my local machine (see details below), following the instructions from https://www.tensorflow.org/install/install_linux:

NVIDIA requirements to run TensorFlow with GPU support

If you are installing TensorFlow with GPU support using one of the mechanisms described in this guide, then the following NVIDIA software must be installed on your system:

  • CUDA® Toolkit 8.0. For details, see NVIDIA's documentation. Ensure that you append the relevant Cuda pathnames to the LD_LIBRARY_PATH environment variable as described in the NVIDIA documentation.

I'm not sure if every tensorflow installation needs this LD_LIBRARY_PATH set up.

@alexleonescalera have you added specific values to LD_LIBRARY_PATH in your local environment to make TensorFlow run? Or is it running without them?

More info about my environment

nvidia-smi

➜  $ nvidia-smi
Wed Feb  7 09:20:17 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970     Off  | 00000000:01:00.0  On |                  N/A |
|  0%   39C    P8    17W / 200W |    916MiB /  4030MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1326      G   /usr/lib/xorg/Xorg                           513MiB |
|    0      2254      G   compiz                                        62MiB |
|    0      3104      G   ...-token=<token>   291MiB |
|    0     17160      G   ...passed-by-fd --v8-snapshot-passed-by-fd    43MiB |
+-----------------------------------------------------------------------------+

LD_LIBRARY_PATH

$ echo $LD_LIBRARY_PATH
:/usr/local/cuda/lib64:/home/pierre/dev/cudnn/cuda/lib64
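For reference, the variable above was populated with shell profile lines like the following (a sketch; the paths are taken from the echo output above):

```shell
# Append the CUDA and cuDNN library directories so the loader
# can resolve libcudnn.so.6 when TensorFlow is imported
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/cuda/lib64:/home/pierre/dev/cudnn/cuda/lib64"
echo "$LD_LIBRARY_PATH"
```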
alexleonescalera commented 6 years ago

Unfortunately, due to time constraints, we didn't go too far with TensorFlow. As you can see, the only packager available is for CPU and Python 2.7. Added to that, our tests were run directly on AWS, not locally. Thus, this specific scenario needs more exploration to come to a more generic solution.

Meanwhile, referring to https://github.com/serverless/serverless/blob/4b71faf2128308894646940ce2fb64e826450972/lib/plugins/aws/invokeLocal/index.js#L93, lambdaDefaultEnvVars is merged with providerEnvVars and functionEnvVars. Can you try setting LD_LIBRARY_PATH in your serverless.yml, in either the provider or the function env vars?
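A minimal sketch of what that could look like in serverless.yml (the function name, handler, and paths are placeholders, not tested):

```yaml
provider:
  name: aws
  runtime: python2.7
  # Provider-level env var, picked up by `serverless invoke local`
  environment:
    LD_LIBRARY_PATH: /usr/local/cuda/lib64:/home/pierre/dev/cudnn/cuda/lib64

functions:
  my-function:                      # placeholder name
    handler: custom-function.handler
    # Alternatively, set it per function instead of per provider:
    # environment:
    #   LD_LIBRARY_PATH: /usr/local/cuda/lib64
```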

piercus commented 6 years ago

@alexleonescalera thank you for your support :-)

If you do not face this issue, I understand you do not want to merge this.

Can you try setting the LD_LIBRARY_PATH in your serverless.yml in either the provider or the function env vars?

It will work locally, but it may break the deployed version (which is worse than the current situation :-( ) by overriding the Amazon Linux LD_LIBRARY_PATH environment variable on the server, creating side problems.

I have created a separate plugin: https://github.com/piercus/serverless-local-environment.

I hope we would be able to discuss other issues soon :-)

Thank you for your feedback

alexleonescalera commented 6 years ago

The separate plugin looks like a better approach, since you are addressing an issue that comes from the Serverless core code. I would recommend contacting the Serverless team about this and requesting that they add your plugin to their list: https://github.com/serverless/plugins

I can see the benefit of flexible settings when running locally vs deployed, so your plugin might be a solution for other people as well.

Thanks for your collaboration.

piercus commented 6 years ago

Yes, thanks for your advice. I'm waiting for them to accept my PR on https://github.com/serverless/plugins/pull/129.