ShannonAI / service-streamer

Boosting your Web Services of Deep Learning Applications.
Apache License 2.0

RuntimeError when using ManagedModel and TensorFlow #71

Closed yohokuno closed 4 years ago

yohokuno commented 4 years ago

Hi!

Thanks to the simple API, I succeeded in batching my TensorFlow service using ThreadedStreamer, but I could not move on to ManagedModel because of the RuntimeError at the end of this post.

The environment is:

Since I am not familiar with multiprocessing programs, I might be missing some background knowledge.

Any idea?

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.5/multiprocessing/spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "/usr/lib/python3.5/multiprocessing/spawn.py", line 115, in _main
    prepare(preparation_data)
  File "/usr/lib/python3.5/multiprocessing/spawn.py", line 226, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/usr/lib/python3.5/multiprocessing/spawn.py", line 278, in _fixup_main_from_path
    run_name="__mp_main__")
  File "/usr/lib/python3.5/runpy.py", line 254, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/lib/python3.5/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/y-okuno/web/servicer/servicer.py", line 34, in <module>
    core = TranslateModel(FLAGS, FSQ_FLAGS=FSQ_FLAGS)
  File "/home/y-okuno/share/web_data/M688_service_streamer/servicer/translate.py", line 81, in __init__
    cuda_devices=(0, 1, 2, 3))
  File "/usr/local/lib/python3.5/dist-packages/service_streamer/service_streamer.py", line 267, in __init__
    self._setup_gpu_worker()
  File "/usr/local/lib/python3.5/dist-packages/service_streamer/service_streamer.py", line 280, in _setup_gpu_worker
    p.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 274, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_spawn_posix.py", line 33, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_spawn_posix.py", line 43, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/usr/lib/python3.5/multiprocessing/spawn.py", line 144, in get_preparation_data
    _check_not_importing_main()
  File "/usr/lib/python3.5/multiprocessing/spawn.py", line 137, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
StaticTranslation: No static translation items found. Filename: [/home/y-okuno/web/data/static_translations.tsv] Disabling function
Meteorix commented 4 years ago

As the message suggests, just add:

if __name__ == '__main__':
    freeze_support()
    ...
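For context, a minimal stdlib-only sketch of the guard idiom (hypothetical names, not the actual servicer.py): with the "spawn" start method each child re-imports the main module, so any unguarded code that starts processes runs again in the child and raises the exact "bootstrapping phase" RuntimeError quoted above.

```python
from multiprocessing import freeze_support, get_context

def double_batch(batch):
    # stand-in for a real model's predict function
    return [x * 2 for x in batch]

def serve():
    # Anything that starts worker processes must run below the
    # __main__ guard: under "spawn", each child re-imports the main
    # module, and unguarded Process/Pool start calls there fail.
    ctx = get_context("fork")  # "fork" avoids the re-import entirely
    with ctx.Pool(2) as pool:
        return pool.map(double_batch, [[1, 2], [3, 4]])

if __name__ == "__main__":
    freeze_support()  # a no-op unless frozen into a Windows executable
    print(serve())  # [[2, 4], [6, 8]]
```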
yohokuno commented 4 years ago

@Meteorix Thank you for your advice!

I've added the line from multiprocessing import freeze_support at the top of the main file, and freeze_support() at the beginning of the main clause, but I got the same error.

For clarity, my Flask object is defined outside the main clause. Is this causing the problem?

yohokuno commented 4 years ago

Oh, the docs say freeze_support is only necessary for producing a Windows executable, so it should not matter in this case (I am running in an Ubuntu 16.04 container).

multiprocessing.freeze_support() Add support for when a program which uses multiprocessing has been frozen to produce a Windows executable. (Has been tested with py2exe, PyInstaller and cx_Freeze.)

https://docs.python.org/3/library/multiprocessing.html

Meteorix commented 4 years ago

Please try setting mp_start_method="fork". It may be because you initialize some global variables that cannot be duplicated when spawning a new process.

https://github.com/ShannonAI/service-streamer/blob/master/service_streamer/service_streamer.py#L258
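The difference between the two start methods can be seen with a stdlib-only sketch (hypothetical names): a "fork" child inherits module-level state as a memory copy, while a "spawn" child re-imports the main module from scratch, which fails for globals that cannot be pickled or safely re-created.

```python
import multiprocessing as mp

# Module-level state, standing in for a loaded model or open session.
# Under "spawn" the child would have to rebuild this by re-importing
# the module; under "fork" the child inherits it as a memory copy.
STATE = {"model": "loaded"}

def worker(q):
    q.put(STATE["model"])

def check_fork_inherits():
    ctx = mp.get_context("fork")
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    result = q.get()
    p.join()
    return result

if __name__ == "__main__":
    print(check_fork_inherits())  # the forked child sees "loaded"
```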

yohokuno commented 4 years ago

@Meteorix Thank you for your advice! ManagedModel with TensorFlow is finally working in my app.

Besides using mp_start_method="fork" as you suggested, I needed to move import tensorflow to after Streamer() is created, to prevent TensorFlow from initializing before CUDA_VISIBLE_DEVICES is set.
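The import-order point can be sketched like this (hypothetical init_model, mirroring how a worker might defer the import): TensorFlow reads CUDA_VISIBLE_DEVICES once when it is first imported, so the worker must set the variable before the import happens.

```python
import os

def init_model(device_id):
    # Set the GPU mask before TensorFlow is imported anywhere in the
    # process; TF snapshots CUDA_VISIBLE_DEVICES at import time, so an
    # earlier top-level import would already have claimed all GPUs.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_id)
    # import tensorflow as tf  # deferred: TF now sees only this GPU
    return os.environ["CUDA_VISIBLE_DEVICES"]

if __name__ == "__main__":
    print(init_model(2))  # "2"
```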

Another tweak I needed was to increase WORKER_TIMEOUT, which is hardcoded in service_streamer.py. I made it configurable and opened Pull Request #76; please check it out!

yohokuno commented 4 years ago

Closing this, as the problem is solved for me.