celery / billiard

Multiprocessing Pool Extensions

Abnormal task termination #236

Open: jmdacruz opened this issue 7 years ago

jmdacruz commented 7 years ago

Using Celery 3.1.25 with billiard 3.3.0.23 (and Redis 4.0.2 as the broker), I am running a stress test that sends ~30000 tasks to be executed asynchronously, without waiting for results; each task consumes CPU for about 100ms with a simple multiplication operation and then returns. On every run I see at least 2 or 3 tasks fail with an exception inside billiard:

Traceback (most recent call last):
  ...
  File "/application/virtualenv/lib/python2.7/site-packages/Stressy/Stressy.py", line 21, in execute
    value = self.stress(5, 0.1)
  File "/application/virtualenv/lib/python2.7/site-packages/Stressy/Stressy.py", line 36, in stress
    value = x*x
  File "/application/virtualenv/lib/python2.7/site-packages/billiard/common.py", line 95, in _shutdown_cleanup
    sys.exit(-(256 - signum))
  File "/application/virtualenv/lib/python2.7/site-packages/billiard/pool.py", line 286, in exit
    return _exit()
SystemExit
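
For context, the task under test is roughly of the following shape. This is a minimal sketch; the actual Stressy code is not shown in this issue, so the app wiring, the busy-loop, and the timing below are assumptions:

import time

from celery import Celery

# Hypothetical wiring; the real Stressy app name and broker URL may differ.
app = Celery('stressy', broker='redis://localhost:6379/0')

@app.task
def stress(iterations=5, slice_seconds=0.1):
    # Burn CPU in short slices using a simple multiplication,
    # mimicking a task that computes for ~100ms at a time.
    value = 0
    for _ in range(iterations):
        deadline = time.time() + slice_seconds
        while time.time() < deadline:
            x = 12345
            value = x * x
    return value

# Producer side: fire and forget ~30000 tasks, never waiting on results.
def run_stress_test(n=30000):
    for _ in range(n):
        stress.delay()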

The error rate is admittedly very low (under 0.01%), but I wonder whether this could be avoided altogether.
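
For what it's worth, the billiard frames appear inside the task's own traceback because billiard installs _shutdown_cleanup as a handler for termination signals, and that handler calls sys.exit(); CPython runs signal handlers on top of whatever frame is currently executing, so the resulting SystemExit surfaces mid-task. A standalone sketch of the same mechanism, with no billiard involved:

import os
import signal
import sys
import threading

def shutdown_cleanup(signum, frame):
    # Mirrors billiard's handler: exit from inside whatever
    # frame happens to be running when the signal arrives.
    sys.exit(-(256 - signum))

signal.signal(signal.SIGTERM, shutdown_cleanup)

# Deliver SIGTERM to ourselves in ~50ms, while the loop below is running.
threading.Timer(0.05, os.kill, args=(os.getpid(), signal.SIGTERM)).start()

try:
    value = 0
    while True:
        x = 5
        value = x * x  # SystemExit is raised here, mid-computation
except SystemExit as exc:
    print('SystemExit raised inside the loop, code: %r' % (exc.code,))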

jmdacruz commented 7 years ago

I can confirm that this issue is not present when using Celery 4.1.0 with billiard 3.5.0.3. What would be the latest billiard version that is compatible with Celery 3.1.25?

auvipy commented 7 years ago

You could first try the latest billiard release with Celery 3.1.x; if that doesn't work, try prior versions. Thanks.
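
Concretely, that bisection could look like this (versions other than the ones already named in this thread are illustrative):

pip install "celery==3.1.25"
pip install --upgrade billiard      # latest release first
pip install "billiard==3.3.0.23"    # ...then walk back through earlier releases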

auvipy commented 7 years ago

https://github.com/celery/billiard/issues/214

jmdacruz commented 7 years ago

@auvipy I was actually able to reproduce it with Celery 4.1.0 and billiard 3.5.0.3. This is the traceback I got:

Traceback (most recent call last):
...
  File "/application/virtualenv/lib/python2.7/site-packages/jsonmerge/__init__.py", line 270, in merge
    return walk.descend(schema, base, head, meta).val
  File "/application/virtualenv/lib/python2.7/site-packages/jsonmerge/__init__.py", line 42, in descend
    log.debug("descend: %sschema %s" % (self._indent(), schema.ref,))
  File "/usr/local/lib/python2.7/logging/__init__.py", line 1155, in debug
    self._log(DEBUG, msg, args, **kwargs)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 1286, in _log
    self.handle(record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 1296, in handle
    self.callHandlers(record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 1336, in callHandlers
    hdlr.handle(record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 759, in handle
    self.emit(record)
  File "/usr/local/lib/python2.7/logging/handlers.py", line 430, in emit
    logging.FileHandler.emit(self, record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 957, in emit
    StreamHandler.emit(self, record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 885, in emit
    self.flush()
  File "/usr/local/lib/python2.7/logging/__init__.py", line 845, in flush
    self.stream.flush()
  File "/application/virtualenv/lib/python2.7/site-packages/billiard/common.py", line 125, in _shutdown_cleanup
    sys.exit(-(256 - signum))
  File "/application/virtualenv/lib/python2.7/site-packages/billiard/pool.py", line 280, in exit
    return _exit()
SystemExit

auvipy commented 7 years ago

Does the master branch reproduce this?

jmdacruz commented 7 years ago

Should I try just with billiard’s master? Or both billiard’s and celery’s?

thedrow commented 7 years ago

Please try with master versions of celery and billiard.
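
Both can be installed straight from GitHub, e.g.:

pip install git+https://github.com/celery/billiard.git
pip install git+https://github.com/celery/celery.git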

sposs commented 6 years ago

I seem to hit the same 'error' when doing a warm shutdown of Celery (under supervisor, with stopasgroup=true) while a task is running GDAL code.

[2018-05-17 13:15:41,836: ERROR/ForkPoolWorker-2] tile.tasks.handle_weather_images[9609a901-29f5-4ddf-ba71-a09bde4319d0]: <built-in function Open> returned a result with an error set
Traceback (most recent call last):
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/osgeo/gdal.py", line 1674, in <lambda>
    __setattr__ = lambda self, name, value: _swig_setattr(self, Dataset, name, value)
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/billiard/common.py", line 125, in _shutdown_cleanup
    sys.exit(-(256 - signum))
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/billiard/pool.py", line 280, in exit
    return _exit()
SystemExit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/tile/weather_utils.py", line 165, in make_tiles
    if gdal_retile.main(cmd.split()):
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/tile/gdal_retile.py", line 881, in main
    build_pyramid(minfo, ds_created_tile_index, TileWidth, TileHeight)
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/tile/gdal_retile.py", line 589, in build_pyramid
    input_ds = build_pyramid_level(level_mosaic_info, level_output_tile_info, level)
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/tile/gdal_retile.py", line 612, in build_pyramid_level
    create_pyramid_tile(level_mosaic_info, offset_x, offset_y, width, height, tilename, OGRDS)
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/tile/gdal_retile.py", line 404, in create_pyramid_tile
    dec.ulx + width * dec.scaleX, dec.uly)
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/tile/gdal_retile.py", line 225, in get_data_set
    source_ds = self.cache.get(feature_name)
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/tile/gdal_retile.py", line 90, in get
    result = gdal.Open(name)
SystemError: <built-in function Open> returned a result with an error set

I use Celery 4.1.0 and billiard 3.5.0.3. I would like to let the process finish its work (I set a long stopwaitsecs), but it seems it gets killed right away. I'm not sure whether I can 'protect' the gdal code (it's gdal_retile.py), or how. Any idea/suggestion?
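
One possible (unofficial) workaround is to defer SIGTERM around the critical section from inside the task itself. The sketch below is an illustration under stated assumptions, not billiard API: it assumes the task body runs in the pool worker's main thread (the default prefork pool does this), and it deliberately delays the warm shutdown until the section completes:

import signal
from contextlib import contextmanager

@contextmanager
def defer_sigterm():
    # Record, rather than act on, any SIGTERM that arrives while the
    # block runs; afterwards restore the previous (billiard) handler
    # and replay the signal so the shutdown still proceeds, just later.
    received = []
    previous = signal.signal(
        signal.SIGTERM, lambda signum, frame: received.append(signum))
    try:
        yield
    finally:
        signal.signal(signal.SIGTERM, previous)
        if received and callable(previous):
            previous(received[-1], None)

# Inside the task, wrap the retiling step, e.g.:
#     with defer_sigterm():
#         gdal_retile.main(cmd.split())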

tirupathiraop commented 4 years ago

I am also facing the same issue when using Redis with Celery. @jmdacruz, did you find any solution for this?