dask / knit

Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead
http://knit.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Container naming assumptions in ``yarn_api`` seem incorrect #119

Closed superbobry closed 6 years ago

superbobry commented 6 years ago

According to the ContainerId API docs, its string representation follows this format:

[...] container_e*epoch*_*clusterTimestamp*_*appId*_*attemptId*_*containerId* when epoch is larger than 0 (e.g. container_e17_1410901177871_0001_01_000005). epoch is increased when RM restarts or fails over. When epoch is 0, epoch is omitted (e.g. container_1410901177871_0001_01_000005).

https://github.com/dask/knit/blob/24a5e33ab5d6b4235cadf79f7e4244eb9935e9b4/knit/yarn_api.py#L130

It seems that the current version of the code ignores the fact that the epoch can be non-zero, so it could return an empty list of containers even though the application has running containers.
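
To illustrate the format quoted above, here is a minimal sketch of a parser that accepts both forms, with and without the epoch. It is not the code in ``yarn_api`` and not a proposed patch; the names `CONTAINER_ID`, `parse_container_id` and `belongs_to_app` are hypothetical, and the sketch assumes application IDs look like `application_<clusterTimestamp>_<appId>`.

```python
import re

# Pattern for the quoted format; the "e<epoch>_" part is optional.
CONTAINER_ID = re.compile(
    r"^container_(?:e(?P<epoch>\d+)_)?"
    r"(?P<cluster_timestamp>\d+)_(?P<app_id>\d+)_"
    r"(?P<attempt_id>\d+)_(?P<container_id>\d+)$"
)


def parse_container_id(text):
    """Split a container ID string into integer fields; a missing epoch becomes 0."""
    match = CONTAINER_ID.match(text)
    if match is None:
        raise ValueError("not a container ID: %r" % (text,))
    return {name: int(value)
            for name, value in match.groupdict(default="0").items()}


def belongs_to_app(container, application):
    """Hypothetical helper: does this container ID belong to the application?

    Assumes the application ID has the form application_<clusterTimestamp>_<appId>.
    """
    _, cluster_timestamp, app_id = application.split("_")
    fields = parse_container_id(container)
    return (fields["cluster_timestamp"] == int(cluster_timestamp)
            and fields["app_id"] == int(app_id))
```

With this approach, both `container_e17_1410901177871_0001_01_000005` and `container_1410901177871_0001_01_000005` match `application_1410901177871_0001`, because the comparison uses the parsed fields rather than a fixed string prefix.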

martindurant commented 6 years ago

Thank you @superbobry for your two issues. Would you be interested in submitting a PR to fix these, ideally with tests that show the fix?

superbobry commented 6 years ago

Surely, will do.

jcrist commented 6 years ago

Knit is being superseded by Skein (https://github.com/jcrist/skein). The new library is much more resilient to different Hadoop configurations, and more flexible for deploying custom applications. If your intent is to deploy dask on YARN, dask-yarn (http://dask-yarn.readthedocs.io/) has been rewritten to use skein instead. Closing.