BaguaSys / bagua-net

High performance NCCL plugin for Bagua.
https://bagua-tutorials.kwai-seattle.com/
MIT License

chore: initial baseline implementation #1

Closed: shjwudp closed this 3 years ago

todo[bot] commented 3 years ago

support user specified interfaces

https://github.com/BaguaSys/bagua-nccl-net/blob/ac15832291f7c8f25d28ae8e357d38f615cd298e/src/utils.rs#L33-L38


This comment was generated by todo based on a TODO comment in ac15832291f7c8f25d28ae8e357d38f615cd298e in #1. cc @BaguaSys.
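
A minimal sketch of what a user-specified interface filter could look like. It borrows the convention of NCCL's `NCCL_SOCKET_IFNAME` variable (comma-separated prefixes, a leading `^` to exclude, a leading `=` for exact match) purely as an assumption; the TODO does not say which syntax bagua-net intends to accept, and `iface_allowed` is a hypothetical helper, not a function from the repository.

```rust
/// Decide whether interface `name` passes the user-supplied `spec`.
/// Hypothetical helper; the syntax is assumed to follow NCCL_SOCKET_IFNAME:
///   "eth,ib"     keep interfaces whose name starts with "eth" or "ib"
///   "=eth0"      keep only the interface named exactly "eth0"
///   "^docker,lo" drop interfaces whose name starts with "docker" or "lo"
fn iface_allowed(spec: &str, name: &str) -> bool {
    let spec = spec.trim();
    if spec.is_empty() {
        // No user preference: accept every interface.
        return true;
    }
    // A leading '^' inverts the result (exclusion list).
    let (exclude, rest) = match spec.strip_prefix('^') {
        Some(r) => (true, r),
        None => (false, spec),
    };
    // A leading '=' switches from prefix matching to exact matching.
    let (exact, rest) = match rest.strip_prefix('=') {
        Some(r) => (true, r),
        None => (false, rest),
    };
    let matched = rest
        .split(',')
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .any(|p| if exact { name == p } else { name.starts_with(p) });
    // Keep the interface if it matched an include list,
    // or if it did not match an exclude list.
    matched != exclude
}
```

A caller would enumerate the host's interfaces by whatever means the plugin already uses and keep only those for which `iface_allowed` returns `true`.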
todo[bot] commented 3 years ago

fix this deadlock: tcp_stream must be non-blocking, otherwise it may deadlock

https://github.com/BaguaSys/bagua-nccl-net/blob/3fd27713274e85cd8e48f55bf79f965085ebba00/src/bagua_net.rs#L151-L156


This comment was generated by todo based on a TODO comment in 3fd27713274e85cd8e48f55bf79f965085ebba00 in #1. cc @BaguaSys.
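
The TODO does not explain how the deadlock arises. One common cause is a blocking read or write that stalls the thread which would otherwise make progress on the peer's request. As a sketch under that assumption, the standard library's `TcpStream::set_nonblocking` turns a stalled read into an `ErrorKind::WouldBlock` error that the caller can treat as "try again later". The helper names below are hypothetical and are not taken from `bagua_net.rs`.

```rust
use std::io::{self, Read};
use std::net::TcpStream;

/// Switch a stream to non-blocking mode and hand it back.
/// Hypothetical helper for illustration only.
fn into_nonblocking(stream: TcpStream) -> io::Result<TcpStream> {
    stream.set_nonblocking(true)?;
    Ok(stream)
}

/// Attempt a single read without blocking.
/// Returns Ok(None) when no data is ready yet, Ok(Some(n)) when n bytes
/// were read (n == 0 means the peer closed the connection), and Err for
/// any other I/O failure.
fn try_read(stream: &mut TcpStream, buf: &mut [u8]) -> io::Result<Option<usize>> {
    match stream.read(buf) {
        Ok(n) => Ok(Some(n)),
        // In non-blocking mode "nothing to read" surfaces as WouldBlock.
        Err(e) if e.kind() == io::ErrorKind::WouldBlock => Ok(None),
        Err(e) => Err(e),
    }
}
```

With this shape, a polling loop can service several streams in turn instead of parking on one of them.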
todo[bot] commented 3 years ago

make isend true async

https://github.com/BaguaSys/bagua-nccl-net/blob/3fd27713274e85cd8e48f55bf79f965085ebba00/src/bagua_net.rs#L365-L370


This comment was generated by todo based on a TODO comment in 3fd27713274e85cd8e48f55bf79f965085ebba00 in #1. cc @BaguaSys.
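
One way to read "true async" is that `isend` should only enqueue the buffer and return a request handle, while a background worker performs the write and the caller polls for completion. The sketch below illustrates that pattern with standard-library threads and channels. `AsyncSender`, `SendRequest`, and `test` are hypothetical names, and the design is an assumption rather than the approach the repository takes.

```rust
use std::io::Write;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::thread;

/// Handle returned by `isend`; poll it with `test()`.
struct SendRequest {
    done: Arc<AtomicBool>,
}

impl SendRequest {
    /// Returns true once the worker has finished handling this request.
    fn test(&self) -> bool {
        self.done.load(Ordering::Acquire)
    }
}

/// Owns the channel to a worker thread that performs the actual writes.
struct AsyncSender {
    tx: Sender<(Vec<u8>, Arc<AtomicBool>)>,
}

impl AsyncSender {
    /// Spawn a worker that drains queued buffers into `sink`.
    fn new<W: Write + Send + 'static>(mut sink: W) -> Self {
        let (tx, rx) = channel::<(Vec<u8>, Arc<AtomicBool>)>();
        thread::spawn(move || {
            // The loop ends when every Sender has been dropped.
            for (data, done) in rx {
                // Sketch only: a real implementation would record the
                // write error on the request instead of discarding it.
                let _ = sink.write_all(&data);
                done.store(true, Ordering::Release);
            }
        });
        AsyncSender { tx }
    }

    /// Queue `data` for sending and return immediately.
    fn isend(&self, data: Vec<u8>) -> SendRequest {
        let done = Arc::new(AtomicBool::new(false));
        self.tx
            .send((data, Arc::clone(&done)))
            .expect("worker thread exited");
        SendRequest { done }
    }
}
```

The caller keeps the returned `SendRequest` and checks `test()` from its own progress loop, so a slow peer no longer stalls the thread that issued the send.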
todo[bot] commented 3 years ago

make async read

https://github.com/BaguaSys/bagua-nccl-net/blob/3fd27713274e85cd8e48f55bf79f965085ebba00/src/bagua_net.rs#L445-L450


This comment was generated by todo based on a TODO comment in 3fd27713274e85cd8e48f55bf79f965085ebba00 in #1. cc @BaguaSys.
todo[bot] commented 3 years ago

support parsing sockaddr from NCCL_COMM_ID

https://github.com/BaguaSys/bagua-nccl-net/blob/3fd27713274e85cd8e48f55bf79f965085ebba00/src/utils.rs#L37-L42


This comment was generated by todo based on a TODO comment in 3fd27713274e85cd8e48f55bf79f965085ebba00 in #1. cc @BaguaSys.
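
As an illustration of this TODO, the sketch below parses a `host:port` value such as NCCL documents for `NCCL_COMM_ID` (`<ipv4>:<port>`, `[<ipv6>]:<port>`, or `<hostname>:<port>`) into a `std::net::SocketAddr`. `parse_comm_id` and `comm_id_from_env` are hypothetical names, not functions from `utils.rs`.

```rust
use std::net::{SocketAddr, ToSocketAddrs};

/// Parse a `host:port` string into a socket address.
/// Hypothetical helper for illustration only.
fn parse_comm_id(value: &str) -> Option<SocketAddr> {
    // Fast path: a literal IPv4 ("10.0.0.1:1234") or bracketed
    // IPv6 ("[::1]:1234") address needs no name resolution.
    if let Ok(addr) = value.parse::<SocketAddr>() {
        return Some(addr);
    }
    // Fallback: treat the host part as a name to resolve and take
    // the first address returned.
    value.to_socket_addrs().ok()?.next()
}

/// Read NCCL_COMM_ID from the environment and parse it, if it is set.
fn comm_id_from_env() -> Option<SocketAddr> {
    let value = std::env::var("NCCL_COMM_ID").ok()?;
    parse_comm_id(value.trim())
}
```

Returning `Option` keeps the sketch short; a production version would more likely return a `Result` that says whether the variable was missing or malformed.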
todo[bot] commented 3 years ago

@shjwudp: support parsing sockaddr from NCCL_COMM_ID

https://github.com/BaguaSys/bagua-nccl-net/blob/d44bfc2c89c9200f19d98daf94367fe3e486eb2c/src/utils.rs#L37-L42


This comment was generated by todo based on a TODO comment in d44bfc2c89c9200f19d98daf94367fe3e486eb2c in #1. cc @BaguaSys.
todo[bot] commented 3 years ago

make shutdown global

https://github.com/BaguaSys/bagua-nccl-net/blob/7f506837bfe9026f068c8befcc29068c8c8b6568/src/bagua_net.rs#L394-L399


This comment was generated by todo based on a TODO comment in 7f506837bfe9026f068c8befcc29068c8c8b6568 in #1. cc @BaguaSys.
todo[bot] commented 3 years ago

make Rotating communicator

https://github.com/BaguaSys/bagua-nccl-net/blob/dca7e6fe05757e41680c938840ee423ed8d4263f/src/bagua_net.rs#L57-L62


This comment was generated by todo based on a TODO comment in dca7e6fe05757e41680c938840ee423ed8d4263f in #1. cc @BaguaSys.
shjwudp commented 3 years ago

@NOBLES5E The benchmark data has been added to the README. Can we merge the baseline into master?