Closed Jackjiayou closed 1 year ago
Hi @Jackjiayou,
Yes, instances with the same `machine` value ran on the same machine; if their `start_time` to `end_time` intervals overlap, they shared the machine during the overlapping period.
GPU sharing additionally requires the `gpu_name` to be the same on that machine. The `gpu_name` can be retrieved from the table `pai_sensor_table`; the following entries are for your reference:
In [10]: dfs[dfs.inst_id==inst_id_4644][['worker_name','machine','gpu_name']].sort_values('machine')
Out[10]:
worker_name machine gpu_name
346867 68557bf274d2d0ad2ad97b1c73c223082bdb377411a6f1... 07a757904c2974820f7f9dce /dev/nvidia3
346871 9ecd28b62b77cf3c52a9ab4218f18f237f7741a91c9678... 081398694cba03a36ebd1280 /dev/nvidia0
346856 787cace50526cef10c985fac21f27800a0fd8e10395519... 0a77ce47d2dc5f1a13fa9075 /dev/nvidia1
346869 c92b1fadc19f7df8c390807e3e772df030acf8d32c3f26... 1276c88236bd5b94e9d0021a /dev/nvidia5
346854 e41c40c968860d4fcb5e81470ecf3b5ef2804a89df1e32... 12bcc4fceea93a30d7d0f324 /dev/nvidia5
346870 8953bcfaa1ae98467552e54d45c369f98aaeb3e811b036... 1465a37f156f80e0687d8fff /dev/nvidia2
346841 87c15f2fc929e1d6e5234adbd179b83895e8698838ec28... 16b3cec68193e8b041dcd447 /dev/nvidia5
346864 b69f3fa916aace7329a530e5f64b9cb07a4e12db8a2869... 2031d5d4fcebfbd4fbab58c9 /dev/nvidia2
346843 554f4f40e33a8332c8a17ebc2f55bae4c99c5ec6da69bd... 2daafba3a48984f15d4f1325 /dev/nvidia1
......
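To make the rule above concrete, here is a minimal sketch in plain Python of the check: same `machine`, same `gpu_name`, and overlapping `start_time` to `end_time` intervals. It assumes the rows have already been joined so that `inst_id`, `machine`, `gpu_name`, `start_time`, and `end_time` sit together in one record (for a DataFrame, `df.to_dict('records')` gives this shape). The function name `find_gpu_sharing` and the sample rows are hypothetical and are not taken from the trace.

```python
from collections import defaultdict
from itertools import combinations


def find_gpu_sharing(rows):
    """Return (machine, gpu_name, inst_id_a, inst_id_b) for instances that
    overlap in time on the same machine and the same GPU device.

    Assumption: each row is a dict with the keys inst_id, machine,
    gpu_name, start_time and end_time, already joined into one record.
    """
    # Group rows by the physical device they ran on.
    by_device = defaultdict(list)
    for row in rows:
        by_device[(row["machine"], row["gpu_name"])].append(row)

    shared = []
    for (machine, gpu_name), group in by_device.items():
        for a, b in combinations(group, 2):
            # Several workers of one instance are not "sharing" with each other.
            if a["inst_id"] == b["inst_id"]:
                continue
            # Two intervals overlap when each starts before the other ends.
            if a["start_time"] < b["end_time"] and b["start_time"] < a["end_time"]:
                shared.append((machine, gpu_name, a["inst_id"], b["inst_id"]))
    return shared


# Hypothetical rows for illustration only; these values are made up.
example_rows = [
    {"inst_id": "inst_a", "machine": "m1", "gpu_name": "/dev/nvidia0", "start_time": 0, "end_time": 100},
    {"inst_id": "inst_b", "machine": "m1", "gpu_name": "/dev/nvidia0", "start_time": 50, "end_time": 150},
    {"inst_id": "inst_c", "machine": "m1", "gpu_name": "/dev/nvidia1", "start_time": 50, "end_time": 150},
    {"inst_id": "inst_d", "machine": "m2", "gpu_name": "/dev/nvidia0", "start_time": 0, "end_time": 100},
]

print(find_gpu_sharing(example_rows))
# -> [('m1', '/dev/nvidia0', 'inst_a', 'inst_b')]
```

In the sample, only `inst_a` and `inst_b` are reported: `inst_c` overlaps in time on the same machine but uses a different device, and `inst_d` uses the same device name on a different machine. The pairwise comparison is quadratic per device, which is fine for a spot check but may need a sort-based sweep on the full trace.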
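@qzweng In the PAI data, is there an execution order among tasks or instances? Are there dependencies between tasks or between instances?
For the data shown in the figure with inst_id = 46442990f9c5da07bb4c399cb5e4e8ab3372ec4c995eccd8063af98a9ef6, is this a case of the GPU sharing described in the paper?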