Closed mharradon closed 7 years ago
Have you installed NCCL? https://github.com/NVIDIA/nccl
I thought I would catch this by not being able to import pygpu collectives, but apparently that's not the case. Thanks!
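For anyone hitting the same thing, a quick sanity check along those lines (a sketch: `pygpu.collectives` is the module that wraps NCCL, so it's only importable when pygpu was built against NCCL — the helper name here is mine):

```python
import importlib.util

def has_nccl_collectives():
    """Return True if pygpu is installed and was built with NCCL support."""
    try:
        # find_spec imports the parent package, so this raises ImportError
        # when pygpu itself is missing; treat that as "not available".
        return importlib.util.find_spec("pygpu.collectives") is not None
    except ImportError:
        return False
```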
I installed both packages here: https://github.com/NVIDIA/nccl/releases
Appears to be compiling now :D
I'm getting this error now on distribute():
Mapped name None to device cuda8: Tesla K80 (0000:00:17.0)
Mapped name None to device cuda15: Tesla K80 (0000:00:1E.0)
Mapped name None to device cuda7: Tesla K80 (0000:00:16.0)
Mapped name None to device cuda10: Tesla K80 (0000:00:19.0)
Mapped name None to device cuda14: Tesla K80 (0000:00:1D.0)
Mapped name None to device cuda13: Tesla K80 (0000:00:1C.0)
Mapped name None to device cuda2: Tesla K80 (0000:00:11.0)
Mapped name None to device cuda11: Tesla K80 (0000:00:1A.0)
Mapped name None to device cuda9: Tesla K80 (0000:00:18.0)
Mapped name None to device cuda5: Tesla K80 (0000:00:14.0)
Mapped name None to device cuda4: Tesla K80 (0000:00:13.0)
Mapped name None to device cuda3: Tesla K80 (0000:00:12.0)
Mapped name None to device cuda1: Tesla K80 (0000:00:10.0)
Mapped name None to device cuda0: Tesla K80 (0000:00:0F.0)
Mapped name None to device cuda6: Tesla K80 (0000:00:15.0)
Mapped name None to device cuda12: Tesla K80 (0000:00:1B.0)
Synkhronos: 16 GPUs initialized, master rank: 0
Dumped network architecture to network_desc.txt
Setting output nodes
Building function...
Synkhronos distributing functions...
Traceback (most recent call last):
File "./runMyCode.py", line 91, in <module>
run(BGA_params,train_params,opt_dict)
File "/home/ubuntu/MyCode.py", line 801, in run
BGA = AAR(**BGA_params)
File "/home/ubuntu/MyCode.py", line 224, in __init__
self.train_fn = self.build_train_fn(loss,losses)
File "/home/ubuntu/MyCode.py", line 599, in build_train_fn
synk.distribute()
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/synkhronos/function_builder.py", line 134, in distribute
with open(PKL_FILE, "wb") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/anaconda3/lib/python3.6/site-packages/synkhronos/pkl/synk_f_dump_76722.pkl'
The pkl directory did not exist, so I created it and am running again now.
Right! I've just created this directory with a dummy file in it. Thanks!
I changed PKL_PATH to a directory in /dev/shm to get a little performance boost - also my box was running out of primary disk space. Since you've already got unixy dependencies, maybe that would be a better default? I guess the user could change it with an env var or some config file.
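The env-var idea could be sketched roughly like this (a hypothetical helper; `SYNK_PKL_PATH` is an invented variable name, not something synkhronos actually reads):

```python
import os
import tempfile

def resolve_pkl_dir(default="/dev/shm/synkhronos"):
    """Pick a pickle directory: env override, then /dev/shm, then tmp."""
    # SYNK_PKL_PATH is hypothetical - shown here only to illustrate the idea.
    path = os.environ.get("SYNK_PKL_PATH", default)
    # Fall back to the system temp dir when the preferred parent doesn't
    # exist (e.g. no /dev/shm on non-Linux systems).
    if not os.path.isdir(os.path.dirname(path)):
        path = os.path.join(tempfile.gettempdir(), "synkhronos")
    os.makedirs(path, exist_ok=True)
    return path
```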
My functions take a long time to build, so the debug cycle is slow, but almost there I think!
To speed up Theano compilation while developing, you can use this flag:
optimizer=fast_compile
It won't apply the numerical-stability optimizations; to keep those, use optimizer=stabilize.
The execution speed will be slower, but compilation will be faster.
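For example, while iterating on the script from the traceback above, you could run with:

```shell
# Faster graph optimization while debugging (execution will be slower):
export THEANO_FLAGS=optimizer=fast_compile
# Or keep only the numerical-stability rewrites:
# export THEANO_FLAGS=optimizer=stabilize
```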
Good point nouiz, I always forget to do that when debugging!
I think I've solved the original issues here - I'll open up another issue for anything else for posterity. I'll see how far I can get!
Good idea to allow a smarter pickle path...it really can be anywhere with read/write privileges.
Would you mind saying how long it takes to distribute the functions vs compiling them in the first place? Pickling happens very fast, but it takes a while to unpickle because all the workers are fighting for the compile lock. Ideally we could have the workers do less work on the function when unpickling...something I'll bring up with @nouiz :)
Are your functions carrying large amounts of data in shared variables? I've also thought about using the pickling mode which does not store any function data, and then just using the broadcast functionality already here to get all the workers to initialize with the same shared variable values. So far my functions have only carried a few MB of shared data, but if it's more like GB maybe this is worth doing.
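The "pickle the graph, broadcast the values" split could look something like this (a pure sketch: `split_for_distribution` and its arguments are invented for illustration, not part of synkhronos):

```python
import pickle
import numpy as np

def split_for_distribution(fn_spec, shared_values):
    """Separate a function's definition from its (possibly large) data.

    fn_spec: a small, picklable description of the function (graph only).
    shared_values: dict mapping shared-variable names to numpy arrays.
    Returns a small pickle blob plus a payload to send via broadcast.
    """
    fn_blob = pickle.dumps(fn_spec)  # small: the definition, no data
    payload = {name: np.ascontiguousarray(val)  # large: values to broadcast
               for name, val in shared_values.items()}
    return fn_blob, payload
```

Workers would unpickle `fn_blob` (cheap once the compile lock is sorted out) and fill their shared variables from the broadcast payload instead of carrying GBs inside the pickle.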
Right now I'm at roughly a 20-minute compile (with fast_run) and a 10-minute distribute, with the pickle in /dev/shm. My model is about 1 GB, but it's not a huge bottleneck for me right now.
pygpu.test() passes, code runs fine on cuda0 without synkhronos.
Using 'device=cpu,force_device=True' in THEANO_FLAGS.
Thanks for any tips!