Closed kikakkz closed 1 year ago
With the CUDA_VISIBLE_DEVICES environment variable you can specify per program which GPUs should be visible. This way you can specify exactly which GPUs it runs on. Would this work for your use case instead of this patch?
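For illustration, here is a minimal sketch of how a launcher could pin one child process to specific GPUs through CUDA_VISIBLE_DEVICES. The helper name `worker_command` and the `./worker` path in the usage note are hypothetical and not part of bellperson. The command is only built here, not spawned.

```rust
use std::process::Command;

/// Build (but do not spawn) a command whose child process would only see
/// the given GPU indices. `program` is a placeholder, not a real binary.
fn worker_command(program: &str, gpus: &[usize]) -> Command {
    // CUDA_VISIBLE_DEVICES takes a comma-separated list of device indices.
    let visible = gpus
        .iter()
        .map(|g| g.to_string())
        .collect::<Vec<_>>()
        .join(",");
    let mut cmd = Command::new(program);
    cmd.env("CUDA_VISIBLE_DEVICES", visible);
    cmd
}
```

Calling `worker_command("./worker", &[0, 1])` would prepare a process that sees only the first two GPUs. Each distinct GPU set needs its own process with this approach.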
Not exactly. With CUDA_VISIBLE_DEVICES, the whole process can only see the specified devices. So on a machine with multiple cards, we have to run multiple processes, and it's even worse if we run PC1/PC2/C2 in one worker: we have to isolate memory, CPU threads, GPU devices, and NVMe space for each process. That's really complicated for an automatic deployment script. With this PR, one process can run a single C2 task or multiple C2 tasks concurrently, depending on the GPU_PER_TASK value.
@kikakkz thanks for the explanation. To make sure I understand your use case correctly: what you want is basically to be able to say "x number of GPUs form one unit". So let's say you have 6 GPUs in a machine and you set GPU_PER_TASK=2, then you kind of have 3 units of GPUs which can be used independently, but C2 would still use 2 GPUs.
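The "units of GPUs" idea can be sketched as follows. This is only an illustration under stated assumptions, not the PR's code: the function name `gpu_units` is hypothetical, devices are represented as plain indices, and a leftover remainder is assumed to form a smaller final unit.

```rust
/// Split device indices 0..device_count into units of `per_task` GPUs.
/// Assumption: a trailing remainder forms a smaller final unit.
fn gpu_units(device_count: usize, per_task: usize) -> Vec<Vec<usize>> {
    let devices: Vec<usize> = (0..device_count).collect();
    // `chunks` panics on a chunk size of 0, so treat 0 as
    // "one unit containing every device".
    let size = if per_task == 0 {
        device_count.max(1)
    } else {
        per_task
    };
    devices.chunks(size).map(|c| c.to_vec()).collect()
}
```

With 6 devices and 2 GPUs per task, this yields the three units `[0, 1]`, `[2, 3]`, and `[4, 5]`, each of which could serve one C2 task.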
Unrelated to the code, but still relevant: @kikakkz would it be possible for you to sign into CircleCI? I know it sounds weird, but this way the CI would then be triggered correctly.
I tried, but it keeps failing. Hmm, let me try again.
Clearing the cookies might help (I got that information from CircleCI support, once there was a similar issue).
Yes, exactly. Actually, in my test, using dual GPUs for one C2 task did not improve performance much compared to using a single GPU. So in practice I would like to use a single GPU per C2 task and run multiple C2 tasks concurrently within one worker process. And of course, if I have 6 GPUs, I can set GPU_PER_TASK to 2 and then run 3 C2 tasks concurrently.
trying now, 😄
@vmx I revoked access, cleared the cache and cookies, then logged in to CircleCI again, but it still fails. When I tried to look into the failure and fetch the configuration file, I got a URL like 'https://app.circleci.com/projects/github/filecoin-project/bellperson/config/?branchName=&pipelineNumber=1784' in which branchName is missing. When I set branchName to master, I can get the CircleCI config.yml file. Could you please help?
And it seems I cannot rerun the failed one because the record is missing all its info. Should I recreate this PR and close this one?
I submitted one more empty line and it seems that triggered it. Let's wait 😄
CI still seems weird. When you work through the code review, you can try to create another PR and we'll see if that helps.
I created https://github.com/filecoin-project/bellperson/pull/300 to test CircleCI instead.
The original implementation takes bellman.gpu.lock, which locks all GPUs by default. In my test with CUDA:

- single 3080 for C2: 15m52s
- double 3080 for C2: 12m36s

So I think it's better to let users choose whether they need all GPUs for one C2 task. This PR introduces an environment variable named GPU_PER_TASK to control that:

- if GPU_PER_TASK is not set, all GPUs are used for one C2 task, just like the current implementation;
- if GPU_PER_TASK == 0, the behavior is the same as when it is not set;
- if GPU_PER_TASK > 0, GPU_PER_TASK GPUs (up to devices.len()) are used for one C2 task.
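The GPU_PER_TASK rules described above can be sketched roughly like this. It is a minimal illustration, not the code from this PR: the function names `gpus_per_task` and `gpus_per_task_from_env` are hypothetical, and treating an unparseable value the same as an unset one is an assumption, since the description does not say how invalid values are handled.

```rust
use std::env;

/// Resolve how many GPUs one C2 task should use, given the raw
/// GPU_PER_TASK value (if any) and the number of available devices.
fn gpus_per_task(raw: Option<&str>, device_count: usize) -> usize {
    match raw.and_then(|s| s.trim().parse::<usize>().ok()) {
        // Positive value: use that many GPUs, capped at the device count.
        Some(n) if n > 0 => n.min(device_count),
        // Unset or 0: use all GPUs. Unparseable values also land here,
        // which is an assumption of this sketch.
        _ => device_count,
    }
}

/// Read GPU_PER_TASK from the process environment and resolve it.
fn gpus_per_task_from_env(device_count: usize) -> usize {
    let raw = env::var("GPU_PER_TASK").ok();
    gpus_per_task(raw.as_deref(), device_count)
}
```

Keeping the parsing logic separate from the environment lookup makes the three cases easy to check without touching process-wide state.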