Closed: kikakkz closed this 1 year ago
@vmx I would like to compile and test src/gpu/lock.rs. Is there a test for that? I didn't find one~
There are no specific tests for the locking. It would be cool to have some, but I think it's kind of hard to do as it is so tightly coupled with the devices. If you find a way that would be great though.
In order to compile you can run `cargo build --features opencl` or `cargo build --features cuda`. `opencl` will have a shorter compile time, as compiling CUDA kernels takes quite a while. You can also run the usual tests with `cargo test --features opencl`.
OK, I'll run that to fix the compile errors first. I wrote the code based on 0.22.1, and now I should merge to master.
I think my CircleCI account has some limitation. I've never used it before, so my account is new to CircleCI; we run our CI/CD with GitHub Actions and an internal Jenkins.
@kikakkz in regards to the CI: This is what I got from CircleCI support:
> This user would need to stop following the project on CircleCI in order for the builds to run under `filecoin-project`.
I'm not sure what "following" means here, but I guess it's some button in CircleCI. If you cannot find it, I'll ask support for more details.
wow, seems to work, let's wait
I was thinking about this PR a bit more. It creates a lock file for every GPU, even if you don't set the env var. This certainly changes the current behaviour. I'm thinking of cases like people having scripts that delete the lock file in case the process crashed and left one lying around.
What about making it so that if the env var isn't set, it creates only a single lock file with the same name it currently has? One way could be to have an empty vector of locks to signal that (though perhaps you have a better idea).
Another thought I had was an option to remove the lock completely. Perhaps that could be integrated as well. If you set the env var to `0`, it could signal "I don't want any locks", which kind of makes sense to me.
What do you think? I also want to make sure I don't scare you away. I'm so happy to see contributions like yours. I don't want to give you the feeling of "I just wanted to make this well-scoped change and now they ask me to do this other thing I never intended to do".
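A minimal sketch of how those three cases could be read from the env var (hypothetical names, just to make the proposal concrete; not the actual bellperson code):

```rust
use std::env;

/// Hypothetical modes for the proposal above.
#[derive(Debug)]
enum LockMode {
    /// Env var unset: a single lock file for all GPUs (current behaviour).
    SingleLock,
    /// Env var set to 0: no lock files at all.
    NoLocks,
    /// Env var set to n > 0: locks covering n GPUs each.
    GpusPerLock(usize),
}

fn lock_mode() -> LockMode {
    match env::var("BELLPERSON_GPUS_PER_LOCK") {
        // Unset (or not valid unicode): keep the current single-lock behaviour.
        Err(_) => LockMode::SingleLock,
        Ok(value) => match value.parse::<usize>() {
            Ok(0) => LockMode::NoLocks,
            Ok(n) => LockMode::GpusPerLock(n),
            // Unparsable values also fall back to the default.
            Err(_) => LockMode::SingleLock,
        },
    }
}
```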
No worry, 😄. It's worth consideration and I like your way of discussing things. I need to go out now; I'll check your opinion tomorrow, 😄.
@kikakkz in regards to the tuple, you mentioned that you couldn't get the struct working. Here's a commit that does that: https://github.com/filecoin-project/bellperson/commit/fa02bf5457289794ca70c3ab5863fcc833894d7f. Feel free to cherry-pick or take pieces of it.
> I was thinking about this PR a bit more. It creates a lock file for every GPU, even if you don't set the env var. This certainly changes the current behaviour. I'm thinking of cases like people having scripts that delete the lock file in case the process crashed and left one lying around.
> What about making it so that if the env var isn't set, it creates only a single lock file with the same name it currently has? One way could be to have an empty vector of locks to signal that (though perhaps you have a better idea).
I think that's reasonable, to stay compatible with the existing behaviour.
> Another thought I had was an option to remove the lock completely. Perhaps that could be integrated as well. If you set the env var to `0`, it could signal "I don't want any locks", which kind of makes sense to me.
In that case, do you mean a user can use one GPU to run multiple tasks, if that's what they want?
> What do you think? I also want to make sure I don't scare you away. I'm so happy to see contributions like yours. I don't want to give you the feeling of "I just wanted to make this well-scoped change and now they ask me to do this other thing I never intended to do".
No worries about that. I'm actually a Rust beginner. I have some experience with C/C++/Golang, but with Rust I'm still studying hard. I've learned a lot from this discussion and from the process of taking a draft idea and implementing it step by step, with the code getting more and more beautiful. I also enjoy this process.
> In that case, do you mean a user can use one GPU to run multiple tasks, if that's what they want?
Yes, though the user would be on their own. The kernels expect to be the only thing running on the GPU at a time (hence the lock). So someone might implement some sort of scheduling themselves, where they make sure only one kernel runs at a time on a GPU; then they could use that feature. E.g. lotus-miner could potentially make use of it.
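To make that concrete, here is a rough sketch (purely illustrative, not lotus-miner code) of what such external per-GPU scheduling could look like once the file locks are disabled:

```rust
use std::sync::{Arc, Mutex};

/// Illustrative scheduler: with bellperson's locks disabled, the caller
/// must ensure at most one kernel runs on a GPU at a time, e.g. with
/// one mutex per device index.
struct GpuScheduler {
    per_gpu: Vec<Arc<Mutex<()>>>,
}

impl GpuScheduler {
    fn new(num_gpus: usize) -> Self {
        Self {
            per_gpu: (0..num_gpus).map(|_| Arc::new(Mutex::new(()))).collect(),
        }
    }

    /// Runs `task` while holding the guard for `gpu`, so two tasks
    /// scheduled on the same GPU are serialized.
    fn run_on_gpu<T>(&self, gpu: usize, task: impl FnOnce() -> T) -> T {
        let _guard = self.per_gpu[gpu].lock().unwrap();
        task()
    }
}
```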
Sounds great. Just one concern: will that introduce some confusion about the meaning of `BELLPERSON_GPUS_PER_LOCK`? To me the variable's meaning is a bit obscure in this case. But generally I think it's worth providing this convenience to users.
> will that introduce some confusion about the meaning of `BELLPERSON_GPUS_PER_LOCK`?
If documented, it would make sense to me: if you set `BELLPERSON_GPUS_PER_LOCK` to `0`, it means that every lock has zero GPUs => no GPU has a lock. Though I'm totally open to a better name, it's just what came to my mind.
Deal. I will implement that first, then try to document it.
@vmx I tried to reimplement the lock function following your suggestion. For the fallback to single-lock mode and the lock-free mode, I modified the LockInfo struct as follows:
```rust
#[derive(Debug)]
struct LockInfo<'a> {
    file: Option<File>,
    path: Option<PathBuf>,
    devices: Vec<&'a Device>,
}
```
Then if we're in single-lock mode, we just put all devices into the Vec, and GPULock only contains one member in its Vec. If we're in lock-free mode, of course file and path will be None and we just put all devices into the Vec of the LockInfo. But I have some difficulty simplifying the following piece of code and thought you might be able to help with that:
```rust
let mut devices = Vec::new();
for lock_info in &lock.0 {
    for device in &lock_info.devices {
        devices.push(*device);
    }
}

/*
let devices = lock
    .0
    .iter()
    .map(|LockInfo { devices, .. }| devices)
    .collect::<Vec<&Device>>();
*/
```
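One way to simplify that nested loop is `flat_map` instead of `map` (a sketch, assuming `lock.0` is a `Vec<LockInfo>` as defined above). The commented-out `map` version doesn't type-check because it yields `&Vec<&Device>` items rather than individual `&Device`s:

```rust
// Flatten the per-lock device lists into one Vec<&Device>.
let devices: Vec<&Device> = lock
    .0
    .iter()
    .flat_map(|lock_info| lock_info.devices.iter().copied())
    .collect();
```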
Alright, I've created a PR https://github.com/kikakkz/bellperson/pull/1 as I thought it would be easier than explaining every detail here. I've spent some time on it and I think my initial intuition about having an empty vector of locks to signal that there are no locks turned out quite nicely.
From my commit message:
With this change the three different modes are signaled the following way:
- No locks: the LockInfo vector is empty
- Single lock for all GPUs: the LockInfo vector contains a single element, where the `device` is set to `None`
- A lock across several GPUs: the LockInfo vector contains a lock per GPU, with the `device` set.

This makes `file` and `path` no longer optional, it's kind of going back to what we had ;)
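Reconstructed from that commit message (a sketch, not the literal code from the commit), the struct would then look roughly like:

```rust
use std::fs::File;
use std::path::PathBuf;

#[derive(Debug)]
struct LockInfo<'a> {
    // `file` and `path` are no longer optional.
    file: File,
    path: PathBuf,
    /// `None`: this single lock covers all GPUs.
    /// `Some(device)`: the lock belongs to exactly this one GPU.
    device: Option<&'a Device>, // `Device` is the GPU device type used above
}
```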
I think with the comments it's clearer now which modes exist. I didn't like having a vector of devices only for the "all devices" use case. I hope things are clearer now. Please let me know if you like this change.
Really nice. I merged it. Thank you so much~
Thanks @kikakkz so far! I've added @cryptonemo for review, as I think it's now in a state where others can have a final review.
@vmx thanks for your kind help.
The original implementation locks bellman.gpu.lock to lock all GPUs by default. In my test with CUDA:
- single 3080 for C2: 15m52s
- dual 3080 for C2: 12m36s

So I think it's better to let the user choose whether they want to use all GPUs for one C2 task. An environment variable named BELLPERSON_GPUS_PER_LOCK is introduced for the user to set:
- if BELLPERSON_GPUS_PER_LOCK is not set, we use all GPUs for one C2 task, just like the current implementation;
- if BELLPERSON_GPUS_PER_LOCK == 0, same as above;
- if BELLPERSON_GPUS_PER_LOCK > 0, use BELLPERSON_GPUS_PER_LOCK GPUs (up to devices.len()) for one C2 task.
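For example (illustrative only, assuming the variable is read when the prover starts), a caller could opt into one GPU per lock like this:

```rust
use std::env;

// Illustrative: with one GPU per lock, two C2 tasks on a dual-3080
// machine can each acquire their own GPU instead of sharing both.
env::set_var("BELLPERSON_GPUS_PER_LOCK", "1");
```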
https://github.com/filecoin-project/bellperson/pull/298: CircleCI did not work properly there, so I re-created this one.