flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

How to set the device (GPU or CPU) for each model separately? #955

Closed DongChanS closed 3 years ago

DongChanS commented 3 years ago

Question

I want to run the wav2letter decoder with a Transformer AM (encoder + decoder) and a ConvLM.

Since the CUDA memory footprint of the three models is large, I want to reduce it in the following way.

How can I achieve this?

[Screenshot from 2021-02-24 18-52-26]

If that is impossible, could you recommend an alternative way to reduce memory usage?

tlikhomanenko commented 3 years ago

This scheme is already supported: first call the Test binary and save the emissions (this runs the AM encoder; note that you will need disk space for the emissions), then run the Decoder binary, providing the path to the saved emissions.

This should also be more efficient (first run the AM encoder over all the data, then do the decoding) because you avoid expensive CPU-GPU transfers.
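Conceptually, the split looks roughly like this (a sketch only, not the exact Test.cpp/Decode.cpp code; network, sample and emissionPath stand in for the objects those binaries set up):

// Pass 1 (Test binary): run only the AM encoder and persist the emissions.
auto rawEmission = network->forward({fl::input(sample)}).front();
std::vector<float> emission(rawEmission.elements());
rawEmission.array().host(emission.data()); // copy device -> host
W2lSerializer::save(emissionPath, emission);

// Pass 2 (Decoder binary): load the saved emissions instead of re-running the AM.
std::vector<float> savedEmission;
W2lSerializer::load(emissionPath, savedEmission);
// ... feed savedEmission to the beam-search decoder ...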

DongChanS commented 3 years ago

Thanks! But I have a question.

If I use this code (from the legacy Test.cpp), the AM encoder (network) and AM decoder (criterion) are presumably loaded onto the GPU at the same time.

af::setDevice(0);
W2lSerializer::load(FLAGS_am, cfg, network, criterion);

So I want to follow this logic (run the AM encoder over all the data first, then decode):

  1. load the AM encoder and AM decoder on the CPU
  2. transfer the AM encoder to the GPU
  3. after running the AM encoder, transfer it back to the CPU and move the AM decoder to the GPU
  4. load the LM onto the GPU and decode the emissions

Is this possible?

light42 commented 3 years ago

> If I use this code (from the legacy Test.cpp), the AM encoder (network) and AM decoder (criterion) are presumably loaded onto the GPU at the same time.
>
> af::setDevice(0);
> W2lSerializer::load(FLAGS_am, cfg, network, criterion);

Is af::setDevice() GPU-only? I was able to run the legacy Test.cpp in CPU mode.

tlikhomanenko commented 3 years ago

> Thanks! But I have a question.
>
> If I use this code (from the legacy Test.cpp), the AM encoder (network) and AM decoder (criterion) are presumably loaded onto the GPU at the same time.
>
> af::setDevice(0);
> W2lSerializer::load(FLAGS_am, cfg, network, criterion);

Here you can remove the criterion to avoid loading it, if you don't use it. Or do:

af::setDevice(0);
W2lSerializer::load(FLAGS_am, cfg, network, criteriontmp);
criteriontmp.reset(); // free the criterion copy loaded on device 0
af::setDevice(1);
W2lSerializer::load(FLAGS_am, cfg, networktmp, criterion);
networktmp.reset(); // free the network copy loaded on device 1

> So I want to follow this logic (run the AM encoder over all the data first, then decode):
>
>   1. load the AM encoder and AM decoder on the CPU
>   2. transfer the AM encoder to the GPU
>   3. after running the AM encoder, transfer it back to the CPU and move the AM decoder to the GPU
>   4. load the LM onto the GPU and decode the emissions
>
> Is this possible?

Exactly this is not supported for now (we don't support CPU and GPU arrays at the same time; work on this is in progress).

So you can do the following (see the sketch below):

  1. load the AM encoder on the GPU
  2. run the AM encoder to get the emissions
  3. transfer them to the CPU (getting a std::vector)
  4. remove the AM encoder from the GPU
  5. load the AM decoder and LM on the GPU
  6. run decoding with them

Does it look good for you?
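A minimal sketch of these steps (variable names and the data loop are illustrative; it assumes the legacy W2lSerializer/fl API used in Test.cpp):

// 1. Load the full AM once and keep only the encoder (network) on the GPU.
std::shared_ptr<fl::Module> network;
std::shared_ptr<SequenceCriterion> criterion;
std::unordered_map<std::string, std::string> cfg;
W2lSerializer::load(FLAGS_am, cfg, network, criterion);
criterion.reset(); // not needed for the forward pass

// 2. Run the encoder over the data and keep the emissions on the host.
network->eval();
std::vector<std::vector<float>> allEmissions;
for (auto& sample : dataset) { // pseudo-iteration over the input features
  auto rawEmission = network->forward({fl::input(sample)}).front();
  std::vector<float> emission(rawEmission.elements());
  rawEmission.array().host(emission.data()); // device -> host copy
  allEmissions.push_back(std::move(emission));
}

// 3. Free the encoder before loading the LM / criterion for decoding.
network.reset();
af::deviceGC(); // let ArrayFire release cached device memory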

> Is af::setDevice() GPU-only? I was able to run the legacy Test.cpp in CPU mode.

Yes, only GPU.

DongChanS commented 3 years ago

Good! According to your suggestion, I would have to load the network and criterion twice.

So I want to reduce the I/O bottleneck. Can I remove the optimizer from the AM checkpoint file? (I guess the optimizer loading time is fairly large.)

Or can I separate the AM encoder from the AM?

tlikhomanenko commented 3 years ago

Yep, you can create a separate main that loads the model and saves only the part you need to one file and the other parts to another file. See Train.cpp/Test.cpp for how we save/load things.

So you can do:

W2lSerializer::load(FLAGS_am, cfg, network, criterion);
W2lSerializer::save(fileAMencoder, network);
W2lSerializer::save(fileCrit, criterion);

And load each file only when it is necessary.
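For example, a rough usage sketch (using the fileAMencoder/fileCrit names from above):

// Load only the encoder when computing the emissions ...
std::shared_ptr<fl::Module> network;
W2lSerializer::load(fileAMencoder, network);
// ... run the forward passes, then release it ...
network.reset();

// ... and load the criterion only when it is actually needed.
std::shared_ptr<SequenceCriterion> criterion;
W2lSerializer::load(fileCrit, criterion);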

light42 commented 3 years ago

> Yep, you can create a separate main that loads the model and saves only the part you need to one file and the other parts to another file. See Train.cpp/Test.cpp for how we save/load things.
>
> So you can do:
>
> W2lSerializer::load(FLAGS_am, cfg, network, criterion);
> W2lSerializer::save(fileAMencoder, network);
> W2lSerializer::save(fileCrit, criterion);
>
> And load each file only when it is necessary.

I see that W2lSerializer::load can only be used in GPU mode. Is there example source code for audio transcription that runs exclusively on the CPU? I've tried using the GPU and it worked flawlessly, but lately GPU prices are skyrocketing, so I want to know whether wav2letter on CPU could be a viable option for a cloud service.

tlikhomanenko commented 3 years ago

You just need to build arrayfire, fl and w2l with the CPU backend, and then the train/test/decode binaries will work on the CPU.

light42 commented 3 years ago

> You just need to build arrayfire, fl and w2l with the CPU backend, and then the train/test/decode binaries will work on the CPU.

Okay, got it. I'm sorry for asking such a trivial question.

tlikhomanenko commented 3 years ago

No problem! One inconvenient thing for now: we cannot mix GPU and CPU in the same code. So either you compile everything with the CPU backend and run on the CPU, or you compile with CUDA and run on the GPU. You can still do a host transfer from GPU to CPU, but then you operate not with af::array and not with fl::Variable, just with std::vector.

Please also use the latest fl if you have a model trained on GPU and want to run test/decode purely on the CPU: in the latest fl we switched to oneDNN and fixed a bunch of issues in the CPU backend, so the CPU forward pass now gives exactly the same result as the GPU forward pass.
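For illustration, such a host transfer could look as follows (a sketch; it assumes a beam-search decoder, e.g. a flashlight LexiconDecoder, has already been built as in Decode.cpp, and that the emissions have the [tokens x frames] layout used there):

// Copy the encoder output from the device into a plain std::vector ...
auto rawEmission = network->forward({fl::input(sample)}).front();
int nTokens = rawEmission.array().dims(0);
int nFrames = rawEmission.array().dims(1);
std::vector<float> emission(rawEmission.elements());
rawEmission.array().host(emission.data());

// ... and run the beam search on the CPU; it only needs the raw floats.
auto results = decoder.decode(emission.data(), nFrames, nTokens);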

DongChanS commented 3 years ago

Thanks! I have one problem.

> 1. load the AM encoder on the GPU
> 2. run the AM encoder to get the emissions
> 3. transfer them to the CPU (getting a std::vector)
> 4. remove the AM encoder from the GPU
> 5. load the AM decoder and LM on the GPU
> 6. run decoding with them

Could you explain how to remove the AM encoder from the GPU?

I tried running the following code, but it does not work.

delete network.get()

Should I remove all of the parameters in the network (a std::shared_ptr<fl::Module> object) like this?

for (auto param : network->params()) {
  float* outputValue = param.array().device<float>();
  af::free(outputValue);
}

tlikhomanenko commented 3 years ago

See here for how we do it: https://github.com/facebookresearch/flashlight/blob/master/flashlight/app/asr/Decode.cpp#L448-L452
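In essence, the idea is the following (a minimal sketch; the linked lines are the authoritative version):

// Once the emissions are on the host, drop the last references to the AM;
// destroying the modules frees their parameters' device memory.
network.reset();
criterion.reset();
af::deviceGC(); // optionally ask ArrayFire to release its cached buffers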

DongChanS commented 3 years ago

Thanks a lot! My problem is solved.