Open captainvera opened 5 years ago
Is multi-GPU training with DistributedDataParallel planned as future work?
I'm sorry, I'm not sure I understood your comment @BUAAers. Are you asking us to implement multi-GPU training with DistributedDataParallel instead of, or in addition to, implementing it with the local DataParallel class?
Hi guys, Just starting to get hands-on with your toolkit. Multi GPU training would surely be useful to speed up experiments. Is this under active development on your side? Thanks!
Hi @francoishernandez,
Yes, we paused development of this feature for a while, but as of right now it is under active development on our side. We are making some structural changes to the Kiwi framework (mostly data-loading and how we handle that, see additional context of original issue [spoiler: it was a huge bottleneck]) and intend to bump the version shortly to include both these changes and add Multi-gpu training. Stay tuned!
Meanwhile, PRs are very welcome, thanks for your fix!
Update on this.
Our initial impression was that we'd get multi-GPU for "free" by adopting PyTorch-Lightning. Unfortunately, while it does make multi-GPU much easier to support, there are issues with metrics calculation (computing corpus-level metrics when the data is sharded across GPUs) that made it harder to implement.
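To make the metrics problem concrete, here is a minimal illustration in plain Python with invented numbers (not OpenKiwi code): a corpus-level metric such as Pearson r cannot be recovered by averaging the per-GPU values, because the mean of per-shard correlations generally differs from the correlation over the full corpus. Predictions therefore need to be gathered onto one process (e.g. via `torch.distributed.all_gather`) before the metric is computed.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Two shards of (predictions, gold scores), as two GPUs would see them.
# The numbers are made up purely for illustration.
shard_a = ([0.1, 0.9, 0.4], [0.2, 0.8, 0.5])
shard_b = ([0.7, 0.3, 0.6], [0.1, 0.4, 0.9])

# Naive approach: compute the metric per GPU and average the results.
avg_of_shards = (pearson(*shard_a) + pearson(*shard_b)) / 2

# Correct approach: gather all predictions first, then compute once.
corpus_r = pearson(shard_a[0] + shard_b[0], shard_a[1] + shard_b[1])

# avg_of_shards and corpus_r disagree, so per-GPU averaging is not enough.
print(avg_of_shards, corpus_r)
```

The same argument applies to any non-decomposable corpus-level metric; only metrics that are plain means over examples can be safely averaged across devices (and even then they must be weighted by shard size).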
This is not a priority for us and has been paused from our side.
In case anyone would like to take a stab at it, feel free to comment on this issue, as I have made some (non-public) progress on making it work with kiwi >=2.0.0.
Is your feature request related to a problem? Please describe. Currently, there is no way to train QE models on multiple GPUs, which could significantly speed up the training of large models. There have been several issues requesting this feature (#31, #29).
Describe the solution you'd like Ideally, it should be possible to pass several GPU IDs to the `gpu-id` YAML flag, and OpenKiwi should use all of them in parallel to train the model.

Additional context An important thing to take into account is that other parts of the pipeline might become a bottleneck when using multiple GPUs, e.g. data ingestion, tokenisation, etc.
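A hypothetical sketch of what the requested configuration could look like; the list form of `gpu-id` is an assumption of this feature request, not supported OpenKiwi syntax:

```yaml
# Current behaviour: a single device
gpu-id: 0

# Requested: a list of devices to train on in parallel (hypothetical)
# gpu-id: [0, 1, 2, 3]
```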