Hi, thanks for the report. `DistributedDataParallel` might be supported later, but I don't have a lot of cycles to work on this right now.

I can't reproduce the `DataParallel` crash, can you share your Python/PyTorch versions? (Looks like Python 3.6 and torch 1.7.0, but just to confirm.) If you comment out the 2 lines, you won't be using multiple GPUs.

We might release `compressai` as a package later, but again, no ETA on this.
Hi @jbegaint ,

I was using Python 3.6.9 and `torch==1.7.1` on a machine with 3 GPUs (confirmed that `torch.cuda.device_count()` returns 3). Yes, I understand that commenting the 2 lines out avoids using multiple GPUs; I just wanted to show that it works on a single GPU but not on multiple ones.
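For reference, this environment can be checked with a quick snippet like the following (an illustrative sketch, not commands from the thread; the comments mirror the versions reported above):

```python
# Quick environment check (illustrative; values match the setup reported above)
import sys
import torch

print(sys.version)                # 3.6.9 (...)
print(torch.__version__)          # 1.7.1
print(torch.cuda.device_count())  # 3
```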
Looking forward to `DistributedDataParallel` support and the Python package release!
Thanks for the information! I can't reproduce the `DataParallel` issue. Can you make sure you have the latest version of `compressai` installed?
You're right, I was using ver. b22bda1f4b9cf61e154ecabd019eb1935cf00822 locally (including `examples/train.py`), but somehow an old version remained in my virtual environment. Reinstalling it resolved the issue with `DataParallel` on multiple GPUs.
Ok, great! Keeping this open for the `DistributedDataParallel` support.
Thank you! I'll be looking forward to the `DistributedDataParallel` support, as it would significantly reduce our training time.
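For context, here is a minimal sketch of what single-node DDP training could look like once supported. This is an assumption about future usage, not the library's actual API: it assumes launching with `torchrun` (which sets `LOCAL_RANK`), uses `bmshj2018_factorized` from the model zoo, and elides the dataset and optimizer setup.

```python
# Hypothetical sketch of single-node DDP training with a CompressAI model.
# Assumes launch via `torchrun --nproc_per_node=3 train_ddp.py`, which sets
# the LOCAL_RANK environment variable. Dataset/optimizer setup is elided.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

from compressai.zoo import bmshj2018_factorized


def main():
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    net = bmshj2018_factorized(quality=3).cuda(local_rank)
    # find_unused_parameters is likely needed: the entropy bottleneck's
    # auxiliary parameters get no gradient from the rate-distortion loss.
    net = DDP(net, device_ids=[local_rank], find_unused_parameters=True)

    # ... build a DataLoader with a DistributedSampler and train as usual,
    # using net.module.aux_loss() for the auxiliary optimizer step ...


if __name__ == "__main__":
    main()
```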
@yoshitomo-matsubara I've uploaded some test wheels (for Linux and macOS) to pypi.org, let me know how it goes.
@jbegaint It is working well on my machine (Ubuntu 20.04 LTS) and is very helpful, thank you for publishing it!
As you might know, since this repo is public, you can automate the publishing process (triggered by a GitHub release, for instance), which would lower the barrier to publishing and managing the Python package on PyPI.
Great :-). Yes, I've set up an automated GitHub Action to build and publish the wheels on pushed tags.
When will the code support `DistributedDataParallel`? Looking forward to it!
Hi, I don't have an ETA for this yet. We'll revisit DDP support at a later date.
I agree that DDP support would be great!
First of all, thank you for the great package!

1. Support DistributedDataParallel and DataParallel

I'm working on large-scale experiments that take pretty long to train, and I'm wondering whether this framework can support `DataParallel` and `DistributedDataParallel`. The current `examples/train.py` looks like it supports `DataParallel` via `CustomDataParallel`, but it crashed when I ran `pipenv run python examples/train.py --data ./dataset/ --batch-size 4 --cuda` on a machine with 3 GPUs. When commenting out these two lines https://github.com/InterDigitalInc/CompressAI/blob/master/examples/train.py#L333-L334 , it looks like it works well.
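For reference, the wrapping those two lines perform is roughly the following (a paraphrased sketch of that part of `examples/train.py`, not a verbatim copy of the referenced commit):

```python
# Rough sketch of the CustomDataParallel wrapping in examples/train.py
# (paraphrased from the commit referenced above, not copied verbatim).
import torch
import torch.nn as nn


class CustomDataParallel(nn.DataParallel):
    """DataParallel wrapper that falls back to the wrapped module's attributes."""

    def __getattr__(self, key):
        try:
            return super().__getattr__(key)
        except AttributeError:
            return getattr(self.module, key)


# The two lines in question; commenting them out keeps training on one GPU.
# if args.cuda and torch.cuda.device_count() > 1:
#     net = CustomDataParallel(net)
```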
Could you please fix the issue and also support `DistributedDataParallel`? If you need more examples to identify the components causing this issue, let me know; I have a few more examples (error messages) for both `DataParallel` and `DistributedDataParallel` with different network architectures (containing `CompressionModel`).

2. Publish Python package

It would be much more useful if you could publish this framework as a Python package, so that we can install it with `pip install compressai`.

Thank you!