facebookresearch / maskrcnn-benchmark

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.
MIT License

No way to install torchvision with pytorch-nightly - CUDA 10.0/Windows 10 #1054

Open tharindu-mathew opened 5 years ago

tharindu-mathew commented 5 years ago

🐛 Bug

There is no way to install torchvision with pytorch-nightly. I tried it with CUDA 10 on Windows 10.

When I have only pytorch installed, issue [2] happens. When I install pytorch-nightly and remove pytorch, torchvision also gets removed. When I try to keep both pytorch (to get torchvision) and pytorch-nightly installed together, issue [1] happens.

[1] I can't get the basic demo working:

    (maskrcnn) C:\Users\Tharindu\dev\maskrcnn-benchmark\demo>python webcam.py --min-image-size 800
    Traceback (most recent call last):
      File "webcam.py", line 6, in <module>
        from predictor import COCODemo
      File "C:\Users\Tharindu\dev\maskrcnn-benchmark\demo\predictor.py", line 6, in <module>
        from maskrcnn_benchmark.modeling.detector import build_detection_model
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\modeling\detector\__init__.py", line 2, in <module>
        from .detectors import build_detection_model
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\modeling\detector\detectors.py", line 2, in <module>
        from .generalized_rcnn import GeneralizedRCNN
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\modeling\detector\generalized_rcnn.py", line 11, in <module>
        from ..backbone import build_backbone
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\modeling\backbone\__init__.py", line 2, in <module>
        from .backbone import build_backbone
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\modeling\backbone\backbone.py", line 7, in <module>
        from maskrcnn_benchmark.modeling.make_layers import conv_with_kaiming_uniform
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\modeling\make_layers.py", line 10, in <module>
        from maskrcnn_benchmark.layers import Conv2d
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\layers\__init__.py", line 10, in <module>
        from .nms import nms
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\layers\nms.py", line 5, in <module>
        from apex import amp
      File "D:\Miniconda3\envs\maskrcnn\lib\site-packages\apex-0.1-py3.7-win-amd64.egg\apex\__init__.py", line 5, in <module>
        from . import parallel
      File "D:\Miniconda3\envs\maskrcnn\lib\site-packages\apex-0.1-py3.7-win-amd64.egg\apex\parallel\__init__.py", line 8, in <module>
        ReduceOp = torch.distributed.deprecated.reduce_op
    AttributeError: module 'torch.distributed' has no attribute 'deprecated'

[2]

    python webcam.py --min-image-size 800
    Traceback (most recent call last):
      File "webcam.py", line 6, in <module>
        from predictor import COCODemo
      File "C:\Users\Tharindu\dev\maskrcnn-benchmark\demo\predictor.py", line 6, in <module>
        from maskrcnn_benchmark.modeling.detector import build_detection_model
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\modeling\detector\__init__.py", line 2, in <module>
        from .detectors import build_detection_model
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\modeling\detector\detectors.py", line 2, in <module>
        from .generalized_rcnn import GeneralizedRCNN
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\modeling\detector\generalized_rcnn.py", line 11, in <module>
        from ..backbone import build_backbone
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\modeling\backbone\__init__.py", line 2, in <module>
        from .backbone import build_backbone
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\modeling\backbone\backbone.py", line 7, in <module>
        from maskrcnn_benchmark.modeling.make_layers import conv_with_kaiming_uniform
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\modeling\make_layers.py", line 10, in <module>
        from maskrcnn_benchmark.layers import Conv2d
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\layers\__init__.py", line 10, in <module>
        from .nms import nms
      File "c:\users\tharindu\dev\maskrcnn-benchmark\maskrcnn_benchmark\layers\nms.py", line 3, in <module>
        from maskrcnn_benchmark import _C
    ImportError: DLL load failed: The specified procedure could not be found.

peterjc123 commented 5 years ago

Issue [1] is not a problem on our side. Did you notice that it actually comes from the apex package? Also, could you please tell us which versions of torch and torchvision are installed when issue [2] happens? The versions and install commands were not given, which makes the issue rather confusing.
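For the requested version information, one option is a short standard-library script. This is a minimal sketch, assuming the packages are registered under the distribution names `torch`, `torchvision` and `apex` (adjust them if your builds register differently); `installed_versions` is a hypothetical helper name. It needs Python 3.8+ for `importlib.metadata` (the `importlib_metadata` backport offers the same API on 3.7). Because it reads package metadata instead of importing the packages, it still runs when an import fails, as in both tracebacks above; an egg-style install such as the apex one may not be reported this way.

```python
"""Report installed package versions for a bug report (a sketch)."""
from importlib import metadata
from typing import Dict, Iterable, Optional

# Assumed distribution names; change them if your builds use other names.
DEFAULT_PACKAGES = ("torch", "torchvision", "apex")


def installed_versions(
    packages: Iterable[str] = DEFAULT_PACKAGES,
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            # Reads the installed metadata only; the package is not imported.
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting that output alongside the exact install commands would answer both questions at once.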

tharindu-mathew commented 5 years ago

I included [1] to show that this needs pytorch nightly; that is why it's looking for a strange deprecated module.

When I get pytorch nightly, it removed torchvision. I guess the solution here is to build torchvision nightly. Can I request you to add this scenario to your regression tests for pytorch nightly builds? This is a popular network for testing segmentation workloads.

peterjc123 commented 5 years ago

@tharindu-mathew

When I get pytorch nightly, it removed torchvision. I guess the solution here is to build torchvision nightly.

Sure, we are working on providing torchvision-nightly for Windows. BTW, could you please try whether conda install -c pytorch-nightly -c pytorch -c defaults pytorch torchvision solves your problem?

Can I request you to add this scenario to your regression tests for pytorch nightly builds?

That sounds a bit odd, because this repo depends on PyTorch, not the other way around. I think a better solution would be to add the tests to this repo.

peterjc123 commented 5 years ago

Torchvision nightlies are now available for Windows. Could you please try them out?

tharindu-mathew commented 5 years ago

I believe it more or less works, although I still can't get the demo running, because torch.distributed.deprecated is no longer available in the nightlies (they pull 1.3.0dev).

On Thu, Sep 19, 2019 at 10:41 AM peterjc123 notifications@github.com wrote:

There are nightlies for torchvision under Windows. Could you please try them out?


-- Regards,

Tharindu

blog: http://mackiemathew.com/

virtuozo007 commented 5 years ago

I am facing the same error as @tharindu-mathew. I have tried both the stable version (1.2) and the nightly version (1.3.0dev) on Windows 10 with CUDA 10.0. The PyTorch installation itself has no problems, and the PyTorch tests pass. But when I run the demo Python script, the following error still occurs:

    (maskrcnn) D:\1_Codes\Sources\maskrcnn-benchmark-master\demo>python webcam.py --min-image-size 800
    Traceback (most recent call last):
      File "webcam.py", line 6, in <module>
        from predictor import COCODemo
      File "D:\1_Codes\Sources\maskrcnn-benchmark-master\demo\predictor.py", line 6, in <module>
        from maskrcnn_benchmark.modeling.detector import build_detection_model
      File "d:\1_codes\sources\maskrcnn-benchmark-master\maskrcnn_benchmark\modeling\detector\__init__.py", line 2, in <module>
        from .detectors import build_detection_model
      File "d:\1_codes\sources\maskrcnn-benchmark-master\maskrcnn_benchmark\modeling\detector\detectors.py", line 2, in <module>
        from .generalized_rcnn import GeneralizedRCNN
      File "d:\1_codes\sources\maskrcnn-benchmark-master\maskrcnn_benchmark\modeling\detector\generalized_rcnn.py", line 11, in <module>
        from ..backbone import build_backbone
      File "d:\1_codes\sources\maskrcnn-benchmark-master\maskrcnn_benchmark\modeling\backbone\__init__.py", line 2, in <module>
        from .backbone import build_backbone
      File "d:\1_codes\sources\maskrcnn-benchmark-master\maskrcnn_benchmark\modeling\backbone\backbone.py", line 7, in <module>
        from maskrcnn_benchmark.modeling.make_layers import conv_with_kaiming_uniform
      File "d:\1_codes\sources\maskrcnn-benchmark-master\maskrcnn_benchmark\modeling\make_layers.py", line 10, in <module>
        from maskrcnn_benchmark.layers import Conv2d
      File "d:\1_codes\sources\maskrcnn-benchmark-master\maskrcnn_benchmark\layers\__init__.py", line 10, in <module>
        from .nms import nms
      File "d:\1_codes\sources\maskrcnn-benchmark-master\maskrcnn_benchmark\layers\nms.py", line 5, in <module>
        from apex import amp
      File "C:\Users\Leo\Anaconda3\envs\maskrcnn\lib\site-packages\apex-0.1-py3.7-win-amd64.egg\apex\__init__.py", line 5, in <module>
        from . import parallel
      File "C:\Users\Leo\Anaconda3\envs\maskrcnn\lib\site-packages\apex-0.1-py3.7-win-amd64.egg\apex\parallel\__init__.py", line 8, in <module>
        ReduceOp = torch.distributed.deprecated.reduce_op
    AttributeError: module 'torch.distributed' has no attribute 'deprecated'

peterjc123 commented 5 years ago

@virtuozo007 Please refer to https://github.com/NVIDIA/apex/issues/429.
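The `AttributeError` in the apex tracebacks is raised at `apex/parallel/__init__.py` line 8, which hard-codes `torch.distributed.deprecated.reduce_op`. A common defensive pattern is to feature-detect the attribute rather than assume one PyTorch layout. The sketch below uses a hypothetical helper, `resolve_reduce_op`, and plain stand-in objects so it runs without torch; it is not apex's code and not necessarily the fix discussed in NVIDIA/apex#429. Whether `torch.distributed` exposes any of these names on a given Windows build may also depend on how that build was compiled, so the helper raises when none is found instead of guessing.

```python
"""Feature-detect a reduce-op enum instead of hard-coding one path (a sketch)."""
from types import SimpleNamespace
from typing import Any


def resolve_reduce_op(dist: Any) -> Any:
    """Return the reduce-op enum from a torch.distributed-like module.

    Hypothetical helper: tries `ReduceOp`, then the older `reduce_op` alias,
    and only then the `deprecated` submodule that the tracebacks show is
    missing in the 1.2 / 1.3.0dev installs.
    """
    if hasattr(dist, "ReduceOp"):
        return dist.ReduceOp
    if hasattr(dist, "reduce_op"):
        return dist.reduce_op
    deprecated = getattr(dist, "deprecated", None)
    if deprecated is not None and hasattr(deprecated, "reduce_op"):
        return deprecated.reduce_op
    raise AttributeError("no reduce-op enum found on the given distributed module")


if __name__ == "__main__":
    # Stand-ins for two module layouts, so this runs without torch installed.
    new_style = SimpleNamespace(ReduceOp="ReduceOp-enum")
    old_style = SimpleNamespace(deprecated=SimpleNamespace(reduce_op="legacy-enum"))
    print(resolve_reduce_op(new_style))  # ReduceOp-enum
    print(resolve_reduce_op(old_style))  # legacy-enum
```

With a real install the call would be `resolve_reduce_op(torch.distributed)`; checking the newest name first means the legacy path is only touched on old versions.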

virtuozo007 commented 5 years ago

@virtuozo007 Please refer to NVIDIA/ape

I downgraded PyTorch to 1.1.0 and the demo works. Thank you all the same.
It seems that the error occurring with PyTorch 1.2.0 and above is caused by the apex code, as discussed at https://github.com/NVIDIA/apex/issues/429. I'll try that later.
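Following the workaround reported in this thread (1.1.0 works; 1.2.0 and 1.3.0dev fail), a script could check the installed version before importing apex. This is a minimal sketch: the 1.2 cutoff comes only from this thread, not from an official compatibility table, and `parse_major_minor` and `predates_1_2` are hypothetical helper names.

```python
"""Check whether a torch version string predates 1.2 (a sketch)."""
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.1.0' or '1.3.0dev'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def predates_1_2(version: str) -> bool:
    """True if the version is older than 1.2, the range reported to work here."""
    # Tuple comparison orders by major first, then minor.
    return parse_major_minor(version) < (1, 2)


if __name__ == "__main__":
    for v in ("1.1.0", "1.2.0", "1.3.0dev"):
        print(v, predates_1_2(v))
```

In practice you would pass `torch.__version__`. Comparing integer tuples avoids the string-comparison trap where "1.10" would sort before "1.2".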