yxchng closed this issue 3 years ago
Hi @yxchng, thank you for reaching out. On VOC07, please use https://github.com/facebookresearch/vissl/blob/main/tools/train_svm.py instead of run_distributed_engines.py. We also provide documentation on this benchmark here: https://vissl.readthedocs.io/en/latest/flowcharts/svm_workflow.html. Hope this helps! :)
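For readers hitting the same "multi-target not supported" error, here is a rough illustration of why the cross-entropy path fails. It assumes, as the traceback suggests, that VOC07 targets reach the loss as one multi-hot vector per image rather than a single class index. The helper names and the tiny label matrix are hypothetical and not part of VISSL. The sketch only shows how a multi-label problem splits into one binary problem per class, which is the kind of setup a per-class SVM benchmark works with.

```python
from typing import List


def is_multi_label(targets: List[List[int]]) -> bool:
    """Return True if any sample has more than one positive class."""
    return any(sum(row) > 1 for row in targets)


def one_vs_rest_targets(targets: List[List[int]]) -> List[List[int]]:
    """Transpose multi-hot targets into one binary label list per class."""
    num_classes = len(targets[0])
    return [[row[c] for row in targets] for c in range(num_classes)]


# Hypothetical multi-hot labels: 3 images, 4 classes (VOC07 itself has 20 classes).
targets = [
    [1, 0, 1, 0],  # image 0 contains classes 0 and 2
    [0, 1, 0, 0],  # image 1 contains class 1 only
    [0, 0, 1, 1],  # image 2 contains classes 2 and 3
]

# A single-index loss such as cross-entropy cannot consume these rows directly.
print(is_multi_label(targets))

# One binary problem per class: entry i says whether image i contains that class.
per_class = one_vs_rest_targets(targets)
print(per_class[2])
```

Each inner list of `per_class` can then be fed to an independent binary classifier, instead of forcing all classes through one softmax.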
OK. Thanks. It works now.
I want to run linear evaluation on VOC07 using this config: https://github.com/facebookresearch/vissl/blob/main/configs/config/benchmark/linear_image_classification/voc07/eval_alexnet_8gpu_transfer_voc07_svm.yaml. However, it fails with the error below.
Instructions To Reproduce the Issue:
run using the command
full logs you observed:
-- Process 2 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/data00/yarn/nmdata/usercache/zhoudongyan.daniel/appcache/application_1592202091440_0014/container_e08_1592202091440_0014_10_005042/vissl/run_distributed_engines.py", line 166, in _distributed_worker
    process_main(cfg, dist_run_id, local_rank=local_rank, node_id=node_id)
  File "/data00/yarn/nmdata/usercache/zhoudongyan.daniel/appcache/application_1592202091440_0014/container_e08_1592202091440_0014_10_005042/vissl/run_distributed_engines.py", line 159, in process_main
    hook_generator=hook_generator,
  File "/home/xxx/.local/lib/python3.7/site-packages/vissl/engines/train.py", line 102, in train_main
    trainer.train()
  File "/home/xxx/.local/lib/python3.7/site-packages/vissl/trainer/trainer_main.py", line 186, in train
    task = train_step_fn(task)
  File "/home/xxx/.local/lib/python3.7/site-packages/vissl/trainer/train_steps/standard_train_step.py", line 154, in standard_train_step
    local_loss = task.loss(model_output, target)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py", line 962, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2468, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2264, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: multi-target not supported at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:15
sys.platform         linux
Python               3.7.3 (default, Jul 25 2020, 13:03:44) [GCC 8.3.0]
numpy                1.19.5
Pillow               8.2.0
vissl                0.1.5 @/home/xxx/.local/lib/python3.7/site-packages/vissl
GPU available        True
GPU 0,1,2,3,4,5,6,7  Tesla V100-SXM2-32GB
CUDA_HOME            /usr/local/cuda
torchvision          0.8.2 @/usr/local/lib/python3.7/dist-packages/torchvision
hydra                1.0.7 @/home/xxx/.local/lib/python3.7/site-packages/hydra
classy_vision        0.6.0.dev @/home/xxx/.local/lib/python3.7/site-packages/classy_vision
tensorboard          1.15.0
apex                 0.1 @/usr/local/lib/python3.7/dist-packages/apex
cv2                  3.2.0
PyTorch              1.7.1 @/usr/local/lib/python3.7/dist-packages/torch
PyTorch debug build  False
PyTorch built with:
CPU info:
Architecture         x86_64
CPU op-mode(s)       32-bit, 64-bit
Byte Order           Little Endian
Address sizes        46 bits physical, 48 bits virtual
CPU(s)               96
On-line CPU(s) list  0-95
Thread(s) per core   2
Core(s) per socket   24
Socket(s)            2
NUMA node(s)         2
Vendor ID            GenuineIntel
CPU family           6
Model                85
Model name           Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
Stepping             7
CPU MHz              3099.992
CPU max MHz          3900.0000
CPU min MHz          1000.0000
BogoMIPS             4800.00
Virtualization       VT-x
L1d cache            32K
L1i cache            32K
L2 cache             1024K
L3 cache             36608K
NUMA node0 CPU(s)    0-23,48-71
NUMA node1 CPU(s)    24-47,72-95