How to run linear evaluation on VOC07? I am getting errors trying to run it.

yxchng commented 3 years ago

I want to run linear evaluation on VOC07 using this config: https://github.com/facebookresearch/vissl/blob/main/configs/config/benchmark/linear_image_classification/voc07/eval_alexnet_8gpu_transfer_voc07_svm.yaml. However, it fails with the error below.

## Instructions To Reproduce the Issue:

Run using the command:

```bash
python3 run_distributed_engines.py \
    hydra.verbose=true \
    config=eval_resnet_8gpu_transfer_voc07_svm \
    config.CHECKPOINT.DIR="./checkpoints_voc" \
    config.MODEL.WEIGHTS_INIT.PARAMS_FILE="./new_model.pth.tar" \
    config.MODEL.WEIGHTS_INIT.APPEND_PREFIX="trunk._feature_blocks." \
    config.MODEL.WEIGHTS_INIT.STATE_DICT_KEY_NAME=""
```
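The last two overrides decide how the weights in `new_model.pth.tar` are matched to the model. Assuming they behave as the comments in VISSL's default config describe them, `STATE_DICT_KEY_NAME=""` means the file is a flat `{layer_name: weights}` dict rather than one nested under a key, and `APPEND_PREFIX` is prepended to every layer name. A minimal plain-Python sketch of that remapping follows; `remap_checkpoint_keys` is a hypothetical helper (not VISSL's loader), the nested key `"model"` is made up for illustration, and strings stand in for tensors.

```python
def remap_checkpoint_keys(checkpoint, state_dict_key_name="", append_prefix=""):
    """Hypothetical helper illustrating STATE_DICT_KEY_NAME and APPEND_PREFIX.

    An empty state_dict_key_name means `checkpoint` is already a flat
    {layer_name: weights} dict; otherwise the weights are assumed to be
    nested under checkpoint[state_dict_key_name]. append_prefix is
    prepended to every layer name.
    """
    state_dict = checkpoint[state_dict_key_name] if state_dict_key_name else checkpoint
    return {append_prefix + name: value for name, value in state_dict.items()}


# Flat checkpoint, as in the command above (strings stand in for tensors).
flat = {"conv1.weight": "W", "bn1.bias": "b"}
print(remap_checkpoint_keys(flat, "", "trunk._feature_blocks."))
# {'trunk._feature_blocks.conv1.weight': 'W', 'trunk._feature_blocks.bn1.bias': 'b'}

# Checkpoint nested under a (hypothetical) "model" key, no prefix added.
print(remap_checkpoint_keys({"model": flat}, "model"))
# {'conv1.weight': 'W', 'bn1.bias': 'b'}
```

If the prefixed names do not match the ones the model expects, the weights cannot be matched to the model's layers, which is why these two overrides matter when loading a checkpoint that was not trained with VISSL.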

Full logs you observed:

```
Traceback (most recent call last):
  File "run_distributed_engines.py", line 194, in <module>
    hydra_main(overrides=overrides)
  File "run_distributed_engines.py", line 179, in hydra_main
    hook_generator=default_hook_generator,
  File "run_distributed_engines.py", line 112, in launch_distributed
    daemon=False,
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, args)
  File "/data00/yarn/nmdata/usercache/zhoudongyan.daniel/appcache/application_1592202091440_0014/container_e08_1592202091440_0014_10_005042/vissl/run_distributed_engines.py", line 166, in _distributed_worker
    process_main(cfg, dist_run_id, local_rank=local_rank, node_id=node_id)
  File "/data00/yarn/nmdata/usercache/zhoudongyan.daniel/appcache/application_1592202091440_0014/container_e08_1592202091440_0014_10_005042/vissl/run_distributed_engines.py", line 159, in process_main
    hook_generator=hook_generator,
  File "/home/xxx/.local/lib/python3.7/site-packages/vissl/engines/train.py", line 102, in train_main
    trainer.train()
  File "/home/xxx/.local/lib/python3.7/site-packages/vissl/trainer/trainer_main.py", line 186, in train
    task = train_step_fn(task)
  File "/home/xxx/.local/lib/python3.7/site-packages/vissl/trainer/train_steps/standard_train_step.py", line 154, in standard_train_step
    local_loss = task.loss(model_output, target)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py", line 962, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2468, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2264, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: multi-target not supported at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:15
```
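The last line of the traceback is the actual failure: `cross_entropy` passes its target to `nll_loss`, which expects one class index per sample (a 1-D target) and raises "multi-target not supported" when the target has an extra dimension. That fits VOC07 being a multi-label dataset, where each image carries one label per class rather than a single class index. The NumPy sketch below is an illustration only, not PyTorch or VISSL code, and both function names are hypothetical: it contrasts the target shape a softmax cross-entropy needs with the multi-hot shape a per-class sigmoid loss accepts.

```python
import numpy as np


def softmax_cross_entropy(logits, target):
    """Mean softmax cross-entropy; `target` must be 1-D class indices (single-label)."""
    if target.ndim != 1:
        raise ValueError(f"expected 1-D class indices, got target of shape {target.shape}")
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(target)), target].mean()


def binary_cross_entropy_with_logits(logits, target):
    """Mean per-class sigmoid cross-entropy; `target` is a 0/1 matrix shaped like `logits`."""
    # Numerically stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
    loss = np.maximum(logits, 0) - logits * target + np.log1p(np.exp(-np.abs(logits)))
    return loss.mean()


num_classes = 20  # VOC07 has 20 object classes
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, num_classes))

labels = np.zeros((4, num_classes))  # multi-hot: one 0/1 entry per class
labels[0, [3, 11]] = 1.0             # an image containing two object classes
labels[1, 14] = 1.0

try:
    softmax_cross_entropy(logits, labels)
except ValueError as err:
    print("single-label loss:", err)  # rejected because the target is 2-D
print("multi-label loss:", binary_cross_entropy_with_logits(logits, labels))
```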


## Expected behavior:

Run without error

## Environment:

```
sys.platform            linux
Python                  3.7.3 (default, Jul 25 2020, 13:03:44) [GCC 8.3.0]
numpy                   1.19.5
Pillow                  8.2.0
vissl                   0.1.5 @/home/xxx/.local/lib/python3.7/site-packages/vissl
GPU available           True
GPU 0,1,2,3,4,5,6,7     Tesla V100-SXM2-32GB
CUDA_HOME               /usr/local/cuda
torchvision             0.8.2 @/usr/local/lib/python3.7/dist-packages/torchvision
hydra                   1.0.7 @/home/xxx/.local/lib/python3.7/site-packages/hydra
classy_vision           0.6.0.dev @/home/xxx/.local/lib/python3.7/site-packages/classy_vision
tensorboard             1.15.0
apex                    0.1 @/usr/local/lib/python3.7/dist-packages/apex
cv2                     3.2.0
PyTorch                 1.7.1 @/usr/local/lib/python3.7/dist-packages/torch
PyTorch debug build     False

PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

CPU info:
Architecture            x86_64
CPU op-mode(s)          32-bit, 64-bit
Byte Order              Little Endian
Address sizes           46 bits physical, 48 bits virtual
CPU(s)                  96
On-line CPU(s) list     0-95
Thread(s) per core      2
Core(s) per socket      24
Socket(s)               2
NUMA node(s)            2
Vendor ID               GenuineIntel
CPU family              6
Model                   85
Model name              Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
Stepping                7
CPU MHz                 3099.992
CPU max MHz             3900.0000
CPU min MHz             1000.0000
BogoMIPS                4800.00
Virtualization          VT-x
L1d cache               32K
L1i cache               32K
L2 cache                1024K
L3 cache                36608K
NUMA node0 CPU(s)       0-23,48-71
NUMA node1 CPU(s)       24-47,72-95
```

prigoyal commented 3 years ago

Hi @yxchng, thank you for reaching out. For VOC07, please use https://github.com/facebookresearch/vissl/blob/main/tools/train_svm.py instead of run_distributed_engines.py. We also provide documentation on this benchmark here: https://vissl.readthedocs.io/en/latest/flowcharts/svm_workflow.html. Hope this helps! :)
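A likely reason the SVM route works is that, assuming it follows the usual VOC07 protocol, it fits one binary linear classifier per class on frozen features instead of one softmax classifier over all classes, and reports mean average precision (mAP). The NumPy sketch below shows that metric generically: non-interpolated average precision per class, averaged over classes. It is not VISSL's `train_svm.py`, the function names are hypothetical, and the official VOC07 devkit uses 11-point interpolated AP, so numbers from this sketch can differ slightly.

```python
import numpy as np


def average_precision(scores, labels):
    """Non-interpolated AP for one class: mean precision at the rank of each positive."""
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = labels[order] == 1
    if not hits.any():
        return float("nan")  # AP is undefined for a class with no positives
    ranks = np.arange(1, len(hits) + 1)
    precision_at_k = np.cumsum(hits) / ranks
    return float(precision_at_k[hits].mean())


def mean_average_precision(score_matrix, label_matrix):
    """Average the per-class APs of an (N, C) score matrix against 0/1 labels."""
    aps = [average_precision(score_matrix[:, c], label_matrix[:, c])
           for c in range(score_matrix.shape[1])]
    return float(np.nanmean(aps))


# 4 images x 2 classes; scores could come from one binary classifier per class.
scores = np.array([[0.9, 0.2], [0.8, 0.7], [0.7, 0.1], [0.6, 0.9]])
labels = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
print(mean_average_precision(scores, labels))  # ≈ 0.917 (class APs 5/6 and 1.0)
```

Because each class is scored on its own, an image can be positive for several classes at once, which is exactly the multi-hot target shape the single-label loss in the traceback could not accept.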

yxchng commented 3 years ago

OK. Thanks. It works now.

facebookresearch / vissl

How to run linear evaluation on VOC07? I am getting errors trying to run it. #433