facebookresearch / vissl

VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
https://vissl.ai
MIT License
3.26k stars 334 forks

Invalid syntax in run_distributed_engines.py #549

Open Mak-Ta-Reque opened 2 years ago

Mak-Ta-Reque commented 2 years ago

Instructions To Reproduce the 🐛 Bug:

  1. what changes you made (git diff) or what code you wrote

    
    diff --git a/dev/launch_slurm.sh b/dev/launch_slurm.sh
    index 193b09e..60b3e9d 100755
    --- a/dev/launch_slurm.sh
    +++ b/dev/launch_slurm.sh
    @@ -27,7 +27,7 @@ CFG=( "$@" )
    
    # create a temporary experiment folder to run the SLURM job in isolation
    RUN_ID=$(date +'%Y-%m-%d-%H-%M-%S')
    -EXP_ROOT_DIR="/checkpoint/$USER/vissl/$RUN_ID"
    +EXP_ROOT_DIR="/netscratch/kadir/slurm-training/checkpoint/$USER/vissl/$RUN_ID"
    CHECKPOINT_DIR=${CHECKPOINT_DIR:-"$EXP_ROOT_DIR/checkpoints/"}
    
    echo "EXP_ROOT_DIR: $EXP_ROOT_DIR"
  2. what exact command you run:

    cd $HOME/vissl && \
      NODES=8 NUM_GPU=8 GPU_TYPE=V100 MEM=200g CPU=8 \
      EXPT_NAME=swav_100ep_rn50_in1k OUTPUT_DIR=/tmp/swav/ \
      PARTITION=learnfair BRANCH=v0.1.6 NUM_DATA_WORKERS=4 \
      MULTI_PROCESSING_METHOD=forkserver \
      ./dev/launch_slurm.sh \
      config=pretrain/swav/swav_8node_resnet config.OPTIMIZER.num_epochs=100 config.SLURM.USE_SLURM=true

  4. what you observed (including __full logs__):

    EXP_ROOT_DIR: /netscratch/kadir/slurm-training/checkpoint/kadir/vissl/2022-05-23-11-25-25
    CHECKPOINT_DIR: /netscratch/kadir/slurm-training/checkpoint/kadir/vissl/2022-05-23-11-25-25/checkpoints/
      File "/netscratch/kadir/slurm-training/checkpoint/kadir/vissl/2022-05-23-11-25-25/tools/run_distributed_engines.py", line 23
        def hydra_main(overrides: List[Any]):
                                ^
    SyntaxError: invalid syntax

5. please simplify the steps as much as possible so they do not require additional resources to
   run, such as a private dataset.

## Expected behavior:

If there is no obvious error in "what you observed" provided above,
please tell us the expected behavior.

## Environment:

Provide your environment information using the following command:

wget -nc -q https://github.com/facebookresearch/vissl/raw/main/vissl/utils/collect_env.py && python collect_env.py


    sys.platform         linux
    Python               3.6.9 (default, Jan 26 2021, 15:33:00) [GCC 8.4.0]
    numpy                1.19.5
    Pillow               8.4.0
    vissl                0.1.6 @/home/kadir/vissl/vissl
    GPU available        False
    torchvision          0.11.2+cu102 @/home/kadir/.local/lib/python3.6/site-packages/torchvision
    hydra                1.0.7 @/home/kadir/.local/lib/python3.6/site-packages/hydra
    apex                 unknown
    PyTorch              1.10.1+cu102 @/home/kadir/.local/lib/python3.6/site-packages/torch
    PyTorch debug build  False


PyTorch built with:

CPU info:


    Architecture         x86_64
    CPU op-mode(s)       32-bit, 64-bit
    Byte Order           Little Endian
    CPU(s)               24
    On-line CPU(s) list  0-23
    Thread(s) per core   2
    Core(s) per socket   6
    Socket(s)            2
    NUMA node(s)         4
    Vendor ID            AuthenticAMD
    CPU family           21
    Model                2
    Model name           AMD Opteron(tm) Processor 6348
    Stepping             0
    CPU MHz              2954.387
    CPU max MHz          2800.0000
    CPU min MHz          1400.0000
    BogoMIPS             5599.95
    Virtualization       AMD-V
    L1d cache            16K
    L1i cache            64K
    L2 cache             2048K
    L3 cache             6144K
    NUMA node0 CPU(s)    0-5
    NUMA node1 CPU(s)    6-11
    NUMA node2 CPU(s)    12-17
    NUMA node3 CPU(s)    18-23




## When to expect Triage
VISSL devs and contributors aim to triage issues as soon as possible. As a general guideline, however, we ask users to expect triaging within 1-2 weeks.
QuentinDuval commented 2 years ago

Hi @Mak-Ta-Reque,

Thank you for using VISSL and reaching out to us :)

At first sight, this error is very low level and strange: it looks like the Python interpreter is failing to parse the file at all. Could you check whether you can run the file directly with python (without going through launch_slurm.sh)? If that fails too, what do you observe?

Thank you, Quentin
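
One way to run the check suggested above is sketched below. It reports which interpreter is executing it and whether that interpreter can parse a function definition with type annotations, like the one at line 23 of the traceback. A SyntaxError on an annotated `def` is what a Python 2 interpreter would report, so it may be worth confirming which `python` the SLURM job actually resolves to; that is only a guess, since the logs do not show the interpreter version used inside the job. The helper name `check_parses` and the `SNIPPET` string are illustrative and not part of VISSL. The script deliberately avoids annotations and f-strings itself so that it still runs under an old interpreter.

```python
import ast
import sys


def check_parses(source, filename="<string>"):
    """Return True if the running interpreter can parse `source`."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError:
        return False
    return True


# Illustrative stand-in for the line the traceback points at
# (the real function body in run_distributed_engines.py is not reproduced here).
SNIPPET = (
    "from typing import Any, List\n"
    "\n"
    "def hydra_main(overrides: List[Any]):\n"
    "    pass\n"
)

if __name__ == "__main__":
    # Single-argument print calls behave the same on Python 2 and Python 3.
    print("interpreter: %s" % sys.executable)
    print("version: %s" % sys.version.split()[0])
    print("parses annotated def: %s" % check_parses(SNIPPET))
```

Running this with the same `python` command that the job script invokes, both on the login node and inside a SLURM allocation, would show whether the two environments use the same interpreter.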