cyang31 closed this issue 3 years ago
Hi, can you run the following command and see whether it solves your issue?
singularity.sif CUDA_VISIBLE_DEVICES=0 mmf_run config=/scratch/UserName/hateful_meme/mmf/projects/hateful_memes/configs/visual_bert/direct.yaml model=visual_bert dataset=hateful_memes env.data_dir=/scratch/UserName/hateful_meme/data training.num_workers=0
You don't need training.fast_read; it is for something else. Specifically, note CUDA_VISIBLE_DEVICES=0, which runs it on a single GPU, and training.num_workers=0, which runs it with only one dataset worker.
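For anyone scripting this, here is a minimal Python sketch of the same launch. It only uses the mmf_run arguments quoted in the command above. The helper names (build_mmf_command, build_env, launch) are hypothetical and not part of MMF, and the container wrapper is left out on purpose because the exact singularity invocation is not shown in this thread.

```python
import os
import subprocess


def build_mmf_command(config_path, data_dir):
    """Assemble the mmf_run arguments quoted in the reply above.

    Hypothetical helper for illustration; it is not part of MMF.
    """
    return [
        "mmf_run",
        f"config={config_path}",
        "model=visual_bert",
        "dataset=hateful_memes",
        f"env.data_dir={data_dir}",
        # Assumption: MMF forwards this value to the PyTorch DataLoader,
        # where 0 means batches are loaded in the main process.
        "training.num_workers=0",
    ]


def build_env(gpu_index=0):
    """Copy the current environment and expose only one GPU to the process."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch(config_path, data_dir, gpu_index=0):
    """Run mmf_run with the restricted environment and return its exit code."""
    result = subprocess.run(
        build_mmf_command(config_path, data_dir),
        env=build_env(gpu_index),
        check=False,
    )
    return result.returncode
```

Calling launch(...) with the config path and data directory from the command above should be equivalent to prefixing the shell command with CUDA_VISIBLE_DEVICES=0, assuming mmf_run is on the PATH.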
Hi, I tried your suggested command, but it still gets stuck at the same place. In case it is useful: the only baseline I can launch without problems is Image-Grid. A similar issue always happens when it tries to unpack an archive, either extras.tar.gz or features.tar.gz. Those files are downloaded automatically. During unpacking, the size of the resulting files keeps growing and then stabilizes at some point, but the main code stays stuck at "unpacking X.tar.gz" and never moves on.
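One way to narrow this down might be to extract the archive by hand from inside the same container and onto the same filesystem, printing progress as it goes, so it becomes visible whether extraction itself stalls or finishes. The sketch below uses only the Python standard library. The function name and the report_every parameter are made up for illustration, and this is not how MMF unpacks files internally.

```python
import os
import tarfile
import time


def extract_with_progress(archive_path, dest_dir, report_every=1000):
    """Extract a .tar.gz member by member and print progress.

    Returns the number of members extracted. Only use this on archives
    you trust, since members are extracted as-is.
    """
    os.makedirs(dest_dir, exist_ok=True)
    start = time.time()
    count = 0
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            tar.extract(member, path=dest_dir)
            count += 1
            if count % report_every == 0:
                elapsed = time.time() - start
                print(
                    f"{count} members extracted in {elapsed:.1f}s "
                    f"(last: {member.name})",
                    flush=True,
                )
    print(f"done: {count} members in {time.time() - start:.1f}s", flush=True)
    return count
```

If the progress lines stop at a particular member, that points at the filesystem or the container's bind mounts. If the run finishes cleanly, the hang is more likely in whatever happens after unpacking.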
Something must be off in singularity because this works fine as it is. Can you try running the command outside of singularity?
Instructions To Reproduce the Issue:
Check https://stackoverflow.com/help/minimal-reproducible-example for how to ask good questions. Simplify the steps to reproduce the issue using suggestions from the above link, and provide them below:
(git diff) I didn't change the code.
Expected behavior:
No error pops up, but it takes forever to unpack the features.tar.gz file, which is unexpected. I manually downloaded and unpacked it locally to check whether unpacking is simply slow, and it is not. However, when I reran the above code after that, it skipped the download stage but got stuck again at "mmf.trainers.mmf_trainer: Loading datasets". I waited overnight to make sure it was not just slow, but nothing changed.
Environment:
WARNING: underlay of /usr/bin/nvidia-debugdump required more than 50 (375) bind mounts
Collecting environment information...
PyTorch version: 1.6.0+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla K20Xm
Nvidia driver version: 418.39
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.19.4
[pip3] torch==1.6.0+cu101
[pip3] torchtext==0.5.0
[pip3] torchvision==0.7.0+cu101
[conda] Could not collect