TengFeiHan0 closed this issue 4 years ago
You can run outside docker if you install all dependencies. We only provide the docker container to make it easier to reproduce our results, since this is exactly the configuration we used in our experiments.
Just some personal thoughts (feel free to ignore): it's fine that you use horovod and other fancy libraries to help accelerate training or simplify the code, but I find it strange that these libraries are still required for inference. Is using horovod so un-user-friendly? In my opinion, if their code is good, then we should be able to run with only torch and other minimal libraries.
I agree with removing the horovod dependency for inference and will look into that as soon as possible. Thanks!
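One way to make horovod optional at inference time is to guard the import and fall back to single-process defaults. This is only a minimal sketch, not the repository's actual code: the helper names (init_distributed, rank, world_size) are hypothetical, and the only horovod calls used are hvd.init(), hvd.rank() and hvd.size() from horovod.torch.

```python
# Sketch: treat horovod as optional so single-process inference runs without it.
# The helper names below are hypothetical, not taken from packnet-sfm.
try:
    import horovod.torch as hvd
    HAS_HOROVOD = True
except ImportError:
    hvd = None
    HAS_HOROVOD = False


def init_distributed():
    """Initialize horovod if it is installed; otherwise do nothing."""
    if HAS_HOROVOD:
        hvd.init()


def rank():
    """Index of this process (0 when horovod is unavailable)."""
    return hvd.rank() if HAS_HOROVOD else 0


def world_size():
    """Number of processes (1 when horovod is unavailable)."""
    return hvd.size() if HAS_HOROVOD else 1


if __name__ == "__main__":
    init_distributed()
    print("rank", rank(), "of", world_size())
```

Code that currently calls horovod directly could go through helpers like these, so that inference on a single GPU or CPU would not need the package installed.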
@VitorGuizilini have you been able to reproduce the docker environment on Ubuntu 16.04? Also, as mentioned by @kwea123, it's really a pain to work with libraries like horovod. It would be great if you could share something like a conda environment yml or a requirements file for conda/pip install.
@aakashshanbhag I use a conda environment and pip install everything whenever a ModuleNotFoundError pops up. It works.
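Instead of hitting one ModuleNotFoundError at a time, the missing modules can be listed up front. The sketch below assumes the import names in REQUIRED, which are guessed from the packages mentioned in this thread (for example, opencv is imported as cv2); adjust the list to whatever the scripts you run actually import.

```python
# Sketch: report missing modules up front instead of discovering them one by one.
import importlib.util

# Import names guessed from the packages mentioned in this thread (assumption).
REQUIRED = ["numpy", "tqdm", "torch", "torchvision", "yacs",
            "matplotlib", "termcolor", "cv2", "wandb"]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in find_missing(REQUIRED):
        print("missing:", name)
```

Each name it prints can then be installed with pip or conda before running anything.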
@kwea123 Thank you! I just wanted to know: are you using Ubuntu 16.04 or 18.04? A lot of modules have Python 3.6 compatibility issues on 16.04!
I use 18.04.
I use this file for a non-conda env (Ubuntu 16.04), but you need to install https://github.com/TRI-ML/dgp manually. Horovod troubles related to NCCL still need to be fixed. Single-GPU training works.
This reminds me that I also commented out the imports for dgp in the dataset-related Python scripts, as this library should not be required for inference. For training you do need it.
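An alternative to commenting the imports out by hand is to import dgp lazily, only in the code path that needs it. The following is a rough sketch under that assumption; require and load_training_backend are hypothetical names, and no dgp API is called because none is shown in this thread.

```python
# Sketch: import a training-only dependency lazily, at the point of use.
import importlib


def require(module_name, purpose):
    """Import module_name on demand, with a clearer error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            "{} is required for {} but is not installed".format(module_name, purpose)
        ) from exc


def load_training_backend():
    """Hypothetical hook called only by training code, never by inference."""
    return require("dgp", "training")
```

With this pattern, inference scripts never touch dgp, and training fails with an explicit message if it is absent.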
This worked for me:
name: packnet-sfm
channels:
  - conda-forge
  - travis
  - pytorch
dependencies:
  - python=3.6
  - numpy
  - tqdm
  - cudatoolkit=10.1
  - pytorch=1.4.0
  - torchvision=0.5.0
  - yacs
  - matplotlib
  - termcolor
  - opencv
  - pip:
    - wandb
    - horovod
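After creating an environment from a file like this, it can help to confirm that the pinned versions are the ones actually imported. The sketch below is one way to do that; the PINS mapping is copied from the file above with import names as keys (pytorch is imported as torch), and the helper names are hypothetical. It reads __version__ rather than using importlib.metadata, since the latter is not available on Python 3.6.

```python
# Sketch: compare installed versions against the pins in the environment file.
import importlib

# Pins copied from the environment file above; keys are import names.
PINS = {"torch": "1.4.0", "torchvision": "0.5.0"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def mismatched(pins):
    """Map each module whose version does not start with its pin to what was found.

    A prefix match is used so that build suffixes such as "+cu101" are tolerated.
    """
    result = {}
    for name, wanted in pins.items():
        found = installed_version(name)
        if found is None or not found.startswith(wanted):
            result[name] = found
    return result


if __name__ == "__main__":
    for name, found in mismatched(PINS).items():
        print("{}: wanted {}, found {}".format(name, PINS[name], found))
```

No output means every pinned module was found with a matching version prefix.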
Hello, I'm sorry to bother you. Could you please tell me how to run the code without the docker env after configuring the environment?
@TheRustlessSummer I am not sure if I got your question, but maybe here you can find what you are looking for.
Hi, have you figured out multi-gpu training? Thanks!
It seems that this project only works in a docker environment when I try to evaluate your pre-trained models. How can I test your project in an environment without docker?