NVIDIA / DIGITS

Deep Learning GPU Training System
https://developer.nvidia.com/digits
BSD 3-Clause "New" or "Revised" License

Error when adding dice.py in medical example. #1440

Closed. mjohn123 closed this issue 7 years ago

mjohn123 commented 7 years ago

Hello all, I am really thankful for the DIGITS tool. It is really good.

I am using dice.py from the medical imaging example. I get an error after adding the Dice layer as follows:

layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "score"
  bottom: "label"
  top: "accuracy"
  include { stage: "val" }
  accuracy_param { ignore_label: 255 }
}
# Dice coefficient
layer {
    type: 'Python'
    name: 'dice'
    bottom: 'score'
    bottom: 'label'
    top: 'dice'
    python_param {
      module: "digits_python_layers"
      layer: "Dice"
    }
    exclude { stage: "deploy" }
}

When I run it, I get this error:

ERROR: Check failed: registry.count(type) == 0 (1 vs. 0) Layer type Convolution already registered.

Creating layer loss
Setting up loss
Top shape: (1)
with loss weight 1
Memory required for data: 38878868
Creating layer dice
Check failed: registry.count(type) == 0 (1 vs. 0) Layer type Convolution already registered

I built Caffe with Python enabled. How can I solve this? Thanks, all.

lukeyeager commented 7 years ago

You're using this python layer file? https://github.com/NVIDIA/DIGITS/tree/digits-5.0/examples/medical-imaging#dice-metric

mjohn123 commented 7 years ago

Yes, I am using this example.

lukeyeager commented 7 years ago

Where did you get Caffe from? Deb package? Docker image? Source build?

mjohn123 commented 7 years ago

I built it from source. My Caffe checkout is located in /home/john/caffe and I compiled it with:

mkdir build
cd build
cmake ..
make all -j8 && make pycaffe

(I uncommented WITH_PYTHON_LAYER := 1 in the Makefile.config)

Note that I first installed DIGITS from the .deb package (version 4), which is located in /usr/share/digits. However, I saw that the package was not the newest, so I installed DIGITS from source (currently 5.1). As a result, my computer has two versions of DIGITS (one from the .deb and one from source). I do not know how to uninstall the old package (from the .deb).

Update: I may have figured out the issue. Although I built Caffe from source, DIGITS is not linked to my Caffe build. I checked this by running make clean on my Caffe, and DIGITS still worked. I think the issue comes from the default Caffe package that was installed with the .deb package:

CUDA_REPO_PKG=cuda-repo-ubuntu1404_7.5-18_amd64.deb &&
    wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/$CUDA_REPO_PKG &&
    sudo dpkg -i $CUDA_REPO_PKG

ML_REPO_PKG=nvidia-machine-learning-repo-ubuntu1404_4.0-2_amd64.deb &&
    wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1404/x86_64/$ML_REPO_PKG &&
    sudo dpkg -i $ML_REPO_PKG

lukeyeager commented 7 years ago

  1. When you're using CMake the Makefile.config file is ignored. You have to use CMake flags like -DCPU_ONLY=Off to re-configure. But Python layer support is turned on by default in the CMake build, so that's probably not your problem.

  2. Use the CAFFE_ROOT environment variable to indicate which build of Caffe you want to use: https://github.com/NVIDIA/DIGITS/blob/digits-5.0/docs/Configuration.md

  3. Check the top right corner of the DIGITS UI to check your Caffe build version.
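
A quick way to sanity-check point 2 before starting DIGITS is to confirm that the directory you plan to use as CAFFE_ROOT really contains a finished build. The sketch below is only illustrative: `check_caffe_root` is a hypothetical helper, and it assumes DIGITS expects the `caffe` executable under `build/tools` and the pycaffe package under `python/caffe` inside that directory, which matches a default CMake or Make build but may differ on your setup.

```shell
# Hypothetical helper: check that a directory looks like a built Caffe checkout.
# Assumed layout: <root>/build/tools/caffe (executable) and <root>/python/caffe (pycaffe).
check_caffe_root() {
    root=${1:-$CAFFE_ROOT}
    if [ -z "$root" ]; then
        echo "CAFFE_ROOT is not set" >&2
        return 1
    fi
    if [ ! -x "$root/build/tools/caffe" ]; then
        echo "no caffe executable under $root/build/tools" >&2
        return 1
    fi
    if [ ! -d "$root/python/caffe" ]; then
        echo "no pycaffe package under $root/python" >&2
        return 1
    fi
    echo "$root looks like a usable Caffe build"
}

# Example (path taken from this thread):
#   check_caffe_root /home/john/caffe
```

If the check fails, DIGITS may be picking up a different Caffe, such as one installed by a package.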

mjohn123 commented 7 years ago

Thanks @lukeyeager. The Caffe version shown in DIGITS is:

Caffe version: 1.0.0-rc3
Caffe flavor: BVLC

I think it came from the .deb package that I installed the first time. Currently, I have two versions of DIGITS on my computer. Do you know how I can uninstall the DIGITS that was installed from the .deb package, as described at https://github.com/NVIDIA/DIGITS/blob/master/docs/UbuntuInstall.md?

mjohn123 commented 7 years ago

Finally, I fixed it. Thanks @lukeyeager for your help. I am writing up my solution in case it is useful to other people. One open question: although it works, the Caffe version shown in the top right of the DIGITS UI is still 1.0.0-rc3, while my Caffe version is 0.15.13. Why is that? These are the steps I used to fix the error:

  1. Set the CAFFE_ROOT path:
    sudo gedit ~/.bashrc
    Add the line `export CAFFE_ROOT=/home/john/caffe`, where `/home/john/caffe` is my Caffe folder.
  2. Rebuild Caffe with these commands:
    cat $CAFFE_ROOT/python/requirements.txt | xargs -n1 sudo pip install
    cd $CAFFE_ROOT
    mkdir -p build
    cd build
    cmake ..
    make --jobs=4
  3. Run DIGITS as CAFFE_ROOT=/home/john/caffe ./digits-devserver --port 5001

In the browser, open localhost:5001 and you can use Caffe with Python layers.

The only remaining issue is how to uninstall the old DIGITS that was installed from the .deb package. It is located in /usr/share/digits. Thanks.

lukeyeager commented 7 years ago

To uninstall any deb package:

apt-get remove <package>

After you uninstall digits, caffe will still be on your system until you type apt-get autoremove. Beware that this may autoremove other packages that you didn't intend to remove like old kernel versions.

Probably the end-all be-all command you want is:

sudo apt-get autoremove --purge digits

None of this is specific to DIGITS - this is just debian+ubuntu's package management system.
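
Because autoremove can take other packages with it, it may help to preview the result first. The sketch below relies on apt-get's `-s` (`--simulate`) option, which prints the planned actions without changing the system. `preview_purge` is a hypothetical helper name, and the filter assumes the simulated output marks removals with lines that start with `Remv` or `Purg`.

```shell
# Hypothetical helper: list what "autoremove --purge <package>" would remove.
# -s (--simulate) only prints the plan; nothing is uninstalled.
preview_purge() {
    apt-get -s autoremove --purge "$1" | grep -E '^(Remv|Purg) '
}

# Example:
#   preview_purge digits
```

If the list contains something you want to keep, remove digits on its own with `apt-get remove` and leave the autoremove step for later.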

mjohn123 commented 7 years ago

Thanks @lukeyeager. I uninstalled digits successfully. If I have more than one version of Caffe, can I use a different version on each port? For example:

CAFFE_ROOT=/home/john/caffev1 ./digits-devserver --port 5001
CAFFE_ROOT=/home/john/caffev2 ./digits-devserver --port 5002

Then, from the browser, could I run a different version of Caffe in each of two tabs?

lukeyeager commented 7 years ago

Yes, that's a supported configuration. But you'll probably want to set a unique DIGITS_JOBS_DIR for each server, too. Otherwise each server may get confused about which jobs it owns. You could have each server constantly rewriting metadata back-and-forth for the same job[s].
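
As a rough sketch of that setup, each server can be started with its own CAFFE_ROOT and its own DIGITS_JOBS_DIR. The `start_digits` function, the `DIGITS_BIN` and `DIGITS_JOBS_BASE` variables, and the one-directory-per-port layout are all illustrative assumptions rather than anything DIGITS provides. Only CAFFE_ROOT, DIGITS_JOBS_DIR, and `--port` come from this thread.

```shell
# Hypothetical helper: start one DIGITS dev server bound to a specific Caffe build.
# Usage: start_digits <caffe_root> <port>
start_digits() {
    caffe_root=$1
    port=$2
    # Assumed layout: one jobs directory per port, so servers never share job metadata.
    jobs_dir="${DIGITS_JOBS_BASE:-$HOME/digits-jobs}/port-$port"
    mkdir -p "$jobs_dir"
    CAFFE_ROOT="$caffe_root" DIGITS_JOBS_DIR="$jobs_dir" \
        "${DIGITS_BIN:-./digits-devserver}" --port "$port"
}

# Example, run from the DIGITS source directory:
#   start_digits /home/john/caffev1 5001 &
#   start_digits /home/john/caffev2 5002 &
```

Keying the jobs directory on the port keeps the mapping between server and jobs easy to see. Any other scheme that gives each server a unique directory should work just as well.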