NVIDIA / DIGITS

Deep Learning GPU Training System
https://developer.nvidia.com/digits
BSD 3-Clause "New" or "Revised" License

Mean image missing after changing DIGITS_JOBS_DIR #1643

Open lmeinel opened 7 years ago

lmeinel commented 7 years ago

Hi,

Due to limited space on my primary OS partition, I wanted to move the /var/lib/digits/ directory (jobs) to another internal drive.

I followed the instructions at https://github.com/NVIDIA/DIGITS/blob/master/docs/Configuration.md and https://github.com/NVIDIA/DIGITS/blob/digits-5.0/docs/UbuntuInstall.md#troubleshooting

First, I stopped the DIGITS server: sudo systemctl stop digits

Then I changed the DIGITS_JOBS_DIR environment variable to point to the new location under /mnt/linux_data/digits/ by editing the service file (sudo nano /lib/systemd/system/digits.service):

# DIGITS systemd service
[Unit]
Description=DIGITS server
After=local-fs.target network.target

[Service]
User=www-data
#Environment="DIGITS_JOBS_DIR=/var/lib/digits/jobs"
Environment="DIGITS_JOBS_DIR=/mnt/linux_data/digits/jobs"
Environment="DIGITS_LOGFILE_FILENAME=/var/log/digits/digits.log"
ExecStart=/usr/bin/python -m digits -p 34448
Restart=on-failure
ExecStop=/bin/kill -INT $MAINPID

[Install]
WantedBy=multi-user.target

Then I restarted DIGITS: sudo systemctl daemon-reload && sudo systemctl restart digits
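For context on the unit-file change: after daemon-reload and a restart, systemd passes each Environment= value into the server process's environment, where it can be read like any other variable. Below is a minimal Python sketch of that lookup. It assumes a fallback to the old /var/lib/digits/jobs path, and the jobs_dir() helper is hypothetical, not taken from the DIGITS source.

```python
import os

# Assumed fallback: the jobs path the unit file used before the edit.
DEFAULT_JOBS_DIR = "/var/lib/digits/jobs"


def jobs_dir(environ=None):
    """Return DIGITS_JOBS_DIR if it is set, else the assumed default."""
    if environ is None:
        environ = os.environ
    return environ.get("DIGITS_JOBS_DIR", DEFAULT_JOBS_DIR)


# Simulate the environment systemd builds from the edited unit file.
print(jobs_dir({"DIGITS_JOBS_DIR": "/mnt/linux_data/digits/jobs"}))
# Simulate a process started without the variable.
print(jobs_dir({}))
```

This only covers code that reads the variable at runtime. Anything configured separately with a fixed path is unaffected by the change.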

DIGITS starts fine and finds the old jobs. journalctl -u digits.service shows:

Mai 18 13:12:00 dst-aux-ws-01 systemd[1]: Started DIGITS server.
Mai 18 13:12:01 dst-aux-ws-01 python[8120]:   ___ ___ ___ ___ _____ ___
Mai 18 13:12:01 dst-aux-ws-01 python[8120]:  |   \_ _/ __|_ _|_   _/ __|
Mai 18 13:12:01 dst-aux-ws-01 python[8120]:  | |) | | (_ || |  | | \__ \
Mai 18 13:12:01 dst-aux-ws-01 python[8120]:  |___/___\___|___| |_| |___/ 5.0.0
Mai 18 13:12:01 dst-aux-ws-01 python[8120]: 2017-05-18 13:12:01 [INFO ] Loaded 18 jobs.

The web interface also shows the new job directory correctly, but the mean image is missing:

[Screenshot: web interface with the mean image missing]

/var/log/nginx/error.log shows:

2017/05/18 11:52:29 [error] 1220#1220: *518 open() "/var/lib/digits/jobs/20170511-182940-ad68/mean.jpg" failed (2: No such file or directory), client: XXX.XXX.XXX.XXX, server: , request: "GET /files/20170511-182940-ad68/mean.jpg HTTP/1.1", host: "XXX.XXX.de", referrer: "http://XXX.XXX.de/datasets/20170511-182940-ad68"
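The log line shows that the request path /files/20170511-182940-ad68/mean.jpg was resolved against /var/lib/digits/jobs, the old location, and not against the new DIGITS_JOBS_DIR. Below is a minimal sketch of that kind of prefix mapping. It assumes the server maps the /files/ URL prefix onto one fixed directory. The resolve_files_url() helper is hypothetical and is not nginx's or DIGITS' code.

```python
import posixpath


def resolve_files_url(url_path, root):
    """Map a /files/... request path to a filesystem path under root."""
    prefix = "/files/"
    if not url_path.startswith(prefix):
        raise ValueError("not a /files/ URL: %r" % url_path)
    relative = url_path[len(prefix):]
    # normpath collapses '..' segments; reject results that escape root.
    full = posixpath.normpath(posixpath.join(root, relative))
    if not full.startswith(root.rstrip("/") + "/"):
        raise ValueError("path escapes root: %r" % url_path)
    return full


request = "/files/20170511-182940-ad68/mean.jpg"
# A root fixed at the old location reproduces the path from the error log.
print(resolve_files_url(request, "/var/lib/digits/jobs"))
# /var/lib/digits/jobs/20170511-182940-ad68/mean.jpg
print(resolve_files_url(request, "/mnt/linux_data/digits/jobs"))
# /mnt/linux_data/digits/jobs/20170511-182940-ad68/mean.jpg
```

With a fixed root, changing DIGITS_JOBS_DIR alone does not move where such a mapping looks for files.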

There is a simple workaround: if I create a symlink from the old location to the new one (sudo ln -s /mnt/linux_data/digits/ /var/lib/digits), it works fine. Still, I assume this is a bug.
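The workaround succeeds because any path under the old root now resolves through the symlink to the new drive. The sketch below demonstrates this in a temporary directory. The two directory names stand in for /mnt/linux_data/digits and /var/lib/digits, and the file contents are placeholders. One assumption matters here: the old /var/lib/digits must no longer exist when the link is created. If it still exists as a directory, ln -s places the link inside it.

```python
import os
import tempfile


def demo_symlink_workaround():
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-ins for /mnt/linux_data/digits (new) and /var/lib/digits (old).
        new_root = os.path.join(tmp, "mnt_linux_data_digits")
        old_root = os.path.join(tmp, "var_lib_digits")
        job = os.path.join(new_root, "jobs", "20170511-182940-ad68")
        os.makedirs(job)
        with open(os.path.join(job, "mean.jpg"), "wb") as f:
            f.write(b"placeholder bytes")

        # Equivalent of: ln -s /mnt/linux_data/digits/ /var/lib/digits
        os.symlink(new_root, old_root)

        # A path built from the old root, as in the nginx error log.
        old_path = os.path.join(old_root, "jobs", "20170511-182940-ad68", "mean.jpg")
        exists = os.path.exists(old_path)
        lands_in_new = os.path.realpath(old_path).startswith(
            os.path.realpath(new_root) + os.sep)
        return exists, lands_in_new


print(demo_symlink_workaround())  # (True, True) on Linux
```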

I am using Ubuntu 16.04 (KDE neon). I also tried changing DIGITS_JOBS_DIR in /etc/init/digits.conf, which is also present, but nothing changed.

Thanks! Lars

lukeyeager commented 7 years ago

Thanks for the detailed report! We're probably storing the path to the mean file as an absolute path instead of relative to DIGITS_JOBS_DIR, as we typically do.
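The hypothesis in the comment above can be illustrated with a minimal sketch. Suppose a job records the mean file's location as an absolute path. That path still points at the old directory after DIGITS_JOBS_DIR changes. A path stored relative to the job directory, by contrast, follows the move. The store_path() and load_path() helpers are hypothetical and do not reflect how DIGITS actually persists job metadata.

```python
import posixpath


def store_path(job_dir, filename, relative=True):
    """Return the value a job would persist for one of its files."""
    return filename if relative else posixpath.join(job_dir, filename)


def load_path(stored, job_dir):
    """Resolve a persisted value; absolute paths are used unchanged."""
    if posixpath.isabs(stored):
        return stored
    return posixpath.join(job_dir, stored)


job_id = "20170511-182940-ad68"
old_job_dir = posixpath.join("/var/lib/digits/jobs", job_id)
new_job_dir = posixpath.join("/mnt/linux_data/digits/jobs", job_id)

# Saved while DIGITS_JOBS_DIR was still /var/lib/digits/jobs.
absolute = store_path(old_job_dir, "mean.jpg", relative=False)
relative = store_path(old_job_dir, "mean.jpg", relative=True)

# Loaded after DIGITS_JOBS_DIR moved: the absolute value is stale.
print(load_path(absolute, new_job_dir))
# /var/lib/digits/jobs/20170511-182940-ad68/mean.jpg
print(load_path(relative, new_job_dir))
# /mnt/linux_data/digits/jobs/20170511-182940-ad68/mean.jpg
```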