NVIDIA / DIGITS

Deep Learning GPU Training System
https://developer.nvidia.com/digits
BSD 3-Clause "New" or "Revised" License

MDB_PAGE_NOTFOUND Error while creating db #1460

Open hahne opened 7 years ago

hahne commented 7 years ago

Hello, I have moved my job folder and everything was working fine, but now that I want to create a new database I get the following error:

2017-02-16 12:57:03 [DEBUG] 200358 total lines in file
2017-02-16 12:57:03 [INFO ] 200358 valid lines in file
2017-02-16 12:57:03 [DEBUG] Category 0 has 100179 images.
2017-02-16 12:57:03 [DEBUG] Category 1 has 100179 images.
2017-02-16 12:57:05 [DEBUG] Processed 100/200358
2017-02-16 12:57:07 [DEBUG] Processed 200/200358
2017-02-16 12:57:09 [DEBUG] Processed 300/200358
2017-02-16 12:57:11 [DEBUG] Processed 400/200358
2017-02-16 12:57:13 [DEBUG] Processed 500/200358
2017-02-16 12:57:13 [ERROR] PageNotFoundError: mdb_put: MDB_PAGE_NOTFOUND: Requested page not found
Traceback (most recent call last):
  File "/home/johannes/nvidia/digits/digits/tools/create_db.py", line 792, in <module>
    hdf5_dset_limit=args['hdf5_dset_limit'],
  File "/home/johannes/nvidia/digits/digits/tools/create_db.py", line 294, in create_db
    mean_files, **kwargs)
  File "/home/johannes/nvidia/digits/digits/tools/create_db.py", line 358, in _create_lmdb
    _write_batch_lmdb(db, batch, images_written)
  File "/home/johannes/nvidia/digits/digits/tools/create_db.py", line 665, in _write_batch_lmdb
    lmdb_txn.put(key, datum.SerializeToString())
lmdb.PageNotFoundError: mdb_put: MDB_PAGE_NOTFOUND: Requested page not found

I'm not sure whether this is related to moving the job folder. Can you help me with this?

My DIGITS version is 5.1-dev

lukeyeager commented 7 years ago

I don't think I've ever seen that error before. https://github.com/BVLC/caffe/pull/3731 seems to be working well for everyone, and we're basically doing the same thing in DIGITS.

How did you install LMDB? You didn't install from source or anything weird like that?

$ dpkg -l | grep lmdb
ii  liblmdb-dev:amd64                           0.9.17-3                                      amd64        Lightning Memory-Mapped Database development files
ii  liblmdb0:amd64                              0.9.17-3                                      amd64        Lightning Memory-Mapped Database shared library
ii  python-lmdb                                 0.87-2                                        amd64        Lightning Memory-Mapped Database python bindings

$ pip list | grep -i lmdb
lmdb (0.87)

hahne commented 7 years ago

Thank you for your help!

No, I did not install LMDB from source or anything like that. It was working before, but it has been a few weeks since I last created a new dataset.

I get the same output as you when I run these commands.

hahne commented 7 years ago

OK, I have now run create_db.py manually with the output path changed to a local path, and there it works.

Due to memory issues, I moved the job folder to a network drive. The network drive is a Windows share mounted with CIFS. It has a normal file path (/home/username/networks/data), but apparently this is causing issues for LMDB...
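
One way to confirm that a job folder sits on a network mount is to look up its filesystem type in the Linux mount table. The following is a minimal sketch, not part of DIGITS: the helper names `fs_type_for` and `is_network_path` and the set of filesystem types are assumptions made for illustration, and mount points containing spaces (which /proc/mounts escapes) are not handled.

```python
import os

# Assumed list of network filesystem types; extend as needed.
NETWORK_FS_TYPES = {"cifs", "smb3", "smbfs", "nfs", "nfs4"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is the content of /proc/mounts (fields: device, mount
    point, fs type, options, ...). Returns None if nothing matches.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so /data does not match /database.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_network_path(path, mounts_text):
    """True if path lives on one of the assumed network filesystem types."""
    return fs_type_for(path, mounts_text) in NETWORK_FS_TYPES
```

On a Linux machine this could be called with the text read from /proc/mounts, for example `is_network_path("/home/username/networks/data", open("/proc/mounts").read())`.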

lukeyeager commented 7 years ago

Technically LMDB shouldn't ever be used over NFS (http://stackoverflow.com/a/24696604/2404152). But I've done it a lot and don't often see any issues.

hahne commented 7 years ago

Do you think it will work if I only read the database from the network drive?

My quick solution for now: I changed create_db.py so that it writes the database to the local disk first and then moves it to the job folder.
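
The workaround above can be sketched roughly as follows. This is not the actual change made to create_db.py; `build_locally_then_move` and its `write_db` callback are hypothetical names, and the sketch assumes Python 3 and that the local temporary directory has enough free space for the whole database.

```python
import os
import shutil
import tempfile


def build_locally_then_move(write_db, final_dir, scratch_root=None):
    """Build a database on local disk, then move it to final_dir.

    write_db is a callable that receives a local directory and writes the
    database files into it. scratch_root optionally selects where the
    temporary staging directory is created (default: system temp dir).
    """
    if os.path.exists(final_dir):
        raise FileExistsError("destination already exists: %s" % final_dir)

    staging_parent = tempfile.mkdtemp(prefix="db_staging_", dir=scratch_root)
    name = os.path.basename(os.path.normpath(final_dir))
    local_dir = os.path.join(staging_parent, name)
    os.makedirs(local_dir)
    try:
        # All random-access writes happen on the local filesystem.
        write_db(local_dir)
        # Make sure the destination's parent exists, then move the finished
        # directory; across filesystems shutil.move copies and deletes.
        os.makedirs(os.path.dirname(os.path.abspath(final_dir)), exist_ok=True)
        shutil.move(local_dir, final_dir)
    finally:
        # Remove whatever is left of the staging area.
        shutil.rmtree(staging_parent, ignore_errors=True)
    return final_dir
```

The point of the design is that the network share only ever sees a sequential copy of finished files, never the in-place page writes that LMDB performs while the database is being built.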