chainer / chainercv

ChainerCV: a Library for Deep Learning in Computer Vision
MIT License
1.48k stars 304 forks

Small demo of training based on own data #391

Closed pksorensen closed 6 years ago

pksorensen commented 7 years ago

Hello

First off, I really appreciate how simple it was to get started with the info in the README. My background is that I understand the principles of deep networks from theory, but I have not done anything real with them.

So coming here and, with less than an hour of work, being able (on Windows) to set up a Docker container running ChainerCV and run a Faster R-CNN detection on a sample image was extremely satisfying.

Would it not be possible to offer the same experience for a simple getting-started tutorial on training with your own data?

Let's say someone has 10 images, takes the time to annotate (mask) them, puts them in a folder structure, and then runs a simple demo.py script that trains on 9 of them and runs detection on the last one. I know this will yield poor results, but these kinds of demos really accelerate adoption, because they are something everyone can do, and it is much more valuable when people can try things on their own use cases instead of always the same sample datasets that always work.

If anyone can contribute a folder structure and a demo script to train on it, I can write up the tutorial and documentation to help people.
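
For concreteness, here is a minimal sketch of what such a folder-based dataset could look like in ChainerCV terms. The directory layout, file extensions, and CSV annotation format are made up for illustration; only chainer.dataset.DatasetMixin and chainercv.utils.read_image are existing APIs.

# Hypothetical sketch: a tiny detection dataset over my_data/images/*.jpg
# plus my_data/annotations/*.csv, where each CSV row is
# y_min,x_min,y_max,x_max,label_id. Layout and names are made up here.
import os
import numpy as np
import chainer
from chainercv.utils import read_image

class FolderDetectionDataset(chainer.dataset.DatasetMixin):

    def __init__(self, root):
        self.img_dir = os.path.join(root, 'images')
        self.ann_dir = os.path.join(root, 'annotations')
        self.ids = sorted(
            os.path.splitext(f)[0] for f in os.listdir(self.img_dir))

    def __len__(self):
        return len(self.ids)

    def get_example(self, i):
        id_ = self.ids[i]
        # CHW float32 image, as ChainerCV models expect
        img = read_image(os.path.join(self.img_dir, id_ + '.jpg'))
        ann = np.loadtxt(
            os.path.join(self.ann_dir, id_ + '.csv'),
            delimiter=',', ndmin=2)
        bbox = ann[:, :4].astype(np.float32)  # (y_min, x_min, y_max, x_max)
        label = ann[:, 4].astype(np.int32)    # 0-based class indices
        return img, bbox, label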

yuyu2172 commented 7 years ago

Thanks for using our library!

Actually, I have developed something similar to what you have suggested. https://github.com/yuyu2172/image-labelling-tool/tree/master/examples/ssd#example

This example contains a script to train SSD (another object detector) on a dataset annotated with an annotation tool. After that, it should be straightforward to train with your own annotated dataset (just swap the dataset path).
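
As a rough sketch of the model-side setup this involves (a paraphrase, not the linked train.py; the actual script starts from the downloaded VOC-trained SSD weights, as the logs further down show), training on your own classes essentially means building the SSD model with the right number of foreground classes:

# Rough sketch only (not the linked script): build SSD300 for custom classes.
# 'imagenet' initializes just the VGG16 extractor; the real example instead
# loads SSD weights trained on VOC0712.
from chainercv.links import SSD300

label_names = ('apple', 'orange')  # classes from your own annotations
model = SSD300(n_fg_class=len(label_names), pretrained_model='imagenet')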

pksorensen commented 7 years ago

Cool, I will give it a try and see how easy it is. Also, I am guessing that if it works easily, it would not be hard to swap it into any of the other ChainerCV examples, SegNet and Faster R-CNN. Would love to see Mask R-CNN here too ;)

HIN0209 commented 7 years ago

Hello. I tried the sample apple-orange train.py (Ubuntu 14.04, Chainer 2.0.2, ChainerCV 0.6.0, Anaconda3, Python 3.5.3). Actually, nothing happened, seemingly forever, even after reducing --val_iteration to 1. I gave up, and hitting Ctrl+C showed the following output. Thanks!

Traceback (most recent call last):
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Process Process-3:
Traceback (most recent call last):
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
Process Process-8:
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/queues.py", line 94, in get
    res = self._recv_bytes()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
^CTraceback (most recent call last):
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/chainer/training/trainer.py", line 299, in run
    entry.extension(self)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/chainer/training/extensions/_snapshot.py", line 39, in snapshot_object
    _snapshot_object(trainer, target, filename.format(trainer), savefun)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/chainer/training/extensions/_snapshot.py", line 86, in _snapshot_object
    savefun(tmppath, target)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/chainer/serializers/npz.py", line 70, in save_npz
    numpy.savez_compressed(f, **s.target)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/numpy/lib/npyio.py", line 657, in savez_compressed
    _savez(file, args, kwds, True)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/numpy/lib/npyio.py", line 706, in _savez
    zipf.write(tmpfile, arcname=fname)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/zipfile.py", line 1499, in write
    buf = cmpr.compress(buf)
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 70, in <module>
    args.resume)
  File "/home/owner/chainer/labelling_tool/examples/ssd/train_utils.py", line 167, in train
    trainer.run()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/chainer/training/trainer.py", line 305, in run
    self.updater.finalize()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/chainer/training/updater.py", line 178, in finalize
    iterator.finalize()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 105, in finalize
    worker.join()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/process.py", line 121, in join
    res = self._popen.wait(timeout)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/popen_fork.py", line 51, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Exception ignored in: <bound method ZipFile.__del__ of <zipfile.ZipFile [closed]>>
Traceback (most recent call last):
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/zipfile.py", line 1597, in __del__
    self.close()
  File "/home/owner/anaconda3/envs/chainercv/lib/python3.5/zipfile.py", line 1609, in close
    self.fp.seek(self.start_dir)
ValueError: seek of closed file
pksorensen commented 7 years ago

I also ran into issues:

PS C:\dev\chainer\meter> docker run --rm -it -v C:/dev/chainer/meter/data/apple_orange_annotations/apple_orange_annotations:/data image-labelling-tool /bin/bash
chainer@74b47c9ba1a5:/src/image-labelling-tool/examples/ssd$ python train.py --train /data --label_names /data/apple_orange_label_names.yml --val_iteration 100
Downloading ...
From: https://github.com/yuyu2172/share-weights/releases/download/0.0.3/ssd300_voc0712_2017_06_06.npz
To: /home/chainer/.chainer/dataset/_dl_cache/39be0d7ab69efbab2387ccbf556b7e8d
  %   Total    Recv       Speed  Time left
100   93MiB   93MiB   8281KiB/s    0:00:00Killed
chainer@74b47c9ba1a5:/src/image-labelling-tool/examples/ssd$ python train.py --train /data --label_names /data/apple_orange_label_names.yml --val_iteration 100
^C^CTraceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/chainer/training/trainer.py", line 296, in run
    update()
  File "/opt/conda/lib/python3.6/site-packages/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/opt/conda/lib/python3.6/site-packages/chainer/training/updater.py", line 234, in update_core
    optimizer.update(loss_func, *in_arrays)
  File "/opt/conda/lib/python3.6/site-packages/chainer/optimizer.py", line 527, in update
    loss = lossfun(*args, **kwds)
  File "/src/image-labelling-tool/examples/ssd/train_utils.py", line 33, in __call__
    mb_locs, mb_confs = self.model(imgs)
  File "/opt/conda/lib/python3.6/site-packages/chainercv/links/model/ssd/ssd.py", line 130, in __call__
    return self.multibox(self.extractor(x))
  File "/opt/conda/lib/python3.6/site-packages/chainercv/links/model/ssd/ssd_vgg16.py", line 153, in __call__
    ys = super(VGG16Extractor300, self).__call__(x)
  File "/opt/conda/lib/python3.6/site-packages/chainercv/links/model/ssd/ssd_vgg16.py", line 82, in __call__
    h = F.relu(self.conv2_2(h))
  File "/opt/conda/lib/python3.6/site-packages/chainer/links/connection/convolution_2d.py", line 154, in __call__
    x, self.W, self.b, self.stride, self.pad)
  File "/opt/conda/lib/python3.6/site-packages/chainer/functions/connection/convolution_2d.py", line 439, in convolution_2d
    return func(x, W, b)
  File "/opt/conda/lib/python3.6/site-packages/chainer/function.py", line 200, in __call__
    outputs = self.forward(in_data)
  File "/opt/conda/lib/python3.6/site-packages/chainer/function.py", line 329, in forward
    return self.forward_cpu(inputs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/functions/connection/convolution_2d.py", line 89, in forward_cpu
    self.col, W, ((1, 2, 3), (1, 2, 3))).astype(x.dtype, copy=False)
  File "/opt/conda/lib/python3.6/site-packages/numpy/core/numeric.py", line 1337, in tensordot
    at = a.transpose(newaxes_a).reshape(newshape_a)
^C

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 70, in <module>
    args.resume)
  File "/src/image-labelling-tool/examples/ssd/train_utils.py", line 167, in train
    trainer.run()
  File "/opt/conda/lib/python3.6/site-packages/chainer/training/trainer.py", line 305, in run
    self.updater.finalize()
  File "/opt/conda/lib/python3.6/site-packages/chainer/training/updater.py", line 178, in finalize
    iterator.finalize()
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 105, in finalize
    worker.join()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 121, in join
    res = self._popen.wait(timeout)
  File "/opt/conda/lib/python3.6/multiprocessing/popen_fork.py", line 51, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/opt/conda/lib/python3.6/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
chainer@74b47c9ba1a5:/src/image-labelling-tool/examples/ssd$
pksorensen commented 7 years ago

This is the setup I used:

#################### Python 3 #########################
# (Dockerfile fragment; base image and earlier layers omitted)
ARG python_version=3.5.2
RUN conda install -y python=${python_version} && \
    pip install -U numpy && \
    pip install chainercv && \
    pip install opencv-python && \
    pip install pyyaml && \
    conda clean -yt
pksorensen commented 7 years ago

My best guess is that it's due to running out of memory in my case.

HIN0209 commented 7 years ago

Thank you for commenting. (1) Regarding installation, there is no mention of chainer itself or of the installation order. Also, cupy was requested to be installed. (2) After installing the same setup plus chainer and cupy, I again got no response. My setup also includes an NVIDIA GTX 1080 Ti and 32 GB RAM, so memory should not be the issue.

Hakuyume commented 7 years ago

If nothing happens in training, please check #386. https://github.com/chainer/chainercv/issues/386#issuecomment-321485827
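
Separate from whatever #386 itself recommends, one mitigation that is often suggested when OpenCV-backed data loading hangs under fork-based multiprocessing is to disable OpenCV's internal thread pool before the worker processes start (a generic sketch, not a confirmed fix for this particular issue):

# Generic mitigation, not necessarily the fix from #386: OpenCV's own thread
# pool can deadlock in forked worker processes, so turn it off before the
# MultiprocessIterator workers are spawned.
import cv2
cv2.setNumThreads(0)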

pksorensen commented 7 years ago

I also got it running now that I have a machine with more RAM, but the experience is the same as above: it looks like nothing is happening.

Also, as someone new to Chainer: what does one do after training (assuming I get it working)? Is the new model file stored somewhere so you can use it for prediction?
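
For reference, based on what comes later in this thread: the training script writes snapshots such as result/model_iter_400, and loading one back for prediction would look roughly like this (a sketch using the apple/orange classes from the example; test.jpg is a placeholder path, and this is not code from the repository):

# Sketch: load a snapshot written during training and run detection on it.
import chainer
import matplotlib.pyplot as plt
from chainercv.links import SSD300
from chainercv.utils import read_image
from chainercv.visualizations import vis_bbox

label_names = ('apple', 'orange')
model = SSD300(n_fg_class=len(label_names))
chainer.serializers.load_npz('result/model_iter_400', model)

img = read_image('test.jpg')                   # CHW float32 image
bboxes, labels, scores = model.predict([img])  # lists, one entry per image
vis_bbox(img, bboxes[0], labels[0], scores[0], label_names=label_names)
plt.show()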

HIN0209 commented 7 years ago

So, the possible reasons for nothing happening in my case are the following?

1. RAM is too small. Reducing the batchsize from 8 to 2 did not help.
2. An OpenCV issue. Uninstalling opencv-python and installing opencv3 with "conda install -c https://conda.anaconda.org/menpo opencv3" did not change anything.
3. MultiprocessIterator has a bug.

pksorensen commented 7 years ago

I solved the RAM issue; it needs around 8 GB. I also tried compiling OpenCV 3.3 myself, and that did not solve it.

pksorensen commented 7 years ago

Here's the output when interrupting it, in case it helps anyone.

chainer@5a84acfb0706:/src/image-labelling-tool/examples/ssd$ python train.py --train /data --label_names /data/apple_orange_label_names.yml --out /data/logs --val_iteration 1
^CProcess Process-2:
Process Process-9:
Process Process-1:
Process Process-5:
Process Process-3:
Process Process-4:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 94, in get
    res = self._recv_bytes()
  File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
Traceback (most recent call last):
KeyboardInterrupt
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Process Process-10:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Process Process-11:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Process Process-12:
Process Process-13:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Process Process-14:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Process Process-17:
Process Process-20:
Process Process-19:
Process Process-18:
Process Process-15:
Process Process-16:
Process Process-8:
Process Process-6:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
Process Process-7:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
KeyboardInterrupt
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 386, in _worker
    cnt, mem_index, index = in_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/opt/conda/lib/python3.6/multiprocessing/synchronize.py", line 96, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "train.py", line 70, in <module>
    args.resume)
  File "/src/image-labelling-tool/examples/ssd/train_utils.py", line 167, in train
    trainer.run()
  File "/opt/conda/lib/python3.6/site-packages/chainer/training/trainer.py", line 299, in run
    entry.extension(self)
  File "/opt/conda/lib/python3.6/site-packages/chainer/training/extensions/evaluator.py", line 137, in __call__
    result = self.evaluate()
  File "/opt/conda/lib/python3.6/site-packages/chainercv/extensions/evaluator/detection_voc_evaluator.py", line 76, in evaluate
    target.predict, it)
  File "/opt/conda/lib/python3.6/site-packages/chainercv/utils/iterator/apply_prediction_to_iterator.py", line 93, in apply_prediction_to_iterator
    _apply(predict, iterator, hook))
  File "/opt/conda/lib/python3.6/site-packages/chainercv/utils/iterator/unzip.py", line 87, in unzip
    values = next(iterator)
  File "/opt/conda/lib/python3.6/site-packages/chainercv/utils/iterator/apply_prediction_to_iterator.py", line 129, in _apply
    pred_values = predict(imgs)
  File "/opt/conda/lib/python3.6/site-packages/chainercv/links/model/ssd/ssd.py", line 204, in predict
    mb_locs, mb_confs = self(x)
  File "/opt/conda/lib/python3.6/site-packages/chainercv/links/model/ssd/ssd.py", line 130, in __call__
    return self.multibox(self.extractor(x))
  File "/opt/conda/lib/python3.6/site-packages/chainercv/links/model/ssd/ssd_vgg16.py", line 153, in __call__
    ys = super(VGG16Extractor300, self).__call__(x)
  File "/opt/conda/lib/python3.6/site-packages/chainercv/links/model/ssd/ssd_vgg16.py", line 91, in __call__
    h = F.relu(self.conv4_2(h))
  File "/opt/conda/lib/python3.6/site-packages/chainer/links/connection/convolution_2d.py", line 154, in __call__
    x, self.W, self.b, self.stride, self.pad)
  File "/opt/conda/lib/python3.6/site-packages/chainer/functions/connection/convolution_2d.py", line 439, in convolution_2d
    return func(x, W, b)
  File "/opt/conda/lib/python3.6/site-packages/chainer/function.py", line 200, in __call__
    outputs = self.forward(in_data)
  File "/opt/conda/lib/python3.6/site-packages/chainer/function.py", line 329, in forward
    return self.forward_cpu(inputs)
  File "/opt/conda/lib/python3.6/site-packages/chainer/functions/connection/convolution_2d.py", line 89, in forward_cpu
    self.col, W, ((1, 2, 3), (1, 2, 3))).astype(x.dtype, copy=False)
  File "/opt/conda/lib/python3.6/site-packages/numpy/core/numeric.py", line 1337, in tensordot
    at = a.transpose(newaxes_a).reshape(newshape_a)
KeyboardInterrupt
yuyu2172 commented 7 years ago

@HIN0209 @pksorensen

Thanks for trying my script. I looked at your command-line output, and it seems that the script is working as expected. It does not produce any output because it has not reached a checkpoint yet. The default options are tuned for a setting with a GPU, so it would be difficult to reach a checkpoint on a CPU in a reasonable time. I added an option --log_iteration to the script, which sets the number of iterations between log reports. Please try it again with the --log_iteration 1 and --batchsize 1 options.

@HIN0209 Looking at your output, the script is saving a model to a file. By setting --val_iteration to 1, the model is saved every iteration.
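
For reference, this is roughly how the reporting interval maps onto the Chainer trainer (a paraphrase of the mechanism, not the actual train_utils.py; trainer is assumed to already be constructed):

# Sketch of the reporting mechanism described above: LogReport/PrintReport
# only emit output when their trigger fires, so nothing is printed until
# that many iterations have completed.
from chainer.training import extensions

log_interval = (1, 'iteration')  # what --log_iteration 1 effectively sets
trainer.extend(extensions.LogReport(trigger=log_interval))
trainer.extend(extensions.PrintReport(
    ['epoch', 'iteration', 'lr',
     'main/loss', 'main/loss/loc', 'main/loss/conf',
     'validation/main/map']),
    trigger=log_interval)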

pksorensen commented 7 years ago

Cool, thanks for updating us @yuyu2172

What training time should we expect on a GPU?

I don't have any NVIDIA graphics cards (assuming it doesn't work with Radeon?), but I do have 24 CPU cores.

HIN0209 commented 7 years ago

I used the updated code and it is working! Thank you for the quick work. I will finish training and then do the rest (demo, etc.). FYI: 11.2 iters/sec training speed with an NVIDIA GTX 1080 Ti on the provided apple-orange set (batchsize=1).

It would be nice if "model_iter_400" were provided in the package for people who just want the demo.

Also, a nice document like this TensorFlow object-detection API tutorial would be really appreciated. https://medium.com/towards-data-science/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9

pksorensen commented 7 years ago

If @yuyu2172 is not much for writing, I can write up a post about this whenever I have gathered enough demo material :)

pksorensen commented 7 years ago

@yuyu2172 It works with --batchsize 1, so this means there is a bug in the multiprocessing part.

I tried building OpenCV with TPP instead, as suggested in the other issues, but it's all the same. If anyone figures out why it fails with multiple processes, that would be nice.

chainer@1f96ee0f2adc:/src/image-labelling-tool/examples/ssd$ python train.py --train /data --label_names /data/apple_orange_label_names.yml --out /data/logs --val_iteration 100 --log_iteration 1 --batchsize 2
epoch       iteration   lr          main/loss   main/loss/loc  main/loss/conf  validation/main/map
0           1           0.0001      14.9106     2.15745        12.7531
0           2           0.0001      11.9335     1.71874        10.2148
0           3           0.0001      10.9854     1.35317        9.63226
0           4           0.0001      10.7581     1.18375        9.57433
0           5           0.0001      10.1219     1.07298        9.04888
0           6           0.0001      10.6807     1.71468        8.966
0           7           0.0001      10.4193     1.95761        8.46173

It works

yuyu2172 commented 7 years ago

Thanks for the nice feedback, guys :)

It would be nice if "model_iter_400" were provided in the package for people who just want the demo.

OK. I will work on it.

What training time should we expect on a GPU?

The speed HIN0209 posted is reasonable. For training a detector on a more complex dataset like VOC, it takes about a day and a half with a Titan X. I am talking about the script here: https://github.com/chainer/chainercv/tree/master/examples/ssd. It is important to note that the data augmentation used in these scripts is quite intensive.

If @yuyu2172 is not much for writing, I can write up a post about this whenever I have gathered enough demo material :)

Big +1 for this! The current README is just a raw list of features and commands. It would be great to have a write-up post.

pksorensen commented 7 years ago

Cool. I estimate that I will spend 1 to 2 weeks playing around, and by the end of it I will do a small tutorial / write-up about it for others to reproduce.

When running with 20 CPUs, the output is the following:

epoch       iteration   lr          main/loss   main/loss/loc  main/loss/conf  validation/main/map
0           1           0.0001      12.6748     1.21717        11.4576
0           2           0.0001      12.1243     1.07264        11.0517
0           3           0.0001      11.0204     1.13749        9.88291
0           4           0.0001      11.2276     1.83673        9.39091
0           5           0.0001      10.5403     1.55973        8.98054
0           6           0.0001      10.2046     1.22746        8.97713
0           7           0.0001      9.44188     1.12225        8.31962
0           8           0.0001      9.34091     1.43901        7.9019
0           9           0.0001      8.64663     0.897129       7.7495
0           10          0.0001      8.61286     1.02696        7.5859
0           11          0.0001      8.40919     1.08716        7.32203
0           12          0.0001      8.05794     1.05238        7.00556
0           13          0.0001      7.55833     0.9543         6.60403
0           14          0.0001      7.73507     1.23755        6.49752
0           15          0.0001      7.43038     1.11817        6.31221
0           16          0.0001      7.78986     1.65317        6.13669
0           17          0.0001      6.92077     0.992295       5.92847
0           18          0.0001      6.85563     0.7304         6.12523
0           19          0.0001      6.82733     1.04058        5.78675
0           20          0.0001      6.07858     0.752546       5.32603
1           21          0.0001      6.3543      1.1031         5.2512
1           22          0.0001      6.02154     0.749306       5.27223
1           23          0.0001      6.05218     0.767107       5.28507
1           24          0.0001      6.41004     1.26777        5.14227
1           25          0.0001      5.5617      0.855585       4.70612
1           26          0.0001      5.52922     0.662999       4.86622
1           27          0.0001      6.09916     1.37244        4.72672
1           28          0.0001      5.50713     1.14605        4.36108
1           29          0.0001      5.18924     0.803506       4.38573
1           30          0.0001      4.88892     0.784509       4.10441
1           31          0.0001      4.79013     0.793774       3.99636
1           32          0.0001      5.08947     0.997038       4.09243
1           33          0.0001      4.88988     0.652114       4.23777
1           34          0.0001      5.36768     1.01028        4.3574
1           35          0.0001      4.78902     0.847532       3.94149
1           36          0.0001      4.42266     0.711169       3.71149
1           37          0.0001      4.63285     0.76253        3.87032
1           38          0.0001      4.87928     0.860469       4.01881
1           39          0.0001      4.65221     0.893757       3.75845
1           40          0.0001      5.23696     1.0522         4.18475
1           41          0.0001      4.69475     0.789808       3.90494
2           42          0.0001      4.77091     1.02642        3.74449
2           43          0.0001      4.79025     1.18098        3.60927
2           44          0.0001      3.99751     0.624271       3.37324
2           45          0.0001      4.3825      0.8421         3.5404
2           46          0.0001      4.15634     0.418492       3.73785
2           47          0.0001      4.30043     0.770104       3.53032
2           48          0.0001      4.29414     0.724917       3.56922
2           49          0.0001      4.43415     0.96855        3.4656
2           50          0.0001      4.40443     1.01633        3.3881
2           51          0.0001      3.97535     0.804308       3.17104
2           52          0.0001      4.09503     0.681377       3.41366
2           53          0.0001      4.09407     0.782554       3.31151
2           54          0.0001      4.2508      0.89875        3.35205
2           55          0.0001      3.9467      0.668529       3.27817
2           56          0.0001      4.00805     0.87974        3.12831
2           57          0.0001      4.66475     1.39154        3.27321

Do you have a view on whether it's better to run with fewer cores and get faster epochs, or whether the output above amounts to the same work as 60 epochs when using only one CPU?

I ask because there are only two physical CPUs in the machine, and with Docker this shows up as 24 virtual CPUs. So when using 20 as in the setting above, some resources are shared, and I am wondering whether it's better to run with fewer cores. If so, would it be possible to add a --argument for setting the number of processes in the multiprocessing setup?

pksorensen commented 7 years ago

Also, just saw this in my PowerShell window.

3           79          0.0001      3.53066     0.596175       2.93449
3           80          0.0001      3.5481      0.603821       2.94428
     total [..................................................]  0.07%
this epoch [#########################################.........] 83.23%
        80 iter, 3 epoch / 120000 iterations
  0.016281 iters/sec. Estimated time to finish: 85 days, 5:57:27.408362.

Not sure how/when it started showing, but hehe, 85 days for my hardware setup :D I need to spin up a VM in the cloud instead :)

yuyu2172 commented 7 years ago

If so, would it be possible to add a --argument for setting the number of processes in the multiprocessing setup?

You mean the number of processes to launch for MultiprocessIterator? Yeah, it can be configured. I just added an option --loaderjob INTEGER to do that.

I cannot say much about the number of virtual cores. FYI, the bulk of the time is spent calling NumPy matrix multiplication functions.

85 days is a crazy number, but for small datasets like this, you do not need to train for 12000 iterations. Usually, a model converges much faster than that.

yuyu2172 commented 7 years ago

It is nice to hear that OpenCV worked fine with multiprocessing. It did not work in one of my environments, which is frustrating, and I hope to find a solution to this problem. I just changed the code to stop using multiprocessing when --loaderjob is 0.
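
A rough sketch of the kind of switch described above (the actual change lives in the image-labelling-tool repository; this is just a paraphrase using Chainer's standard iterators):

# Sketch: fall back to SerialIterator when --loaderjob is 0, otherwise use
# MultiprocessIterator with that many worker processes.
from chainer import iterators

def make_iterator(dataset, batchsize, loaderjob):
    if loaderjob <= 0:
        return iterators.SerialIterator(dataset, batchsize)
    return iterators.MultiprocessIterator(
        dataset, batchsize, n_processes=loaderjob)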

pksorensen commented 7 years ago

I just tried with --batchsize 1, and it takes 1/10 of the time compared to when I used 20 processes.

(image)

I will rebuild and try with --loaderjob 2 (the number of physical cores) and see if I can get it down to 1/20 of the time, which would be 4 days on CPU.

And if you say 12000 instead of the default 120000, then it's less than one day to train a simple model like this on standard hardware, which is okay for starters.

Will be fun to try with a GPU on Azure soon as well.

pksorensen commented 7 years ago

@yuyu2172 Thanks for all the help, by the way.

Also, do you know if it's possible to specify where Chainer downloads the precomputed model that it fetches at the beginning? (I would like to tell it to store it in one of the mounted volumes in Docker instead of inside the Docker container.)

HIN0209 commented 7 years ago

I just finished training and the demo, and both worked fine on the apple-orange dataset. I will now move on to my own dataset to see how it goes.

The README.md for "Demo code to visualize output" has a typo: "python demo.py --pratrained_model result/model_iter_400" should say pretrained instead of pratrained, just FYI.

yuyu2172 commented 7 years ago

Also, do you know if it's possible to specify where Chainer downloads the precomputed model that it fetches at the beginning? (I would like to tell it to store it in one of the mounted volumes in Docker instead of inside the Docker container.)

You can do this with the following command: export CHAINER_DATASET_ROOT=/mnt/

More details can be found here: https://docs.chainer.org/en/stable/reference/core/dataset.html#dataset-abstraction
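
If it is more convenient to set this from inside a script rather than via the environment, the same root can also be changed through the dataset module (a small sketch; /mnt is just the example path from above):

# Same effect as `export CHAINER_DATASET_ROOT=/mnt/`, set from Python:
# downloaded files (pretrained weights, datasets) then go under the volume.
import chainer

chainer.dataset.set_dataset_root('/mnt')
print(chainer.dataset.get_dataset_root())  # -> /mnt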

The README.md for "Demo code to visualize output" has a typo: "python demo.py --pratrained_model result/model_iter_400" should say pretrained instead of pratrained, just FYI.

Thank you!

yuyu2172 commented 7 years ago

And if you say 12000 instead of the default 120000, then it's less than one day to train a simple model like this on standard hardware, which is okay for starters.

Sorry, that was a typo. I meant to say that 120000 is overkill.

pksorensen commented 7 years ago

Yeah, I noticed it the second time I read it :)

I just tried --loaderjob, and it turns out it was the reduced batchsize that cut the compute time:

100   93MiB   93MiB   9243KiB/s    0:00:00
epoch       iteration   lr          main/loss   main/loss/loc  main/loss/conf  validation/main/map
0           1           0.0001      12.8458     1.24489        11.601
0           2           0.0001      11.4097     1.25192        10.1578
0           3           0.0001      11.5204     1.40347        10.117
0           4           0.0001      11.1528     0.975648       10.1772
0           5           0.0001      11.0459     1.58965        9.45627
0           6           0.0001      9.79693     1.28163        8.5153
0           7           0.0001      9.89153     1.13055        8.76098
0           8           0.0001      9.61068     1.18566        8.42502
0           9           0.0001      9.36866     1.50945        7.85922
0           10          0.0001      8.27591     0.609303       7.66661
0           11          0.0001      8.18028     0.790003       7.39028
0           12          0.0001      8.294       1.11646        7.17754
0           13          0.0001      7.69968     0.950753       6.74893
0           14          0.0001      7.9388      1.21336        6.72544
0           15          0.0001      7.03268     0.750632       6.28205
0           16          0.0001      7.68714     1.28199        6.40515
0           17          0.0001      6.90286     0.75934        6.14352
0           18          0.0001      6.93581     0.968773       5.96704
0           19          0.0001      6.5233      0.860986       5.66232
0           20          0.0001      6.81927     1.13721        5.68206
     total [..................................................]  0.02%
this epoch [###############################################...] 95.81%
        20 iter, 0 epoch / 120000 iterations
  0.015884 iters/sec. Estimated time to finish: 87 days, 10:13:51.144739.

I will let it run to 100 again to get validation/main/map for comparison.

By the way, I was not able to google what the columns main/loss, main/loss/loc, main/loss/conf, and validation/main/map are exactly. I assume validation/main/map is the validation error on the test set?

yuyu2172 commented 7 years ago

They are values reported during training, except for validation/main/map. https://github.com/yuyu2172/image-labelling-tool/blob/master/examples/ssd/train_utils.py#L38 They are loss values. You can get more details by reading the SSD paper and the ChainerCV code.

validation/main/map is the mAP (mean average precision) computed on the validation dataset. It is computed inside DetectionVOCEvaluator. https://github.com/yuyu2172/image-labelling-tool/blob/master/examples/ssd/train_utils.py#L148
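
For anyone who wants the same number outside the trainer, the metric can also be computed directly with eval_detection_voc, given predictions from model.predict over the validation images and the corresponding ground truth (a sketch; the pred_*/gt_* variables are assumed to be lists with one entry per image):

# Sketch: compute the mAP reported as validation/main/map offline.
# pred_* come from model.predict(...) over the validation images; gt_* are
# the matching ground-truth boxes and labels, one array per image.
from chainercv.evaluations import eval_detection_voc

result = eval_detection_voc(
    pred_bboxes, pred_labels, pred_scores,
    gt_bboxes, gt_labels)
print('mAP:', result['map'])           # mean average precision
print('per-class AP:', result['ap'])   # one value per foreground class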

pksorensen commented 7 years ago

I have set up a machine with 4 GPUs, but if I read this correctly, http://docs.chainer.org/en/stable/tutorial/gpu.html#data-parallel-computation-on-multiple-gpus-with-trainer

the training code in your repo also needs to be updated, right? Want me to try it myself, or do you see an easy change/fix?

yuyu2172 commented 7 years ago

I heard that simply using the default multi-GPU updater does not work. (Sorry for using Chainer-specific terms; they are well explained in the documentation, and I also encourage you to read the source code of these abstractions. They are pretty straightforward.) This is because SSD uses hard negative mining.

I would recommend sticking with a single GPU for the time being. I think it is fast enough in most cases. We "might" add a reference implementation that trains with multiple GPUs.

pksorensen commented 7 years ago

Cool, thanks for the insight. And no worries about Chainer terms; I am a quick study.

HIN0209 commented 7 years ago

Hello, I have now moved on to the Image Labelling Tool: https://github.com/yuyu2172/image-labelling-tool#example

I noted a few things that need clarifying. (1) flask_app.py did not work with Python 3.5. I corrected syntax errors like print() and TypeError(), but it then showed the following error: Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so

Googling the error pointed to NumPy problems, with suggestions to install it either from conda or from pip. After a few rounds of installing/uninstalling, my conda environment finally crashed. I gave up and instead used Python 2.7, which worked.

(2) Your README.md is vague about the suggested format. For example, do you recommend using the [--slic] option or not? And instead of "For those who want to have label images", please specify what I should do for the code to work.

Help is appreciated.

pksorensen commented 7 years ago

A small update here as well: I got it running on a GPU.

chainer@f583d4b31f8f:/src/image-labelling-tool/examples/ssd$ python train.py --train /data/apple_orange_annotations --label_names /data/apple_orange_annotations/apple_orange_label_names.yml --out /data/logs --gpu 0 --val_iteration 100 --log_iteration 10
epoch       iteration   lr          main/loss   main/loss/loc  main/loss/conf  validation/main/map
0           10          0.0001      10.7232     1.27872        9.44453
0           20          0.0001      7.54169     1.11881        6.42288
1           30          0.0001      5.72171     0.97805        4.74366
1           40          0.0001      4.698       0.795822       3.90218
2           50          0.0001      4.30619     0.780229       3.52596

this epoch [###################...............................] 39.52%
        50 iter, 2 epoch / 120000 iterations
    1.7213 iters/sec. Estimated time to finish: 19:21:24.683424.

How many iterations did you see before it converged? (We look at validation/main/map to see this, right?)

I am also working on a small web-based interface for labelling. My focus is on reducing the barrier to entry and enabling less technical people to train a custom set. If you don't mind me asking: if you could, with 3 clicks, sign in, upload a folder of images, annotate them, and download the annotations like this tool does, do you estimate that this would have value, or does it not matter to set up Python and run applications like this?

HIN0209 commented 7 years ago

My training log is below. It stopped accidentally after 46000 iterations because the SSD (disk) was full. The last main/loss was around 0.9 (apple-orange dataset), and it seemed to converge at around 30000 iterations.

[
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 5,
        "elapsed_time": 79.22051453590393,
        "main/loss/loc": 0.7805992364883423,
        "iteration": 1000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 2.432619571685791,
        "main/loss": 3.213221549987793,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 11,
        "elapsed_time": 171.39589166641235,
        "main/loss/loc": 0.5678575038909912,
        "iteration": 2000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 1.6511310338974,
        "main/loss": 2.2189900875091553,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 17,
        "elapsed_time": 263.30492639541626,
        "main/loss/loc": 0.4849354326725006,
        "iteration": 3000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 1.4860771894454956,
        "main/loss": 1.971013069152832,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 23,
        "elapsed_time": 355.13462233543396,
        "main/loss/loc": 0.43977463245391846,
        "iteration": 4000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 1.3815630674362183,
        "main/loss": 1.8213376998901367,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 29,
        "elapsed_time": 445.9900951385498,
        "main/loss/loc": 0.38603052496910095,
        "iteration": 5000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 1.2903273105621338,
        "main/loss": 1.6763561964035034,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 35,
        "elapsed_time": 537.2248640060425,
        "main/loss/loc": 0.3692229092121124,
        "iteration": 6000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 1.2776328325271606,
        "main/loss": 1.6468547582626343,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 41,
        "elapsed_time": 629.2061340808868,
        "main/loss/loc": 0.33294421434402466,
        "iteration": 7000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 1.2454191446304321,
        "main/loss": 1.5783628225326538,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 47,
        "elapsed_time": 719.9931819438934,
        "main/loss/loc": 0.3216134309768677,
        "iteration": 8000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 1.1471179723739624,
        "main/loss": 1.4687303304672241,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 53,
        "elapsed_time": 810.4598653316498,
        "main/loss/loc": 0.2914847135543823,
        "iteration": 9000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 1.0824588537216187,
        "main/loss": 1.3739433288574219,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 59,
        "elapsed_time": 899.7752742767334,
        "main/loss/loc": 0.2933015525341034,
        "iteration": 10000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 1.0578984022140503,
        "main/loss": 1.3512004613876343,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 65,
        "elapsed_time": 989.7144713401794,
        "main/loss/loc": 0.2640993297100067,
        "iteration": 11000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.9952711462974548,
        "main/loss": 1.259369134902954,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 71,
        "elapsed_time": 1078.3362791538239,
        "main/loss/loc": 0.267562597990036,
        "iteration": 12000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 1.0545027256011963,
        "main/loss": 1.3220648765563965,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 77,
        "elapsed_time": 1167.5032584667206,
        "main/loss/loc": 0.2557935416698456,
        "iteration": 13000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 1.0422357320785522,
        "main/loss": 1.2980290651321411,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 83,
        "elapsed_time": 1258.6743831634521,
        "main/loss/loc": 0.24655602872371674,
        "iteration": 14000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 1.0473687648773193,
        "main/loss": 1.2939248085021973,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 89,
        "elapsed_time": 1349.2105464935303,
        "main/loss/loc": 0.22720617055892944,
        "iteration": 15000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.9975312352180481,
        "main/loss": 1.224737286567688,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 95,
        "elapsed_time": 1440.3668973445892,
        "main/loss/loc": 0.23588189482688904,
        "iteration": 16000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.9708943367004395,
        "main/loss": 1.2067770957946777,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 101,
        "elapsed_time": 1530.970549583435,
        "main/loss/loc": 0.2335476577281952,
        "iteration": 17000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.980478048324585,
        "main/loss": 1.2140252590179443,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 107,
        "elapsed_time": 1620.2500042915344,
        "main/loss/loc": 0.21971601247787476,
        "iteration": 18000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.9179012179374695,
        "main/loss": 1.1376179456710815,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 113,
        "elapsed_time": 1709.0796911716461,
        "main/loss/loc": 0.202173113822937,
        "iteration": 19000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.894392728805542,
        "main/loss": 1.0965656042099,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 119,
        "elapsed_time": 1797.5150842666626,
        "main/loss/loc": 0.20833085477352142,
        "iteration": 20000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.9276025891304016,
        "main/loss": 1.1359336376190186,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 125,
        "elapsed_time": 1886.2288718223572,
        "main/loss/loc": 0.18990658223628998,
        "iteration": 21000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.8623952865600586,
        "main/loss": 1.0523018836975098,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 131,
        "elapsed_time": 1975.2177803516388,
        "main/loss/loc": 0.19189295172691345,
        "iteration": 22000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.8291560411453247,
        "main/loss": 1.021048665046692,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 0.9930069930069931,
        "lr": 0.0001,
        "epoch": 137,
        "elapsed_time": 2064.356647014618,
        "main/loss/loc": 0.18369287252426147,
        "iteration": 23000,
        "validation/main/ap/Orange": 0.986013986013986,
        "main/loss/conf": 0.9099375009536743,
        "main/loss": 1.093629240989685,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 143,
        "elapsed_time": 2153.256175994873,
        "main/loss/loc": 0.19064386188983917,
        "iteration": 24000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.8053968548774719,
        "main/loss": 0.9960411190986633,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 149,
        "elapsed_time": 2244.549509525299,
        "main/loss/loc": 0.1867237240076065,
        "iteration": 25000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.8994454741477966,
        "main/loss": 1.0861691236495972,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 155,
        "elapsed_time": 2334.299106836319,
        "main/loss/loc": 0.18454843759536743,
        "iteration": 26000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.8269904255867004,
        "main/loss": 1.0115389823913574,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 161,
        "elapsed_time": 2424.789150238037,
        "main/loss/loc": 0.1966702789068222,
        "iteration": 27000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.8844095468521118,
        "main/loss": 1.0810796022415161,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 0.9509090909090911,
        "lr": 0.0001,
        "epoch": 167,
        "elapsed_time": 2513.2767944335938,
        "main/loss/loc": 0.16937251389026642,
        "iteration": 28000,
        "validation/main/ap/Orange": 0.901818181818182,
        "main/loss/conf": 0.745681881904602,
        "main/loss": 0.91505366563797,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 173,
        "elapsed_time": 2605.0023188591003,
        "main/loss/loc": 0.18461140990257263,
        "iteration": 29000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.8689676523208618,
        "main/loss": 1.0535788536071777,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 179,
        "elapsed_time": 2696.3014690876007,
        "main/loss/loc": 0.18137913942337036,
        "iteration": 30000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.8449581265449524,
        "main/loss": 1.0263373851776123,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 185,
        "elapsed_time": 2786.8104159832,
        "main/loss/loc": 0.17437873780727386,
        "iteration": 31000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.8081318736076355,
        "main/loss": 0.9825111627578735,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 191,
        "elapsed_time": 2875.750657081604,
        "main/loss/loc": 0.1655825674533844,
        "iteration": 32000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.7402241230010986,
        "main/loss": 0.9058061838150024,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 197,
        "elapsed_time": 2965.155656814575,
        "main/loss/loc": 0.15511693060398102,
        "iteration": 33000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.7123785018920898,
        "main/loss": 0.8674951195716858,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 203,
        "elapsed_time": 3054.330314874649,
        "main/loss/loc": 0.15841209888458252,
        "iteration": 34000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.7834293246269226,
        "main/loss": 0.9418414235115051,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 209,
        "elapsed_time": 3143.364190340042,
        "main/loss/loc": 0.17163033783435822,
        "iteration": 35000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.787468433380127,
        "main/loss": 0.9590983986854553,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 215,
        "elapsed_time": 3232.4324741363525,
        "main/loss/loc": 0.15274818241596222,
        "iteration": 36000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.7713798880577087,
        "main/loss": 0.9241276979446411,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 221,
        "elapsed_time": 3321.141427755356,
        "main/loss/loc": 0.14548836648464203,
        "iteration": 37000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.7560657858848572,
        "main/loss": 0.9015540480613708,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 227,
        "elapsed_time": 3411.388104200363,
        "main/loss/loc": 0.1442233920097351,
        "iteration": 38000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.7834077477455139,
        "main/loss": 0.9276309609413147,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 233,
        "elapsed_time": 3500.377106666565,
        "main/loss/loc": 0.15287399291992188,
        "iteration": 39000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.7748618125915527,
        "main/loss": 0.9277357459068298,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 239,
        "elapsed_time": 3590.9456102848053,
        "main/loss/loc": 0.13595137000083923,
        "iteration": 40000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.7328604459762573,
        "main/loss": 0.8688119649887085,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 245,
        "elapsed_time": 3680.542897462845,
        "main/loss/loc": 0.15140685439109802,
        "iteration": 41000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.7466314435005188,
        "main/loss": 0.8980385065078735,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 251,
        "elapsed_time": 3769.9805710315704,
        "main/loss/loc": 0.15165714919567108,
        "iteration": 42000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.7843142151832581,
        "main/loss": 0.9359715580940247,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 257,
        "elapsed_time": 3859.4798498153687,
        "main/loss/loc": 0.13653425872325897,
        "iteration": 43000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.7511242032051086,
        "main/loss": 0.8876590132713318,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 263,
        "elapsed_time": 3949.208746910095,
        "main/loss/loc": 0.13751305639743805,
        "iteration": 44000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.7098026275634766,
        "main/loss": 0.8473147749900818,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 269,
        "elapsed_time": 4038.1616518497467,
        "main/loss/loc": 0.1474171131849289,
        "iteration": 45000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.7404646873474121,
        "main/loss": 0.8878818154335022,
        "validation/main/ap/Apple": 1.0000000000000002
    },
    {
        "validation/main/map": 1.0000000000000002,
        "lr": 0.0001,
        "epoch": 275,
        "elapsed_time": 4127.079681634903,
        "main/loss/loc": 0.13585035502910614,
        "iteration": 46000,
        "validation/main/ap/Orange": 1.0000000000000002,
        "main/loss/conf": 0.7742376327514648,
        "main/loss": 0.9100871086120605,
        "validation/main/ap/Apple": 1.0000000000000002
    }
]
pksorensen commented 7 years ago

Crazy that you get 11 iterations per second :) On the best Azure single-GPU setup I got 1.17 iterations per second.

What hardware is that?

The Azure instance is described here: https://azure.microsoft.com/en-us/blog/azure-n-series-general-availability-on-december-1/ (1 x K80 GPU, 6 cores, 56 GB RAM).

Is the speed I saw expected, or could something be wrong?

HIN0209 commented 7 years ago

FYI, my additional hardware description (11.2 iters/sec training time, training locally):

- Motherboard: ASUS X99-E WS
- CPU: Core i7-6850K (3.60 GHz), 6 cores / 12 threads
- Memory: 32 GB (8 GB x 4)
- Storage: 500 GB S-ATA SSD
- GPU: NVIDIA GTX 1080 Ti (11 GB) (I have 2 GPUs, but I guess only 1 is used)
- OS: Ubuntu 14.04

yuyu2172 commented 7 years ago

Please make sure that:

  1. cuDNN is installed (chainer.cuda.cudnn_enabled == True); see the quick check below.
  2. HIN0209 reported iterations/second for batch size == 1. Is your batch size 1?
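
For example, a quick sanity check from a Python shell (nothing specific to this example, just verifying that the GPU and cuDNN are visible to Chainer):

```python
import chainer

# True if CuPy (the GPU backend) is available to Chainer.
print(chainer.cuda.available)
# True if cuDNN can be used; training is much slower without it.
print(chainer.cuda.cudnn_enabled)
```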

Labelme would be the direct competitor if you are making a web-based app for annotation.

HIN0209, thanks for your feedback. I will update the documentation.

pksorensen commented 7 years ago

No, mine was the default batch size. I can do another experiment in the next few days when I am off work.

Also, chainer.cuda.cudnn_enabled was True.

I got to 9.5 iterations per second with batch size 1 :) From the theory I studied in school, though, I think the conclusion was that stochastic gradient descent does not work well with a batch size of 1.

But it dropped again after running for a bit (see the attached screenshot).

What is your recommendation for batch size?

HIN0209 commented 7 years ago

Another request is a Python script to convert the VOC dataset's .xml annotations to the .json format used here. I have only found scripts that go the other way around, or that simply change the file format without converting the bounding-box representation (from xmin, ymin, etc. to center/box size).

Thanks!

yuyu2172 commented 7 years ago

What is your recommendation for batch size?

Usually, as big as you can make it, provided your dataset is big. That way, the GPU is fully utilized.
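
For illustration, here is a rough sketch with a dummy dataset (not the actual SSD training code; the arrays and the batch size value are just placeholders). The batch size is simply the second argument when the iterator is created:

```python
import numpy as np
import chainer

# Dummy data standing in for the annotated detection dataset.
images = np.random.rand(100, 3, 300, 300).astype(np.float32)
labels = np.random.randint(0, 2, size=100).astype(np.int32)
train_dataset = chainer.datasets.TupleDataset(images, labels)

# Larger batches keep the GPU busier, as long as they still fit in GPU memory.
batch_size = 8  # placeholder value; tune for your GPU
train_iter = chainer.iterators.SerialIterator(train_dataset, batch_size)
batch = train_iter.next()
print(len(batch))  # -> 8
```

The SSD example uses a MultiprocessIterator instead, but the batch_size argument works the same way.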

Another request is a Python script to convert the VOC dataset's .xml annotations to the .json format used here. I have only found scripts that go the other way around, or that simply change the file format without converting the bounding-box representation (from xmin, ymin, etc. to center/box size).

Thanks for your suggestion, but this seems to be out of scope for the project. I would rather not specialize the project to the VOC dataset.

HIN0209 commented 7 years ago

Fine, I understand your time is limited. That said, LabelMe creates .xml, so it does not help with this apple-orange project either. There must be others with similar requests, so I will look around other GitHub projects.

pksorensen commented 7 years ago

@HIN0209 If you provide me the links to the VOC dataset labels, I can probably put something together that converts them to the JSON format used here.

HIN0209 commented 7 years ago

Thanks. Here are the links to the VOC datasets (2007 and 2012), taken from another excellent GitHub project: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

http://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar http://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar http://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar

I have tried to train on my custom dataset by modifying the code provided in chainercv/examples/ssd and faster_rcnn. It turned out that corrections were needed in many deeply nested .py files (even inside anaconda3/env), and I almost gave up. This orange-apple project provides much simpler code, so hopefully I can train and test on my own data with it. While using the chainercv example code, I also ran into the "no response forever" issue we discussed earlier in the respective training scripts. I hope this gets solved in the more general code as well.

pksorensen commented 7 years ago

I will take a look at http://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar and see how hard it is to convert it to the same structure as this project. That is what you are asking, right?

A folder with a .yml file listing the classes, then a folder per class, and for each image in each folder a .json file with the bounding boxes?

I assume there can be images that contain multiple classes; should those be represented in all class folders?
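
Something like the rough sketch below is what I have in mind. It reads the standard VOC xmin/ymin/xmax/ymax fields; the output layout is only a guess and would have to be adapted to whatever original_detection_dataset.py actually expects:

```python
import json
import os
import xml.etree.ElementTree as ET


def voc_xml_to_json(xml_path, json_path):
    """Convert one VOC annotation .xml file to a simple per-image .json file.

    The output keys below are illustrative only; rename them to match the
    format the annotation tool / original_detection_dataset.py expects.
    """
    root = ET.parse(xml_path).getroot()
    labels = []
    for obj in root.findall('object'):
        name = obj.find('name').text
        bndbox = obj.find('bndbox')
        xmin = float(bndbox.find('xmin').text)
        ymin = float(bndbox.find('ymin').text)
        xmax = float(bndbox.find('xmax').text)
        ymax = float(bndbox.find('ymax').text)
        labels.append({
            'label_class': name,
            # Corner coordinates; switch to center/size here if needed:
            #   cx = (xmin + xmax) / 2, cy = (ymin + ymax) / 2,
            #   w = xmax - xmin, h = ymax - ymin
            'x_min': xmin, 'y_min': ymin, 'x_max': xmax, 'y_max': ymax,
        })
    with open(json_path, 'w') as f:
        json.dump(labels, f, indent=2)


if __name__ == '__main__':
    # Convert every .xml in VOC's Annotations folder; paths are placeholders.
    ann_dir = 'VOCdevkit/VOC2012/Annotations'
    out_dir = 'voc_json'
    os.makedirs(out_dir, exist_ok=True)
    for fn in os.listdir(ann_dir):
        if fn.endswith('.xml'):
            voc_xml_to_json(os.path.join(ann_dir, fn),
                            os.path.join(out_dir, fn[:-4] + '.json'))
```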

HIN0209 commented 7 years ago

A converter from .xml to .json is found here; I can try it later: https://github.com/NervanaSystems/neon/tree/master/examples/faster-rcnn

yuyu2172 commented 7 years ago

I assume there can be images that contain multiple classes; should those be represented in all class folders?

There can be multiple classes in one image. The directory names are arbitrary; the output depends only on what is written in the .json files.

Source code: https://github.com/yuyu2172/image-labelling-tool/blob/master/examples/ssd/original_detection_dataset.py

rida1896 commented 6 years ago

@HIN0209 Could I please have the trained weights for the model on the orange-apple dataset?

HIN0209 commented 6 years ago

@rida1896 Thanks for your comment. Actually, my comment is now outdated. If you could do that on Mask-RCNN, it would be great.

rida1896 commented 6 years ago

@HIN0209 Do you have the dataset with annotations?