DarkMythosIOTA opened this issue 3 years ago
I'm very interested in hearing what @chenwydj has to say about this.
Hey @Gaussianer, did you manage to train FasterSeg with a custom dataset following the guidelines in #46?
Hey @emersonjr, yes, we have provided a repo for this as well. Look here: https://github.com/Gaussianer/FasterSeg
However, we cannot yet provide any information in the repo about how far the code has to be adapted to the image resolution. We have trained several models, but we are unsure whether the resolution-related values need to be adjusted to improve the results.
No worries @Gaussianer, thanks for replying here :) I'm also a master's student working on real-time image segmentation; in my case it's aimed at images containing sugar cane and weeds.
I have some questions you could probably help with since you did custom training. I'm just not sure this is the best place, but anyway...
I basically want to train FasterSeg with a custom dataset as well, but my classes have nothing to do with any of the Cityscapes classes. My classes are Sugar Cane and Weeds (should I count Background toward the number of classes as well?). I'm encoding them in the ground-truth images (annotations) as follows: Sugar Cane pixels are [0,0,0], Weeds are [1,1,1], and everything else (Background) is [255,255,255]. Here's an example image (the image is 1024x2048 by mistake; I know I'll need to generate 2048x1024 instead):
What should I change in your repo code to train with a dataset containing these images? Thanks in advance, mate!
On the one hand, you have to create the dataset according to the description. For this you have to generate the provided labelDefinitions.csv according to the template; there you can also see the corresponding attributes for the background (unlabeled). Just try to go through our description. Some parts may not be documented yet, so if you run into problems, please contact me. Then I can improve the description so that others can benefit from it as well.
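If it helps, here is a minimal sketch (my own illustration, not code from the FasterSeg repo) of how a labelDefinitions.csv in this format can be read into id-to-trainId and trainId-to-color lookups. The column names are the ones from the CSV posted below; the file path in the usage line is a placeholder.

```python
import csv

def load_label_definitions(csv_path):
    """Parse a labelDefinitions.csv into lookup tables.

    Expected columns (as in the CSV shown in this thread):
    name, id, trainId, category, catId, hasInstances, ignoreInEval,
    color_r, color_g, color_b
    """
    id_to_trainid = {}
    trainid_to_color = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            label_id = int(row["id"])
            train_id = int(row["trainId"])
            color = (int(row["color_r"]), int(row["color_g"]), int(row["color_b"]))
            id_to_trainid[label_id] = train_id
            trainid_to_color[train_id] = color
    return id_to_trainid, trainid_to_color

# Hypothetical usage with a placeholder path:
# id_to_trainid, trainid_to_color = load_label_definitions("labelDefinitions.csv")
```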
Thanks @Gaussianer
So, I've followed the description and also created my own labelDefinitions.csv. Here it is:
name,id,trainId,category,catId,hasInstances,ignoreInEval,color_r,color_g,color_b
unlabeled,0,255,void,0,False,False,0,0,0
sugar cane,1,0,void,0,False,False,100,50,15
weeds,2,1,void,0,False,False,247,103,0
I created it that way because in my _labelTrainIds.png files the background (unlabeled) pixels are [255,255,255], Sugar Cane pixels are [0,0,0], and Weeds are [1,1,1].
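As a side note, here is a small hedged sketch (my own addition, not from the FasterSeg repo) to check that a *_labelTrainIds.png really only contains the expected trainId values 0, 1 and 255, reducing an RGB-coded mask to a single channel first if necessary; the file name in the usage line is a placeholder.

```python
import numpy as np
from PIL import Image

EXPECTED_TRAIN_IDS = {0, 1, 255}  # sugar cane, weeds, unlabeled

def check_train_id_mask(png_path):
    """Report which trainId values occur in a label image.

    If the mask was saved as RGB (e.g. [1,1,1] for weeds), it is reduced
    to a single channel first; Cityscapes-style *_labelTrainIds.png files
    are expected to be single-channel.
    """
    mask = np.array(Image.open(png_path))
    if mask.ndim == 3:            # RGB-coded mask -> take one channel
        mask = mask[..., 0]
    values = set(np.unique(mask).tolist())
    unexpected = values - EXPECTED_TRAIN_IDS
    print(f"{png_path}: values={sorted(values)}, unexpected={sorted(unexpected)}")
    return not unexpected

# Hypothetical usage:
# check_train_id_mask("example_labelTrainIds.png")
```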
I also edited config_search.py and config_train.py to set C.num_classes = 3 for my case. However, when I run CUDA_VISIBLE_DEVICES=0 python train_search.py, I get the error shown below:
root@5be7442709af:/home/FasterSeg/search# CUDA_VISIBLE_DEVICES=0 python train_search.py
use TensorRT for latency test
use TensorRT for latency test
Experiment dir : search-pretrain-256x512_F12.L16_batch3-20211202-161628
02 16:16:28 args = {'seed': 12345, 'repo_name': 'FasterSeg', 'abs_dir': '/home/FasterSeg/search', 'this_dir': 'search', 'root_dir': '/home/FasterSeg', 'dataset_path': '/home/FasterSeg/dataset', 'img_root_folder': '/home/FasterSeg/dataset', 'gt_root_folder': '/home/FasterSeg/dataset', 'train_source': '/home/FasterSeg/dataset/train_mapping_list.txt', 'eval_source': '/home/FasterSeg/dataset/val_mapping_list.txt', 'num_classes': 3, 'background': -1, 'image_mean': array([0.485, 0.456, 0.406]), 'image_std': array([0.229, 0.224, 0.225]), 'down_sampling': 2, 'image_height': 256, 'image_width': 512, 'gt_down_sampling': 8, 'num_train_imgs': 50, 'num_eval_imgs': 25, 'bn_momentum': 0.1, 'bn_eps': 1e-05, 'lr': 0.02, 'momentum': 0.9, 'weight_decay': 0.0005, 'num_workers': 4, 'train_scale_array': [0.75, 1, 1.25], 'eval_stride_rate': 0.8333333333333334, 'eval_scale_array': [1], 'eval_flip': False, 'eval_height': 1024, 'eval_width': 2048, 'grad_clip': 5, 'train_portion': 0.5, 'arch_learning_rate': 0.0003, 'arch_weight_decay': 0, 'layers': 16, 'branch': 2, 'pretrain': True, 'prun_modes': ['max', 'arch_ratio'], 'Fch': 12, 'width_mult_list': [0.3333333333333333, 0.5, 0.6666666666666666, 0.8333333333333334, 1.0], 'stem_head_width': [(1, 1), (0.6666666666666666, 0.6666666666666666)], 'FPS_min': [0, 155.0], 'FPS_max': [0, 175.0], 'batch_size': 3, 'niters_per_epoch': 400, 'latency_weight': [0, 0], 'nepochs': 20, 'save': 'search-pretrain-256x512_F12.L16_batch3-20211202-161628', 'unrolled': False}
02 16:16:36 params = 2.568351MB, FLOPs = 71.064453GB
architect initialized!
using downsampling: 2
Found 25 images
using downsampling: 2
Found 25 images
using downsampling: 2
Found 25 images
0%| | 0/20 [00:00<?, ?it/s]02 16:25:11 True
02 16:25:11 search-pretrain-256x512_F12.L16_batch3-20211202-161628
02 16:25:11 lr: 0.02
02 16:25:11 update arch: False
[Epoch 1/20][trTraceback (most recent call last): | 0/20 [00:00<?, ?it/s]
File "train_search.py", line 307, in <module>
main(pretrain=config.pretrain)
File "train_search.py", line 137, in main
train(pretrain, train_loader_model, train_loader_arch, model, architect, ohem_criterion, optimizer, lr_policy, logger, epoch, update_arch=update_arch)
File "train_search.py", line 246, in train
loss = model._loss(imgs, target, pretrain)
File "/home/FasterSeg/search/model_search.py", line 489, in _loss
logits = self(input)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/FasterSeg/search/model_search.py", line 287, in forward
out_prev = [[stem(input), None]] # stem: one cell
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/FasterSeg/search/operations.py", line 127, in forward
x = self.conv(x)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/batchnorm.py", line 83, in forward
exponential_average_factor, self.eps)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1697, in batch_norm
training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/pin_memory.py", line 21, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/usr/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/reductions.py", line 276, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
Btw, the container I'm running was built via the Dockerfile installation process. If I follow the same steps and run the training command above in your provided image from Docker Hub, it doesn't detect that TensorRT is installed and I get this error:
root@be035b6f0647:/home/FasterSeg/search# CUDA_VISIBLE_DEVICES=0 python train_search.py
/home/FasterSeg/tools/utils/darts_utils.py:179: UserWarning: TensorRT (or pycuda) is not installed. compute_latency_ms_tensorrt() cannot be used.
warnings.warn("TensorRT (or pycuda) is not installed. compute_latency_ms_tensorrt() cannot be used.")
use PyTorch for latency test
use PyTorch for latency test
Experiment dir : search-pretrain-256x512_F12.L16_batch3-20211202-152200
02 15:22:00 args = {'seed': 12345, 'repo_name': 'FasterSeg', 'abs_dir': '/home/FasterSeg/search', 'this_dir': 'search', 'root_dir': '/home/FasterSeg', 'dataset_path': '/home/FasterSeg/dataset', 'img_root_folder': '/home/FasterSeg/dataset', 'gt_root_folder': '/home/FasterSeg/dataset', 'train_source': '/home/FasterSeg/dataset/train_mapping_list.txt', 'eval_source': '/home/FasterSeg/dataset/val_mapping_list.txt', 'num_classes': 3, 'background': -1, 'image_mean': array([0.485, 0.456, 0.406]), 'image_std': array([0.229, 0.224, 0.225]), 'down_sampling': 2, 'image_height': 256, 'image_width': 512, 'gt_down_sampling': 8, 'num_train_imgs': 0, 'num_eval_imgs': 0, 'bn_momentum': 0.1, 'bn_eps': 1e-05, 'lr': 0.02, 'momentum': 0.9, 'weight_decay': 0.0005, 'num_workers': 4, 'train_scale_array': [0.75, 1, 1.25], 'eval_stride_rate': 0.8333333333333334, 'eval_scale_array': [1], 'eval_flip': False, 'eval_height': 1024, 'eval_width': 2048, 'grad_clip': 5, 'train_portion': 0.5, 'arch_learning_rate': 0.0003, 'arch_weight_decay': 0, 'layers': 16, 'branch': 2, 'pretrain': True, 'prun_modes': ['max', 'arch_ratio'], 'Fch': 12, 'width_mult_list': [0.3333333333333333, 0.5, 0.6666666666666666, 0.8333333333333334, 1.0], 'stem_head_width': [(1, 1), (0.6666666666666666, 0.6666666666666666)], 'FPS_min': [0, 155.0], 'FPS_max': [0, 175.0], 'batch_size': 3, 'niters_per_epoch': 400, 'latency_weight': [0, 0], 'nepochs': 20, 'save': 'search-pretrain-256x512_F12.L16_batch3-20211202-152200', 'unrolled': False}
02 15:22:09 params = 2.568351MB, FLOPs = 71.064453GB
Traceback (most recent call last):
File "train_search.py", line 306, in <module>
main(pretrain=config.pretrain)
File "train_search.py", line 69, in main
model = model.cuda()
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 265, in cuda
return self._apply(lambda t: t.cuda(device))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 193, in _apply
module._apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 193, in _apply
module._apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 205, in _apply
self._buffers[key] = fn(buf)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 265, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 162, in _lazy_init
_check_driver()
File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 82, in _check_driver
http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
So I'm sticking with the first container. Do you have any idea what's happening in this case?
@emersonjr Did you install the Docker NVIDIA container runtime as in the installation description?
Regarding TensorRT: yes, we had to remove TensorRT from the environment because it always led to errors during training.
@Gaussianer I noticed that, by mistake, I hadn't. I installed it now and retried training, but it's still giving the same error (yes, I did restart the Docker service, rebooted, and even ran a new container). Any ideas?
@emersonjr Have you installed the appropriate graphics card driver as well as CUDA 10.1 and cuDNN? We have provided a guide for setting this up with Podman on CentOS 7.
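For what it's worth, here is a quick hedged check (my own sketch, not from the repo) that can be run inside the container to confirm the driver, CUDA and cuDNN are actually visible to PyTorch:

```python
import torch

# Quick sanity check inside the container: if this prints False or raises,
# the NVIDIA driver / container runtime is not reaching the container.
print("CUDA available:", torch.cuda.is_available())
print("PyTorch built with CUDA:", torch.version.cuda)
print("cuDNN version:", torch.backends.cudnn.version())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```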
Hello @chenwydj,
We have already asked how to train FasterSeg with custom data, see here. However, we still have a question regarding the image resolution and the necessary adjustments in the code. We have found several places that match the image resolution or at least correlate with it. See here, here, here, here, here, here, here, here, here, here and here.
Do all of these values need to be adjusted to the resolution of the dataset?
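For context, this is a hedged summary of the resolution-related entries we mean, with the values from the args dump above (2048x1024 Cityscapes-style images). The names follow config_search.py / config_train.py; whether each one must be changed for a different dataset resolution is exactly our question.

```python
# Resolution-related config entries (values as in the args dump above).
resolution_related = {
    "image_height": 256,      # training crop height (search/pretrain)
    "image_width": 512,       # training crop width (search/pretrain)
    "eval_height": 1024,      # full-resolution evaluation height
    "eval_width": 2048,       # full-resolution evaluation width
    "down_sampling": 2,       # input downsampling factor
    "gt_down_sampling": 8,    # ground-truth downsampling factor
}
for name, value in resolution_related.items():
    print(f"{name} = {value}")
```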
Thank you for providing FasterSeg and the support from your side.