MaybeShewill-CV / bisenetv2-tensorflow

Unofficial TensorFlow implementation of the real-time scene image segmentation model "BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation"
https://maybeshewill-cv.github.io/bisenetv2-tensorflow/
MIT License
224 stars, 59 forks

retrain issue with CityScapes #37

Closed (suke27 closed this issue 3 years ago)

suke27 commented 3 years ago

Hi, thanks for your good work. I have a question: when I try to train with CitySpacesReader without using tfrecord, I run out of memory (OOM). So I changed the patch size from 2018 * 1024 to 2014 * 512 and changed the OHEM settings to OHEM: ENABLE: True, SCORE_THRESH: 0.65, MIN_SAMPLE_NUMS: 340788. It seems to work fine:

train loss: 0.91353, miou: 0.65194: 100%|████████████████████████████████████████████████████████████████████| 185/185 [00:56<00:00, 3.30it/s]
2020-12-30 18:32:58.879 | INFO | trainner.cityscapes.cityscapes_bisenetv2_single_gpu_trainner:train:307 - => Epoch: 876 Time: 2020-12-30 18:32:58 Train loss: 0.91419 Train miou: 0.65191 ...

But when I run evaluation, the output is totally different from that of your pretrained model. Do you know how to resolve this?
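When comparing an evaluation run against a pretrained model, it helps to confirm that both runs score predictions the same way. Below is a minimal sketch of mean IoU over flattened label lists. The function name `mean_iou` and the default `ignore_label=255` are illustrative assumptions and are not taken from this repository's evaluation code.

```python
def mean_iou(y_true, y_pred, num_classes, ignore_label=255):
    """Mean intersection-over-union over classes that actually occur.

    y_true, y_pred: flat sequences of integer class ids of equal length.
    Pixels whose ground-truth id equals ignore_label are skipped
    (255 is an assumed convention here, adjust to your label encoding).
    Predictions are assumed to lie in range(num_classes).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    fn = [0] * num_classes  # false negatives per class
    for t, p in zip(y_true, y_pred):
        if t == ignore_label:
            continue
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1
            fp[p] += 1
    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        if denom > 0:  # skip classes absent from both truth and prediction
            ious.append(tp[c] / denom)
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 255]
    pred = [0, 1, 1, 1, 0]
    print(mean_iou(truth, pred, num_classes=2))
```

If the training log and the evaluation script average over different class sets or treat ignored pixels differently, their mIoU values are not directly comparable.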

MaybeShewill-CV commented 3 years ago

@suke27 The result cannot be reproduced if you change the experiment configuration :)
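One reason the configuration matters: in a common formulation of OHEM (online hard example mining) for segmentation, `SCORE_THRESH` and `MIN_SAMPLE_NUMS` decide which pixels contribute to the loss, so the fraction of pixels kept depends on the crop size and batch size. The sketch below illustrates that common formulation in plain Python. It is an assumption about how such settings are typically used, not a copy of this repository's loss, and the function name `ohem_mean_loss` is hypothetical.

```python
import math


def ohem_mean_loss(pixel_losses, score_thresh, min_sample_nums):
    """Average only the 'hard' per-pixel losses.

    pixel_losses: flat list of per-pixel cross-entropy values for one batch.
    score_thresh: a pixel counts as hard when the predicted probability of
        its true class is below this value, i.e. loss > -log(score_thresh).
    min_sample_nums: lower bound on the number of pixels kept.
    """
    if not pixel_losses:
        raise ValueError("pixel_losses must not be empty")
    loss_thresh = -math.log(score_thresh)
    ordered = sorted(pixel_losses, reverse=True)  # hardest pixels first
    k = min(min_sample_nums, len(ordered))
    hard = [v for v in ordered if v > loss_thresh]
    # Keep every hard pixel, but never fewer than k pixels in total.
    kept = hard if len(hard) >= k else ordered[:k]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    losses = [2.0, 1.0, 0.1, 0.05]
    # Same losses, different minimum sample counts give different values.
    print(ohem_mean_loss(losses, score_thresh=0.65, min_sample_nums=1))
    print(ohem_mean_loss(losses, score_thresh=0.65, min_sample_nums=3))
```

With a smaller crop there are fewer pixels per batch, so a fixed minimum sample count covers a larger share of them and the loss drifts toward a plain average over all pixels.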

suke27 commented 3 years ago

Hello, I have another question. I tested your pretrained model (which should have been trained at 2048 * 1024) on my own images and found that the results differ significantly when I use different resolutions (such as 512 * 512, 1024 * 512, and 2048 * 1024). It seems sensitive to resolution and aspect ratio. Could you explain why?

MaybeShewill-CV commented 3 years ago

@suke27 The model was trained on images of size 1024 * 512. Naturally, you will get different results with input images of a different size :)
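A common workaround for this kind of mismatch is to resize each input to the training resolution before inference and then resize the predicted label map back to the original size. The sketch below shows the idea in plain Python on nested lists. It assumes "1024 * 512" means width 1024 and height 512, uses nearest-neighbour resizing throughout for brevity (real pipelines usually resize the image bilinearly), and takes a hypothetical `segment_fn` callable standing in for the model.

```python
def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbour resize of a 2D nested list to out_h x out_w."""
    in_h, in_w = len(grid), len(grid[0])
    rows = [min(in_h - 1, (y * in_h) // out_h) for y in range(out_h)]
    cols = [min(in_w - 1, (x * in_w) // out_w) for x in range(out_w)]
    return [[grid[r][c] for c in cols] for r in rows]


def predict_at_train_size(image, segment_fn, train_h=512, train_w=1024):
    """Run segment_fn at the training resolution, then restore the size.

    image: 2D nested list (one channel is enough for this sketch).
    segment_fn: hypothetical callable mapping a train_h x train_w image
        to a label map of the same shape.
    """
    orig_h, orig_w = len(image), len(image[0])
    resized = resize_nearest(image, train_h, train_w)
    label_map = segment_fn(resized)
    # Nearest-neighbour keeps class ids intact when scaling labels back.
    return resize_nearest(label_map, orig_h, orig_w)


if __name__ == "__main__":
    tiny = [[1, 2], [3, 4]]
    # Identity "model" on a tiny grid, just to show the round trip.
    print(predict_at_train_size(tiny, lambda x: x, train_h=4, train_w=4))
```

Resizing changes the apparent object scale whenever the aspect ratio differs from the training crops, so this reduces the mismatch but does not remove it.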

suke27 commented 3 years ago

@MaybeShewill-CV As we know, convolution is robust to pixel deformation and shift; that's why we can augment the training dataset (flip, resize, rotate). So I think it should not be sensitive to resolution.
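A small 1D experiment can make the distinction between shift and scale concrete. The sketch below applies one fixed filter to a narrow peak, to the same peak shifted, and to a widened peak. It is a toy illustration under simplified assumptions (a single hand-picked kernel, no learning, no pooling) and says nothing specific about BiSeNet V2 itself.

```python
def correlate_valid(signal, kernel):
    """Slide kernel over signal without padding (cross-correlation)."""
    n, k = len(signal), len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]


# Filter that responds most strongly to a peak exactly one sample wide.
kernel = [-1, 2, -1]

peak = [0, 0, 1, 0, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 0, 0, 0]    # same peak moved right by 2
stretched = [0, 0, 1, 1, 0, 0, 0, 0]  # same peak at twice the width

resp_peak = correlate_valid(peak, kernel)
resp_shifted = correlate_valid(shifted, kernel)
resp_stretched = correlate_valid(stretched, kernel)

if __name__ == "__main__":
    print(resp_peak)
    print(resp_shifted)    # same response pattern, moved right by 2
    print(resp_stretched)  # weaker maximum: the filter is tuned to one width
```

Shifting the input only moves the response, while changing the scale changes the response itself, which is why resize augmentation has to cover the scales expected at test time.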

MaybeShewill-CV commented 3 years ago

@suke27 You may read the original paper to find the answer :)