Closed InternetMaster1 closed 2 years ago
@InternetMaster1
Is it possible to use HRNetV2-W18-SMALL-v2 in the HumanSeg branch? Yes, but you will need to write your own train.py, modeled on the HumanSeg one.
If you want highly accurate full-body human cutouts, you need a full-body dataset to train the model.
HRNetV2-W18/bn is not HRNetV2-W18-SMALL-v2.
The figure shows the parameters of HRNet_w18_small_v1/v2. You can set these parameters in models/HumanSeg/HumanSegMobile. @InternetMaster1
Dear @wuyefeilin
Thanks for the detailed replies.
1) Dataset :- Yes, I will make a dataset of full body images for training the model
2) HRNetV2-W18-SMALL-v2 To enable this, I have to change the following code in HumanSeg/models/humanseg.py
Current Code for : HRNetV1-W18-SMALL-v1
class HumanSegMobile(SegModel):
    def __init__(self,
                 num_classes=2,
                 stage1_num_modules=1,
                 stage1_num_blocks=[1],
                 stage1_num_channels=[32],
                 stage2_num_modules=1,
                 stage2_num_blocks=[2, 2],
                 stage2_num_channels=[16, 32],
                 stage3_num_modules=1,
                 stage3_num_blocks=[2, 2, 2],
                 stage3_num_channels=[16, 32, 64],
                 stage4_num_modules=1,
                 stage4_num_blocks=[2, 2, 2, 2],
                 stage4_num_channels=[16, 32, 64, 128],
                 use_bce_loss=False,
                 use_dice_loss=False,
                 class_weight=None,
                 ignore_index=255,
                 sync_bn=True):
For Changing to : HRNetV2-W18-SMALL-v2
class HumanSegMobile(SegModel):
    def __init__(self,
                 num_classes=2,
                 stage1_num_modules=1,
                 stage1_num_blocks=[1],
                 stage1_num_channels=[64],
                 stage2_num_modules=1,
                 stage2_num_blocks=[2, 2],
                 stage2_num_channels=[18, 36],
                 stage3_num_modules=1,
                 stage3_num_blocks=[2, 2, 2],
                 stage3_num_channels=[18, 36, 72],
                 stage4_num_modules=1,
                 stage4_num_blocks=[2, 2, 2, 2],
                 stage4_num_channels=[18, 36, 72, 144],
                 use_bce_loss=False,
                 use_dice_loss=False,
                 class_weight=None,
                 ignore_index=255,
                 sync_bn=True):
Is the above correct? So we do not have to make any changes in the train.py or the config.yaml? The above should be enough?
3) Config Can you provide similar config for HRNetV2-W18-SMALL-v2 to achieve mIOU of 76.2? Thanks
4) OCR Is it possible to implement OCR on top of HRNetV2-W18-SMALL-v2 in this library? Please refer this issue for details https://github.com/HRNet/HRNet-Semantic-Segmentation/issues/141#issuecomment-631861819
@InternetMaster1
@wuyefeilin Many thanks!
@wuyefeilin
Is it possible to increase the priority of the OCR implementation?
I tested on the official library of HRNet and noticed that OCR on top of HRNetV2-W18-SMALL-v2 increases the mIOU by a good margin.
It will improve the segmentation accuracy.
HumanSeg on PaddleSeg is already an amazing library, and addition of OCR will make it all the more amazing! :)
Dear @wuyefeilin ,
I wish to train the HRNet HumanSeg-mobile model on a custom dataset with just person class (with format exactly like Supervisely person dataset). How to achieve this?
I do not wish to perform transfer learning on the existing pre-trained model containing Supervisely training.
How to get a blank model file for HumanSeg-mobile?
Thank you for your suggestion. We will add OCR as soon as possible.
You can get the pretrained models from https://github.com/PaddlePaddle/PaddleSeg/blob/develop/docs/model_zoo.md
Dear @wuyefeilin,
1) Thanks for putting OCR in the ToDo list
2) Which file to take, I am confused.
Which base model file was used by your team for training HumanSeg-mobile with HRNet on Supervisely Dataset?
We trained humanseg from scratch with HRNet. If you want a pretrained model for training HRNet, you can select the relevant model from the following.
@wuyefeilin
Dataset for HumanSeg: On what dataset was "humanseg_mobile_ckpt" trained, and for how many epochs?
Where is HrNetV2-W18-Small-v2 (or V1) in the Model Zoo? I checked the model_zoo page. On that page, hrnet_w18_imagenet.tar is too large to be HrNetV2-W18-Small-v2, and the others are already trained on Cityscapes or COCO, which is not what we are looking for.
Model to initialise custom training on "humanseg_mobile": Which model should I use to initialise training for "humanseg_mobile"? Can you provide a direct link? I am looking for a model file to initialise "humanseg_mobile" and then train on my custom dataset from scratch. My custom dataset is structured like Supervisely, but I don't want a model file that is already pre-trained on Supervisely. I want to do fresh training from scratch for HrNetV2-W18-Small-v2.
Input size: The "humanseg_mobile" model seems to be trained at an input size of (192,192). In the official HRNet repo, the small models (e.g. HRNetV2-W18-Small-v2) are trained and tested at input sizes of 512x1024 and 1024x2048 respectively. Is it possible to train and test "humanseg_mobile" at a bigger input size? Will accuracy improve with a larger training size if I am trying to segment the full body rather than just a portrait?
Thanks for your patience!
@wuyefeilin
We really wish to use PaddleSeg/HumanSeg for our deployment and the library is amazing. If you could just help us out with the above questions, we would be most thankful..
@InternetMaster1
@wuyefeilin
Thanks for the detailed response. We will go ahead and use "HumanSeg-mobile" pre-trained model, it looks amazing.
In the near future, is it possible for you to upgrade HumanSeg-mobile to use "HrNetV2-W18-Small-v2" and then train this on your private dataset and provide a pretrained model for this? You could also add OCR to it to further improve accuracy
This would make "HumanSeg-mobile" the best option available out there for mobile phone segmentation and PaddleSeg the go-to SOTA library! :)
@wuyefeilin
We took your pretrained "humanseg_mobile_ckpt" and then did further training on it on the mini-supervisely dataset, but after our training, the results became worse. Why should this happen?
Check the below table which contains results for
Original Images
Evaluate on pretrained "humanseg_mobile_ckpt" This is based on hrnetv1 trained on your internal Baidu dataset
Evaluate on custom model obtained from transfer learning for 200 epoch on mini-supervisely (on top of your existing pretrained "humanseg_mobile_ckpt" as base)
[EVAL] Finished, Epoch=200, miou=0.857455. Model saved in output/epoch_200. Current evaluated best model in eval_dataset is epoch_95, miou=0.8625659576649392
Why results became worse with more training?
Original | humanseg_mobile (baidu dataset trained) | humanseg_mobile (baidu + supervisely) |
---|---|---|
Any tips on how to further improve the results over the existing humanseg_mobile?
@wuyefeilin
We just chose some samples from Supervisely at random. The mini_supervisely dataset is small; it is only a demo dataset for running the repo. If you finetune on it, it is normal for the results to get worse. You may want to finetune the model on the whole Supervisely dataset.
Any tips on how to further improve the results over the existing humanseg_mobile?
- I tried training with more images on your pre-trained model, but it is spoiling things further. What to do next?
- Does PaddlePaddle support Image Matting techniques?
You can change the --image_shape parameter. If you want better results, you can try HumanSeg-Server or use HRNet_w48.
PaddleSeg does not support image matting techniques.
@wuyefeilin
I will train on full Supervisely and report back on the results.
I tried finetuning the model with mini supervisely at three input sizes: 192, 512 and 1024. We noticed that the results became much worse at larger sizes, with 1024 giving the worst results. If I take your model, which was trained at 192, and finetune it at a bigger size, is that advisable, or would it be bad practice?
How to run humanseg_mobile on Android? I am stuck with some error, I tried lots of things last two days but not able to solve it. I posted another issue for this https://github.com/PaddlePaddle/PaddleSeg/issues/279 I found someone else facing similar issue and he too has not been able to solve it https://github.com/PaddlePaddle/Paddle-Lite/issues/3697
Finetuning at a bigger size is not a problem. I suspect the problem lies in the dataset; our model is better adapted to a phone's front camera and a vertical screen.
@wuyefeilin Are you saying that the dataset should have more vertical images rather than horizontal?
Can you give some details about the private Baidu dataset.
This will help us make a similar custom dataset on which we will finetune your model further
@wuyefeilin Can you also help me with Android deployment. I am stuck on that PaddleLite error.
Thank you so much for your patience and all of your help
I mean that there are more vertical images than horizontal ones in our dataset. There are about 80000 images. I trained for 100 epochs with learning_rate 0.1; you can also set a larger num_epochs, such as 200. Image width ranges from about 100 to 3800, and height from about 100 to 4600.
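To compare a custom dataset against the portrait-heavy distribution described in this reply, one option is to count image orientations. The following is a minimal sketch: the helper name `orientation_counts` is hypothetical and it takes plain `(width, height)` pairs, which in practice would be read from the image files with whatever image library is in use.

```python
from collections import Counter


def orientation_counts(sizes):
    """Count portrait, landscape and square images from (width, height) pairs."""
    counts = Counter(portrait=0, landscape=0, square=0)
    for width, height in sizes:
        if height > width:
            counts["portrait"] += 1
        elif width > height:
            counts["landscape"] += 1
        else:
            counts["square"] += 1
    return dict(counts)


if __name__ == "__main__":
    # Illustrative sizes only, not taken from any real dataset.
    example_sizes = [(720, 1280), (1080, 1920), (1920, 1080), (512, 512)]
    print(orientation_counts(example_sizes))
```

A dataset dominated by landscape images would differ from the one described here, which may be worth knowing before finetuning.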
@wuyefeilin
Wow that is amazing information. This will definitely help during our custom dataset creation for further fine-tuning your model.
I had a couple more questions, if you are comfortable answering them.
1) Are the 80,000 images the core images, or is this count reached after data augmentation (rotation, etc.)?
2) Are the 80,000 images unique, or are they created by compositing foregrounds onto backgrounds? (As is done for matting datasets: compositing 1000 foregrounds with 80 backgrounds would make 80,000 images.) Is this how it is achieved, or is the dataset actually this big?
Thanks :)
Sorry, I do not know exactly how the dataset was made. But the dataset is actually this big. I have not seen any sign of data composition.
Many thanks @wuyefeilin
@wuyefeilin
I am still struggling with deploying humanseg_mobile on an Android device. I had imagined that it would be simple to make it run on Mobile device (since that is the important part of the library) but I am genuinely struggling. I have tried every step mentioned in the docs.
@Channingss has been most helpful and given some tips but it is still not working https://github.com/PaddlePaddle/PaddleSeg/issues/279#issuecomment-637364074
Others are also facing similar issue https://github.com/PaddlePaddle/Paddle-Lite-Demo/issues/66 https://github.com/PaddlePaddle/Paddle-Lite/issues/3697
What can I do now? Can you point me in the right direction?
@wuyefeilin
We are trying to finetune your model by doing additional training on your existing humanseg_mobile_checkpoint model.
We downloaded Supervisely Person dataset as well as AISegment dataset but it is giving following error at the time of evaluation.
Error when training on supervisely
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 61, in _wrapfunc
    return bound(*args, **kwds)
ValueError: axes don't match array
This is sample mask from Supervisely Dataset
Error when training on aisegment
  File "/usr/local/lib/python3.6/dist-packages/scipy/sparse/coo.py", line 285, in _check
    raise ValueError('row index exceeds matrix dimensions')
ValueError: row index exceeds matrix dimensions
This is sample mask from AiSegment Dataset
Note:
The label values are not correct. In label images, 0 represents the background and 1 represents the human. In the aiseg dataset, the human mask value is not 1. You can change it.
@wuyefeilin
@InternetMaster1 The label should be a single-channel image. In humanseg, 0 represents the background and 1 represents the human.
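The label format described above (single channel, 0 for background, 1 for human) can be produced from an existing mask roughly as follows. This is a sketch under assumptions: the function name `to_binary_label` is hypothetical, and the rule "any pixel above the threshold in any channel is human" is a guess that must be checked against how each dataset actually encodes its masks.

```python
import numpy as np


def to_binary_label(mask, threshold=0):
    """Convert a mask to a single-channel uint8 label: 0 = background, 1 = human.

    Assumption: a pixel counts as human if any channel exceeds `threshold`.
    """
    mask = np.asarray(mask)
    if mask.ndim == 3:
        # Collapse the channel axis so the result is H x W.
        mask = mask.max(axis=2)
    return (mask > threshold).astype(np.uint8)


if __name__ == "__main__":
    rgb_mask = np.array(
        [[[0, 0, 0], [255, 255, 255]],
         [[0, 0, 0], [0, 128, 0]]],
        dtype=np.uint8,
    )
    label = to_binary_label(rgb_mask)
    print(label.shape, label.dtype, np.unique(label))
```

The converted array would then be saved back as a single-channel image with the image library already used for the dataset.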
@wuyefeilin
I have to make changes in the actual images of aiseg and supervisely, and not in code of paddle, correct?
Yes, you should change the labels of the dataset according to https://github.com/PaddlePaddle/PaddleSeg/blob/release/v0.5.0/docs/data_prepare.md
@wuyefeilin
Thanks. I was able to finetune on my custom dataset after converting the images to 0 and 1.
I had one more question.
Is it possible to perform Data Augmentation in humanseg library? How to achieve this?
def train(args):
    train_transforms = transforms.Compose([
        transforms.Resize(args.image_shape),
        transforms.RandomHorizontalFlip(),
        transforms.Normalize()
    ])
!python train.py --model_type HumanSegMobile \
--save_dir output/ \
--data_dir data/person \
--train_list data/person/aiseg_train.txt \
--val_list data/person/aiseg_val.txt \
--pretrained_weights pretrained_weights/humanseg_mobile_ckpt \
--batch_size 6 \
--learning_rate 0.1 \
--num_epochs 1000 \
--image_shape 192 192 \
--use_vdl \
--resume_weights output/epoch_500/ \
--AUG_METHOD='unpadding' \
--MIRROR=True
Error: train.py: error: unrecognized arguments: --AUG_METHOD=unpadding --MIRROR=True
Yes, you can change train_transforms in train.py if you want to perform other data augmentations. You can add new data augmentations in transforms.py.
Data augmentation cannot be added through the HumanSeg train command.
Many thanks @wuyefeilin, I will try this out.
Dear @wuyefeilin ,
I tested by changing the code in train.py as per below and it is working smoothly
Current Code
train_transforms = transforms.Compose([
    transforms.Resize(args.image_shape),
    transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
eval_transforms = transforms.Compose(
    [transforms.Resize(args.image_shape),
     transforms.Normalize()])
New Code
train_transforms = transforms.Compose([
    transforms.RandomRotation(),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ResizeStepScaling(),
    transforms.RandomBlur(),
    transforms.RandomDistort(),
    transforms.Resize(args.image_shape),
    transforms.Normalize()
])
eval_transforms = transforms.Compose(
    [transforms.Resize(args.image_shape),
     transforms.Normalize()])
I had a few questions :-
1) Is my selection of transformations a good choice for the task of human segmentation?
2) I wish to fine-tune by training on your existing humanseg_mobile, and I hand-picked the above transformations from transforms.py file. Your model is trained with less transformations, and I would be fine-tuning with more transformations. Would that be fine?
3) For each image in my dataset, it would create just one more transformed image which would contain a combination of all of the above transformations, correct? Or it would create a separate image internally for each transformation during training?
4) I am not doing any transformations in eval_transforms. Is that fine? Or would the quality of training improve if we added a few more transformations in eval_transforms too?
1) Generally, only one of ResizeStepScaling and Resize will be chosen in train_transforms.
If ResizeStepScaling is chosen, eval_transforms should be as shown below.
eval_transforms = transforms.Compose(
    [transforms.Padding(args.image_shape),
     transforms.Normalize()])
args.image_shape[0] should equal the largest width of the images in the dataset, and args.image_shape[1] should be the largest height.
If Resize is chosen, eval_transforms should be as shown below.
eval_transforms = transforms.Compose(
    [transforms.Resize(args.image_shape),
     transforms.Normalize()])
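For the Padding case above, the value passed as image_shape has to cover the largest width and the largest height in the dataset. A minimal sketch of computing it, assuming the image sizes have already been collected as `(width, height)` pairs (the helper name `padding_target_shape` is hypothetical):

```python
def padding_target_shape(sizes):
    """Return [max_width, max_height] over a collection of (width, height) pairs."""
    sizes = list(sizes)
    if not sizes:
        raise ValueError("sizes must not be empty")
    max_width = max(width for width, _ in sizes)
    max_height = max(height for _, height in sizes)
    return [max_width, max_height]


if __name__ == "__main__":
    # Illustrative sizes only; the result would be passed as: --image_shape <width> <height>
    print(padding_target_shape([(640, 480), (300, 900), (1024, 768)]))
```

Note that the largest width and the largest height may come from different images, which is why they are computed separately.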
2) It may be fine. You can give it a try.
3) The former.
4) There is no need to add more transforms to eval_transforms. Just set eval_transforms as in answer 1.
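The answer to question 3 ("the former") matches how a composed transform pipeline usually behaves: each sample passes through every transform in order and comes out as one augmented image, rather than one extra image per transform. The sketch below illustrates that with toy stand-ins. The classes here are not the PaddleSeg implementations; they operate on nested lists instead of real images and ignore the label that a real segmentation transform would also have to flip.

```python
import random


class Compose:
    """Apply a list of transforms to one sample, in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image):
        for transform in self.transforms:
            image = transform(image)
        return image


class RandomHorizontalFlip:
    """Toy stand-in: mirror each row with probability `prob`."""

    def __init__(self, prob=0.5, rng=None):
        self.prob = prob
        self.rng = rng or random.Random()

    def __call__(self, image):
        if self.rng.random() < self.prob:
            return [row[::-1] for row in image]
        return image


class RandomVerticalFlip:
    """Toy stand-in: reverse the row order with probability `prob`."""

    def __init__(self, prob=0.5, rng=None):
        self.prob = prob
        self.rng = rng or random.Random()

    def __call__(self, image):
        if self.rng.random() < self.prob:
            return image[::-1]
        return image


if __name__ == "__main__":
    pipeline = Compose([RandomHorizontalFlip(prob=1.0), RandomVerticalFlip(prob=1.0)])
    # One input sample produces one output sample with both flips applied.
    print(pipeline([[1, 2], [3, 4]]))
```

Because the random choices are drawn each time a sample is loaded, the same source image can look different from epoch to epoch without the dataset on disk growing.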
Many thanks @wuyefeilin
Much appreciated!
@wuyefeilin
In a custom dataset to finetune on your model, what should be percentage of Train and Val folders?
Was humanseg trained on 80 and 20 percent ratio?
It's almost 80% to 20%
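A roughly 80/20 split like the one mentioned here can be produced with a seeded shuffle. This is a minimal sketch; the function name `split_train_val` and the example file names are hypothetical, and writing the resulting lists out as train.txt and val.txt in the format the training script expects is left to the reader.

```python
import random


def split_train_val(items, val_ratio=0.2, seed=0):
    """Shuffle items reproducibly and split them into (train, val) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = int(round(len(items) * val_ratio))
    return items[n_val:], items[:n_val]


if __name__ == "__main__":
    # Hypothetical (image, label) path pairs.
    samples = [("images/%03d.jpg" % i, "labels/%03d.png" % i) for i in range(10)]
    train, val = split_train_val(samples)
    print(len(train), len(val))
```

Fixing the seed keeps the split stable between runs, so evaluation numbers from different experiments refer to the same validation images.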
@wuyefeilin
Ok, we also took 80% 20%
I had a question regarding inference :-
Currently if I run the infer.py script to get segmentation of one image, it loads the model, CUDA, etc and then gives the results. Then if a user requests for another image, then I again run infer.py for the next image. And again it loads model, CUDA, etc. This takes up too much time.
I want to load the Model and CUDA and the rest of infer.py scripts once for first time of execution into Memory and for the next time executions I don't want to load them again and again and just pass the arguments for images and want to use them directly from Memory.
Is it possible to achieve this?
You can change the predict function in infer.py according to your requirements.
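One common way to avoid reloading the model for every request is to keep it in a long-lived object and load it lazily on first use. The sketch below shows only that pattern: `SegmentationService` and `fake_load_model` are hypothetical names, and the fake loader is a stand-in for whatever model-loading code infer.py actually runs.

```python
class SegmentationService:
    """Hold a model in memory and reuse it across predict() calls."""

    def __init__(self, load_model):
        self._load_model = load_model
        self._model = None
        self.load_count = 0  # kept only to show that loading happens once

    def _get_model(self):
        if self._model is None:
            self._model = self._load_model()
            self.load_count += 1
        return self._model

    def predict(self, image):
        return self._get_model()(image)


def fake_load_model():
    """Stand-in for the expensive load step; returns a trivial threshold 'model'."""
    return lambda image: [[1 if px > 127 else 0 for px in row] for row in image]


if __name__ == "__main__":
    service = SegmentationService(fake_load_model)
    print(service.predict([[0, 255], [200, 10]]))
    print(service.predict([[130, 120]]))
    print("model loads:", service.load_count)
```

In a deployment the service object would live inside a process that stays running (for example a small web server), so only the first request pays the loading cost.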
Ok thanks, let me try it out
Dear @wuyefeilin
I had a training related question.
I am looking to finetune the "humanseg_mobile_ckpt" model and improve upon it by further training on a large custom dataset.
python train.py --model_type HumanSegMobile --save_dir output-final/ --data_dir data/dataset/ --train_list data/dataset/train.txt --val_list data/dataset/val.txt --pretrained_weights pretrained_weights/humanseg_mobile_ckpt --batch_size 120 --learning_rate 0.1 --num_epochs 100 --image_shape 192 192 --use_vdl
epoch 5 - miou: 0.8717953388908974
epoch 10 - miou: 0.8800683647731691
epoch 15 - miou: 0.8891557006975803
epoch 20 - miou: 0.8831192153305805
epoch 25 - miou: 0.8933545768508766
epoch 30 - miou: 0.8817694534661873
epoch 35 - miou: 0.8835890666639037
epoch 40 - miou: 0.8999487227771421
epoch 45 - miou: 0.8970806366166504
epoch 50 - miou: 0.8866842320198985
epoch 55 - miou: 0.895099087548593
epoch 60 - miou: 0.8963781789732772
epoch 65 - miou: 0.8965480643359549
epoch 70 - miou: 0.9108704111252974
epoch 75 - miou: 0.9067427412035519
epoch 80 - miou: 0.9147721964189288
epoch 85 - miou: 0.9184056356530017
epoch 90 - miou: 0.9171996050468968
epoch 95 - miou: 0.924967335141359
COMMAND TO RESUME FROM EPOCH 95 AND DO UPTO 200 EPOCH (with pretrained weights as humanseg_mobile_ckpt)
python train.py --model_type HumanSegMobile --save_dir output-final/ --data_dir data/dataset/ --train_list data/dataset/train.txt --val_list data/dataset/val.txt --pretrained_weights pretrained_weights/humanseg_mobile_ckpt --batch_size 120 --learning_rate 0.1 --num_epochs 200 --image_shape 192 192 --use_vdl --resume_weights output-final/epoch_95
epoch 100 miou: 0.8991859417820522
epoch 105 miou: 0.8999288103014873
COMMAND TO RESUME FROM EPOCH 95 AND DO UPTO 500 EPOCH (with pretrained weights as output-final/epoch_95)
python train.py --model_type HumanSegMobile --save_dir output-final/ --data_dir data/dataset/ --train_list data/dataset/train.txt --val_list data/dataset/val.txt --pretrained_weights output-final/epoch_95 --batch_size 300 --learning_rate 0.1 --num_epochs 500 --image_shape 192 192 --use_vdl --resume_weights output-final/epoch_95
After starting this, the epoch values were better (after using --pretrained_weights output-final/epoch_95 as compared to pretrained_weights/humanseg_mobile_ckpt for resuming)
epoch 100 - miou: 0.9051754238899534
epoch 105 - miou: 0.8993193305352802
epoch 110 - miou: 0.9119137824992598
epoch 115 - miou: 0.9044368021564695
epoch 120 miou: 0.9145211914661961
epoch 125 miou: 0.9129612022468854
Did I do the right thing?
1) Should I have set 200 or 300 epoch right from beginning and stopped midway if I felt that there is some further training improvement?
2) Did I need to use "output-final/epoch_95" as --pretrained_weights instead of "pretrained_weights/humanseg_mobile_ckpt" to continue the training?
3) How can I reach or cross your original mIOU of 0.9426070942252652?
Am I on the right path?
1) The number of epochs should be set from the beginning. If you resume training, the number of epochs should not be changed.
2) Yes, you can use "output-final/epoch_95" as the pretrained model and train from the beginning.
3) The datasets are different, so the mIoU values are not comparable. You can change the super params (hyperparameters) to train a better model.
Dear @wuyefeilin ,
Thanks for the clarity. Ok, I will retrain from scratch with higher epoch from the beginning itself.
1) Can you shed some light about "super params"? What do you mean by that, in which file to set these values, and on what basis should they be set?
2) On your Baidu dataset, during training, the best mIOU was for which epoch?
Thanks
You can tune the learning rate, epochs, optimizer and train_transforms in train.py. You should run experiments to find suitable values.
The best mIoU was from the last epoch, but it may also come from a middle epoch. In our program, we save the model with the best mIoU in the 'best_model' directory.
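Since the best mIoU may come from a middle epoch, it can help to pick the best checkpoint from recorded results instead of assuming the last one. The sketch below parses summaries written in the "epoch N - miou: X" style used earlier in this thread. That line format is an assumption taken from the posted summaries, not the raw [EVAL] log format, and the helper name `best_epoch` is hypothetical.

```python
import re

# Matches "epoch 95 - miou: 0.92" as well as "epoch 120 miou: 0.91" (dash optional).
LOG_PATTERN = re.compile(r"epoch\s+(\d+)\s*-?\s*miou:\s*(\d+\.\d+)")


def best_epoch(log_text):
    """Return (epoch, miou) for the highest mIoU found in the text."""
    results = [(int(epoch), float(miou)) for epoch, miou in LOG_PATTERN.findall(log_text)]
    if not results:
        raise ValueError("no 'epoch N - miou: X' entries found")
    return max(results, key=lambda item: item[1])


if __name__ == "__main__":
    # Values copied from the first training run reported in this thread.
    log = (
        "epoch 85 - miou: 0.9184056356530017 "
        "epoch 90 - miou: 0.9171996050468968 "
        "epoch 95 - miou: 0.924967335141359"
    )
    print(best_epoch(log))
```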
I noticed that HumanSeg uses HRNetV2-W18-SMALL-v1.
Is it possible to use HRNetV2-W18-SMALL-v2 in the HumanSeg branch? @LutaoChu
I am looking to implement lightweight semantic segmentation on a mobile device for highly accurate human cutouts (full body, not just portrait). I am looking for segmentation of images, rather than video. What would you recommend?
I noticed that the Model Zoo for PaddleSeg mentions HRNet_W18 / bn.
Is this HRNetV2-W18-SMALL-v2?
In another Issue https://github.com/PaddlePaddle/PaddleSeg/issues/245, there is discussion on HRNet_W18 but I am unable to get clarity about the exact version.