Hi @duchengyao , I've added a section in the documentation for the multilabel classification. Let me know if it is still unclear.
I'm also facing the above issue, and it's still unclear to me. Could you please provide more info?
I ended up using the classifier instead of multilabel. If your annotations do not overlap, use classification mode. It works for up to 7 labels (that number is hardwired into the model).
The way multilabel works is that you have to deal with overlapping annotations. Each overlap combination essentially becomes a new kind of classification.
In my case, it is overlapping. For example, I need to annotate tables, headers, and sub-headers, and the headers and sub-headers are inside the table. How can we annotate in this case? Should we use CLASSIFICATION or MULTILABEL?
You definitely have multilabel then.
Thank you for your response. I'm using the below 5 colors in the image annotation:
black - background, red - table, yellow - header, green - sub-header, blue - title of the page.
I've updated my classes.txt file as below. Is it correct? To use multilabel, should we change any method?
```
0 0 0       0 0 0 0 0   # background
255 0 0     1 0 0 0 0   # table
255 255 0   0 1 0 0 0   # header
0 255 0     0 0 1 0 0   # sub-header
0 0 255     0 0 0 1 0   # title
```
That does not look right. You should have some bits that correspond to overlap regions. I assume the header and subheader always overlap with the table, so the header and subheader colors need bit masks with two bits set. In your example, all your bit masks (referred to in the docs as the attribution code) have only 1 bit set.
Another point: the number of bits in the bit mask is the number of primary labels/classes (table, header, subheader). That means you need 3 mask bits, not 5. The documentation implies it is not necessary to represent all possible bit mask combinations.
Assuming that a header and subheader must always fully overlap the table, and these colors: table = red, header = green, subheader = blue.
This would work:
```
0 0 0     0 0 0   # background
255 0 0   1 0 0   # table
0 255 0   1 1 0   # header
0 0 255   1 0 1   # subheader
```
Notice the only colors are R, G, and B. The colors are arbitrary and only need to be distinct for each combination you need.
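If it helps, here's a minimal sketch (not part of dhSegment; the classes.txt format of "R G B" followed by the attribution bits is inferred from the examples above) to check that an annotation image only uses colors listed in classes.txt. Note that annotations must be painted with hard edges; anti-aliasing would introduce stray colors that this check will flag:

```python
import numpy as np
from PIL import Image

def load_classes(path="classes.txt"):
    """Parse classes.txt lines of the form 'R G B bit bit bit' into a dict."""
    colors = {}
    with open(path) as f:
        for line in f:
            parts = line.split("#")[0].split()  # strip trailing comments
            if len(parts) < 3:
                continue
            r, g, b = map(int, parts[:3])
            colors[(r, g, b)] = tuple(int(x) for x in parts[3:])  # attribution code
    return colors

def check_annotation(img_path, classes):
    """Report any pixel color that has no entry in classes.txt."""
    pixels = np.array(Image.open(img_path).convert("RGB")).reshape(-1, 3)
    unknown = {tuple(p) for p in pixels.tolist()} - set(classes)
    if unknown:
        print("Colors not in classes.txt:", unknown)
```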
Look at demo.py and the labels plane of the output; that slice will have an integer representing the predicted label for each pixel. Somehow that will decode into table/header/subheader. Perhaps it will use an int in the labels plane to represent the bit mask values: 4, 6, 5 for table, header, subheader.
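For illustration only, if the labels plane does pack the attribution bits into an int that way (table = bit 2, header = bit 1, subheader = bit 0, giving 4, 6, 5 — this is an assumption, so verify against demo.py's actual output), decoding per-class masks could look like:

```python
import numpy as np

CLASS_BITS = {"table": 2, "header": 1, "subheader": 0}  # assumed bit order

def decode_labels(labels):
    """Turn a packed labels plane into one boolean mask per primary class."""
    return {name: ((labels >> bit) & 1).astype(bool)
            for name, bit in CLASS_BITS.items()}

labels = np.array([[4, 6], [5, 0]])  # table, table+header, table+subheader, background
masks = decode_labels(labels)
print(masks["table"])  # True for 4, 6, and 5, since all include the table bit
```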
Got it, thank you so much for the detailed information. I have 2 queries:
1. I used yellow for the sub-header, so can I use 255 255 0 1 1 0 instead of 0 255 0 1 1 0? Or would you recommend using only red, green, and blue?
2. I also need one more color for the title of the page. What color would you recommend, and what should its bit mask be?
It would be very helpful if you could help with this. Thank you.
For reference, I've attached the below sample for headers with sub-headers (red for the table, yellow for headers, and green for sub-headers). ...
The choice of color does not matter; you just need to make sure each bit mask combo has a unique color. For example, if we add title as a label, then this would work:
```
0 0 0      0 0 0 0   # background
255 0 0    1 0 0 0   # table
0 255 0    1 1 0 0   # header
0 0 255    1 0 1 0   # subheader
0 128 255  0 0 0 1   # title
```
Got it, thank you. Should I make any modifications to demo.py? And does the above annotated image look fine?
demo.py will need plenty of modification for postprocessing. You can see what I did in:
https://github.com/tralfamadude/dhSegment/blob/master/ia_predict.py
https://github.com/tralfamadude/dhSegment/blob/master/ia_postprocess.py
Look at what I did for debug mode: I save the probability maps, _rect.jpg has the predicted rectangles, and __boxes.jpg the boxes; those might help you see what is going on. You could OCR the predicted rectangles, for instance.
In my case, the OCR is already done and stored in hOCR format, so I use the rectangle coordinates to extract text from that.
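In case it helps, here's a rough sketch of that extraction step; the ocrx_word class and the bbox title attribute come from the hOCR spec, but the file names and rectangle format here are just placeholders:

```python
import re
from xml.etree import ElementTree

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")

def words_in_rect(hocr_path, rect):
    """Yield hOCR word texts whose bounding box lies inside rect=(x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    for elem in ElementTree.parse(hocr_path).iter():
        if elem.get("class") != "ocrx_word":
            continue
        m = BBOX_RE.search(elem.get("title", ""))
        if m:
            wx0, wy0, wx1, wy1 = map(int, m.groups())
            if wx0 >= x0 and wy0 >= y0 and wx1 <= x1 and wy1 <= y1:
                yield "".join(elem.itertext()).strip()

# e.g.: text = " ".join(words_in_rect("page.hocr", predicted_table_rect))
```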
Thank you so much @tralfamadude , I'll look into that.
@tralfamadude Hi, what ratio of images should we maintain between training and evaluation?
For example, if I have 300 images and labels, can I keep 200 in the train folder and 100 for evaluation?
Using 80% train is normal. What really matters is performance on withheld examples (often called the test set).
Does the test set mean the remaining 20%?
Example: train: 160, eval: 40, test: 100 (withheld).
Terminology (eval vs. test) is not consistent in the field, so I use 'withheld' to specify the set that is not part of the training loop. The eval set is part of the training loop, even though it is not trained upon; by virtue of being the measure of training accuracy, it is possible to overfit on the combined eval+training sets. Using a test/withheld set you can check generalization.
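As a concrete sketch of that split (the 160/40/100 counts mirror the example above; the folder layout and file extension are assumptions):

```python
import glob
import random

paths = sorted(glob.glob("myfolder/all_images/*.jpg"))  # assumed location, 300 images
random.Random(0).shuffle(paths)  # fixed seed so the split is reproducible

withheld = paths[:100]     # never touched by the training loop; checks generalization
eval_set = paths[100:140]  # measures training accuracy, so it can be indirectly overfit
train_set = paths[140:]    # trained upon
```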
The dhSegment demo.py can be used to run the withheld test set through the trained model. If you look at my fork of dhSegment, see ia_predict.py, which shows more about post-processing.
Thank you for the detailed info.
Got it; it is the input images that we give for testing, right?
I used https://github.com/tralfamadude/dhSegment/blob/master/ia_predict.py in two phases: post-model training vs. production. Vision needs plenty of post-processing, and in my case I need to extract text conditional on 2 classifications being present on the same page/image. To do that, I used a post-model decision tree in a stacked approach. In post-model training, the training+eval sets are the X for page type Y, and that trains the decision tree. In production mode, the decision tree is used to direct post-processing (what actions to take for each page type).
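To make the stacked approach concrete, here's a toy sketch with made-up numbers: X could be per-page signals derived from the segmentation output (e.g., area fraction per class) and y the page type that selects the post-processing actions:

```python
from sklearn.tree import DecisionTreeClassifier

# Illustrative only: area fractions for table/header/subheader per page.
X = [[0.30, 0.05, 0.02],
     [0.00, 0.00, 0.00],
     [0.45, 0.08, 0.06]]
y = ["table_page", "plain_page", "table_page"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict([[0.40, 0.06, 0.03]]))  # choose post-processing actions by page type
```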
> input images that we are giving for testing
Yes, the U-shaped NN dhSegment uses is focused on training a pixel-to-pixel mapping. Then post-processing is used to make something from that.
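For example, going from a probability map to rectangles is often just thresholding plus connected components; a sketch with OpenCV (the threshold value and names are assumptions, not dhSegment's actual post-processing):

```python
import cv2
import numpy as np

def boxes_from_probs(prob_map, thresh=0.5):
    """Turn a per-pixel class probability map into bounding rectangles."""
    binary = (prob_map > thresh).astype(np.uint8)
    # OpenCV 4.x returns (contours, hierarchy)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours]  # (x, y, w, h) per region
```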
Got it, thank you so much.
What are the best training parameters I can use to improve accuracy? I tried n_epochs = 30 and n_epochs = 60, but I'm not getting good accuracy for table headers.
Below is the config file; what other parameters can I change?
{ "training_params" : { "learning_rate": 5e-5, "batch_size": 1, "make_patches": false, "training_margin" : 0, "n_epochs": 30, "data_augmentation" : true, "data_augmentation_max_rotation" : 0.2, "data_augmentation_max_scaling" : 0.2, "data_augmentation_flip_lr": true, "data_augmentation_flip_ud": true, "data_augmentation_color": false, "evaluate_every_epoch" : 10 }, "pretrained_model_name" : "resnet50", "prediction_type": "MULTILABEL", "train_data" : "myfolder/train/", "eval_data" : "myfolder/val_a1", "classes_file" : "myfolder/train/classes.txt", "model_output_dir" : "page_model", "gpu" : "" }
I have tried varying the number of epochs, but the default has been best for me. In general, if your accuracy needs improving, then get more training data.
Oh OK, what about the other parameters such as batch_size, data_augmentation_flip_lr, evaluate_every_epoch, etc.? Can they all be left at their defaults, so that I only focus on getting more training data to improve accuracy?
I find that more training data will give the best improvement, but you can try variations of batch size, etc. in a grid search for best parameters. Let me know what you find out.
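Here's a minimal grid-search sketch over that config; the sacred-style `python train.py with config.json` invocation is how I recall dhSegment being launched, so treat that as an assumption and check the README:

```python
import copy
import itertools
import json
import subprocess

base = json.load(open("config.json"))
grid = {"learning_rate": [5e-5, 1e-4], "batch_size": [1, 2]}

for lr, bs in itertools.product(*grid.values()):
    cfg = copy.deepcopy(base)
    cfg["training_params"]["learning_rate"] = lr
    cfg["training_params"]["batch_size"] = bs
    cfg["model_output_dir"] = f"page_model_lr{lr}_bs{bs}"  # one model dir per combo
    path = f"config_lr{lr}_bs{bs}.json"
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    subprocess.run(["python", "train.py", "with", path])  # dhSegment entry point (assumed)
```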
Thank you so much for your replies @tralfamadude, they are very useful.
Sure, I'll try some variations and let you know.
@tralfamadude After increasing the number of images, the model only rarely trains successfully; most of the time, training runs for more than 2 hours and then stops with the below error.
Any idea on this issue?
```
InvalidArgumentError (see above for traceback): Incompatible shapes: [1,4,776,747] vs. [1,4,776,746]
  [[node sigmoid_xentropy_loss/per_pixel_loss/mul (defined at /PDF_Backend/Dh_segment/dh_segment/estimator_fn.py:119) ]]
```
I have not seen that error. You should post the stack trace as a new issue and tag SeguinBe, who has been very helpful. It seems like an internal error, since the system can handle mixed image sizes.
@tralfamadude Sure, thank you for your assistance.
After increasing batch_size to 2 (from 1), the above issue is solved, @tralfamadude.
The "Incompatible shapes" error went away when you increased the batch size?
Yes..
@tralfamadude It takes more than 5 hours to train on 300+ images. How do we reduce the training time?
When I change prediction_type from 'CLASSIFICATION' to 'MULTILABEL', how do I use multi-label? Thanks!