Dataset Generation

fanq15 / FewX

FewX is an open-source toolbox on top of Detectron2 for data-limited instance-level recognition tasks.

https://github.com/fanq15/FewX

MIT License


Closed: XiongweiWu closed this issue 4 years ago.

XiongweiWu commented 4 years ago

Three questions:

1. In 1_split_filter.py#L46-L48, as I understand it, sampled images should not contain any objects from VOC classes. However, this implementation seems to exclude only images with tiny objects.
2. In 2_balance.py#L57, does each category contain no more than 80 instances?
3. How is final_split_voc_10_shot_instances_train2017.json generated?
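Question 1 assumes that a sampled image should contain no VOC-class objects at all. A minimal sketch of that strict filter follows, assuming a COCO-style annotation dict (an `images` list, and an `annotations` list whose entries carry `image_id` and `category_id`) plus a set of VOC category IDs supplied by the caller. The helper name `images_without_voc` is hypothetical and is not taken from `1_split_filter.py`.

```python
from typing import Any, Dict, List, Set


def images_without_voc(coco: Dict[str, Any], voc_cat_ids: Set[int]) -> List[Dict[str, Any]]:
    """Return the images that have no annotation from a VOC category.

    Hypothetical helper illustrating the strict filter assumed in question 1;
    it is not the logic used in FewX's 1_split_filter.py.
    """
    # Collect the IDs of every image with at least one VOC-class annotation.
    tainted = {
        ann["image_id"]
        for ann in coco["annotations"]
        if ann["category_id"] in voc_cat_ids
    }
    # Keep only images that never appeared in that set.
    return [img for img in coco["images"] if img["id"] not in tainted]


if __name__ == "__main__":
    toy = {
        "images": [{"id": 1}, {"id": 2}],
        "annotations": [
            {"image_id": 1, "category_id": 1},   # assumed VOC category
            {"image_id": 2, "category_id": 90},  # assumed non-VOC category
        ],
    }
    print([img["id"] for img in images_without_voc(toy, {1})])  # [2]
```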

fanq15 commented 4 years ago

1. It picks non-VOC classes.
2. 80 is the minimum number of instances in each class.
3. You can use the provided final_split_voc_10_shot_instances_train2017.json in the new_annotations dir for a fair comparison.
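The second answer refers to the number of instances per class. Counting those from a COCO-style annotation list, and finding the rarest class, is a short `collections.Counter` exercise. This is a sketch under the same assumed annotation format, not code from `2_balance.py`; the helper names are hypothetical, and the value 80 comes from the answer above rather than from anything computed here.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def instances_per_class(annotations: List[Dict[str, Any]]) -> Counter:
    """Count annotated instances for each category_id (hypothetical helper)."""
    return Counter(ann["category_id"] for ann in annotations)


def rarest_class(annotations: List[Dict[str, Any]]) -> Tuple[int, int]:
    """Return (category_id, count) for the class with the fewest instances.

    Raises ValueError if `annotations` is empty (min of an empty sequence).
    """
    counts = instances_per_class(annotations)
    return min(counts.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    anns = [{"category_id": 1}, {"category_id": 1}, {"category_id": 2}]
    print(instances_per_class(anns))  # Counter({1: 2, 2: 1})
    print(rarest_class(anns))         # (2, 1)
```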

XiongweiWu commented 4 years ago

@fanq15

1. So in your non-VOC set, images may still contain VOC-class instances (they are just not labeled)?
2. It seems that you first compute the total number of instances per class across all images and store it in 'all_cls_dict'. Then, for each image, if any category it contains has fewer than 80 instances in 'all_cls_dict', you keep all instances in this image for training; otherwise you discard all of its instances and remove them from the categories whose count is larger than 80. I am a bit confused about this file.
3. Can you provide the 30-shot JSON file?
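The procedure described in question 2 can be sketched as a single greedy pass. This is one reading of the description in the thread, not the code in `2_balance.py` (which, per the reply below, has a bug and does not actually balance). The function name `greedy_balance` and the `threshold` parameter are hypothetical; the annotation format is the same assumed COCO-style list.

```python
from collections import Counter
from typing import Any, Dict, List


def greedy_balance(
    annotations: List[Dict[str, Any]], threshold: int = 80
) -> List[Dict[str, Any]]:
    """Sketch of the balancing pass as described in the thread (hypothetical).

    An image is kept if it contains at least one category whose running
    instance count is below `threshold`; otherwise all of its instances are
    dropped and subtracted from the running counts.
    """
    # Group annotations by image, preserving first-seen image order.
    by_image: Dict[int, List[Dict[str, Any]]] = {}
    for ann in annotations:
        by_image.setdefault(ann["image_id"], []).append(ann)

    # Running per-class totals over the whole dataset (the role of all_cls_dict).
    counts = Counter(ann["category_id"] for ann in annotations)

    kept: List[Dict[str, Any]] = []
    for anns in by_image.values():
        cats = {ann["category_id"] for ann in anns}
        if any(counts[c] < threshold for c in cats):
            # The image holds a class that is still rare: keep every instance.
            kept.extend(anns)
        else:
            # Every class in the image is plentiful: drop the image and
            # subtract its instances from the running counts.
            for ann in anns:
                counts[ann["category_id"]] -= 1
    return kept
```

With `threshold=2` and category 1 appearing alone in images 1 and 2 and alongside category 2 in image 3, images 1 and 2 are dropped and only image 3 survives, leaving category 1 with a single instance. That illustrates a property of this reading: the pass is order-dependent and can push a class below the threshold.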

fanq15 commented 4 years ago

1. Yes. The VOC instances are ignored.
2. About 2_balance.py:
   2.1. Yes, it should be the number of instances per class. I fixed the wording in my earlier answer.
   2.2. There is a bug in 2_balance.py, and it does not actually balance the categories. This bug does not affect training or evaluation, though. I will fix it and see whether image balancing improves performance.
3. There is no 30-shot JSON file yet. I will add it later.
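The first reply says VOC instances are ignored rather than their images being excluded. A minimal sketch of that behaviour, assuming the same COCO-style dict (here also with a `categories` list): images are kept as-is, while VOC-class annotations and categories are removed, so VOC objects remain in the pixels but carry no labels. The helper `drop_voc_annotations` is hypothetical, and whether FewX additionally prunes images left with no annotations is not stated in the thread.

```python
from typing import Any, Dict, Set


def drop_voc_annotations(coco: Dict[str, Any], voc_cat_ids: Set[int]) -> Dict[str, Any]:
    """Return a copy of a COCO-style dict with VOC-class labels removed.

    Hypothetical helper, not FewX code. The `images` list is left unchanged,
    so images containing VOC objects stay in the set with those objects
    unlabeled. The input dict is not mutated.
    """
    anns = [a for a in coco["annotations"] if a["category_id"] not in voc_cat_ids]
    cats = [c for c in coco.get("categories", []) if c["id"] not in voc_cat_ids]
    return {**coco, "annotations": anns, "categories": cats}
```

Compared with the strict filter assumed in question 1, this keeps more images but introduces unlabeled VOC objects, which a detector may treat as background during training.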